Data Engineer

Hadoop, HDFS, Spark, PySpark, Apache Airflow, Big Data, Data Engineering, Apache Hadoop, Apache HBase, Agile, Data Visualization
Full Time
$80,000 - $120,000
Travel required: up to 10%.

Job Description

Candidates should have strong knowledge of, and interest in, big data technologies, along with a background in data engineering.

  • Build data pipeline frameworks to automate high-volume and real-time data delivery for our Spark and streaming data hub
  • Transform complex analytical models into scalable, production-ready solutions
  • Provide support and enhancements for an advanced anomaly detection machine learning platform
  • Continuously integrate and ship code into our cloud production environments
  • Develop cloud-based applications from the ground up using a modern technology stack
  • Work directly with Product Owners and customers to deliver data products in a collaborative, agile environment


  • At least 4 years of experience with the following big data areas: file formats (Parquet, Avro, ORC), resource management, distributed processing, and RDBMS
  • At least 4 years of experience developing applications using monitoring, build tools, version control, unit testing, TDD, and change management to support DevOps
  • At least 2 years of experience with SQL and shell scripting
  • Experience designing, building, and deploying production-level data pipelines using tools from the Hadoop stack (HDFS, Hive, Spark, HBase, Kafka, NiFi, Oozie, Apache Beam, Apache Airflow, etc.)
  • Experience with Spark programming (PySpark, Scala, or Java)
  • Experience troubleshooting JVM-related issues
  • Experience with, and strategies for, handling mutable data in Hadoop
  • Familiarity with Spark Structured Streaming and/or Kafka Streams
  • Familiarity with implementing machine learning using PySpark
  • Experience with data visualization tools such as Cognos, Arcadia, and Tableau
  • Experience with Ab Initio technologies, including but not limited to Ab Initio graph development, EME, Co-Op, BRE, and Continuous Flows


Dice ID: 10462843
Position ID: 7066251
Originally Posted: 3 months ago

Similar Positions

Big Data Developer & Lead
  • NexGen IOT Solutions, LLC
  • Irving, TX, USA
Big Data Engineer, Tech Lead
  • Syeta Inc
  • Irving, TX, USA
Pyspark Developer
  • NexGen IOT Solutions, LLC
  • Irving, TX, USA
Data Modeler
  • Triveni IT
  • Irving, TX, USA
Pyspark developer
  • Syeta Inc
  • Irving, TX, USA
Bigdata + Spark Developer
  • Virtusa Corporation
  • Irving, TX, USA
Data Engineer
  • Virtusa Corporation
  • Irving, TX, USA
Big Data Engineer
  • MNK Infotech, Inc.
  • Irving, TX, USA