Data Engineer

Overview

Remote
Depends on Experience
Full Time

Skills

Apache Spark
Java
Python
Scala

Job Details

Job Title: Data Engineer (Spark | Java / Python / Scala)

Job Summary

We are seeking a skilled Data Engineer to design, build, and optimize scalable data pipelines and platforms using Apache Spark. The ideal candidate has strong programming experience in Java, Python, or Scala, and is passionate about processing large-scale data efficiently to support analytics, reporting, and machine learning initiatives.


Key Responsibilities

  • Design, develop, and maintain batch and streaming data pipelines using Apache Spark

  • Build scalable ETL/ELT solutions to process large volumes of structured and unstructured data

  • Write clean, efficient, and maintainable code in Java, Python, or Scala

  • Optimize Spark jobs for performance, scalability, and cost efficiency

  • Integrate data from multiple sources (databases, APIs, files, message queues)

  • Collaborate with data scientists, analysts, and software engineers to support data needs

  • Ensure data quality, reliability, and governance across pipelines

  • Monitor, troubleshoot, and resolve data pipeline issues in production

  • Work with cloud platforms (AWS, Azure, or Google Cloud Platform) and distributed systems


Required Skills & Qualifications

  • Strong experience with Apache Spark (Spark SQL, DataFrames, Spark Streaming)

  • Proficiency in at least one of the following: Java, Python, or Scala

  • Solid understanding of distributed systems and big data concepts

  • Experience with SQL and relational databases

  • Hands-on experience with data modeling and ETL/ELT processes

  • Familiarity with Hadoop ecosystem (HDFS, Hive, YARN)

  • Experience with version control systems (Git)


Preferred Qualifications

  • Experience with cloud-based data platforms (Databricks, EMR, BigQuery, Synapse)

  • Knowledge of Kafka, Flink, or other streaming technologies

  • Experience with NoSQL databases (Cassandra, MongoDB, HBase)

  • Familiarity with CI/CD pipelines and DevOps practices

  • Understanding of data security, governance, and compliance

  • Experience supporting machine learning or advanced analytics workloads


Nice to Have

  • Certifications in cloud or big data technologies

  • Experience with containerization tools (Docker, Kubernetes)

  • Knowledge of workflow orchestration tools (Airflow, Prefect, Oozie)
