Job Title: Data Engineer (Spark | Java / Python / Scala)
Job Summary
We are seeking a skilled Data Engineer to design, build, and optimize scalable data pipelines and platforms using Apache Spark. The ideal candidate has strong programming experience in Java, Python, or Scala, and is passionate about processing large-scale data efficiently to support analytics, reporting, and machine learning initiatives.
Key Responsibilities
Design, develop, and maintain batch and streaming data pipelines using Apache Spark (a minimal pipeline sketch follows this list)
Build scalable ETL/ELT solutions to process large volumes of structured and unstructured data
Write clean, efficient, and maintainable code in Java, Python, or Scala
Optimize Spark jobs for performance, scalability, and cost efficiency
Integrate data from multiple sources (databases, APIs, files, message queues)
Collaborate with data scientists, analysts, and software engineers to support their data needs
Ensure data quality, reliability, and governance across pipelines
Monitor, troubleshoot, and resolve data pipeline issues in production
Work with cloud platforms (AWS, Azure, or Google Cloud Platform) and distributed systems
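For concreteness, the following is a minimal PySpark sketch of the kind of batch pipeline work described above, not a prescribed implementation; the bucket paths, column names, and app name are hypothetical placeholders.

# Minimal batch ETL sketch in PySpark. All paths, column names, and
# the app name are hypothetical placeholders chosen for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-daily-etl").getOrCreate()

# Extract: read raw structured data (here, JSON files in a data lake).
orders = spark.read.json("s3://example-bucket/raw/orders/")

# Transform: filter, derive a date column, and aggregate daily revenue.
daily_revenue = (
    orders.filter(F.col("status") == "completed")
          .withColumn("order_date", F.to_date("created_at"))
          .groupBy("order_date")
          .agg(F.sum("amount").alias("revenue"),
               F.count("*").alias("order_count"))
)

# Load: write partitioned Parquet for downstream analytics and reporting.
(daily_revenue.write.mode("overwrite")
              .partitionBy("order_date")
              .parquet("s3://example-bucket/curated/daily_revenue/"))

spark.stop()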
Required Skills & Qualifications
Strong experience with Apache Spark (Spark SQL, DataFrames, Spark Streaming); see the example after this list
Proficiency in at least one of the following: Java, Python, or Scala
Solid understanding of distributed systems and big data concepts
Experience with SQL and relational databases
Hands-on experience with data modeling and ETL/ELT processes
Familiarity with Hadoop ecosystem (HDFS, Hive, YARN)
Experience with version control systems (Git)
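As a point of reference for the Spark SQL and DataFrame skills above, the short example below shows the two interchangeable query styles; the "events" table and its columns are made up for illustration.

# DataFrame API and Spark SQL expressing the same query. The events
# table and its columns are hypothetical, used only for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sql-vs-dataframe").getOrCreate()

events = spark.read.parquet("s3://example-bucket/curated/events/")
events.createOrReplaceTempView("events")

# DataFrame API style.
clicks_df = (
    events.where(F.col("event_type") == "click")
          .groupBy("user_id")
          .agg(F.count("*").alias("clicks"))
)

# Spark SQL style: same logical plan, same optimizer, same result.
clicks_sql = spark.sql("""
    SELECT user_id, COUNT(*) AS clicks
    FROM events
    WHERE event_type = 'click'
    GROUP BY user_id
""")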
Preferred Qualifications
Experience with cloud-based data platforms (Databricks, EMR, BigQuery, Synapse)
Knowledge of Kafka, Flink, or other streaming technologies (a streaming sketch follows this list)
Experience with NoSQL databases (Cassandra, MongoDB, HBase)
Familiarity with CI/CD pipelines and DevOps practices
Understanding of data security, governance, and compliance
Experience supporting machine learning or advanced analytics workloads
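To give a concrete flavor of the streaming technologies mentioned above, here is a minimal Spark Structured Streaming sketch that consumes a Kafka topic; the broker address, topic name, and storage paths are hypothetical, and the job assumes the spark-sql-kafka connector package is available at submit time.

# Structured Streaming sketch: read a Kafka topic and land it as Parquet.
# Broker, topic, and paths are hypothetical; assumes the spark-sql-kafka
# connector package is on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

stream = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "orders")
         .load()
)

# Kafka delivers the payload as binary; cast it to a string before any
# downstream parsing or schema enforcement.
parsed = stream.select(F.col("value").cast("string").alias("payload"))

query = (
    parsed.writeStream.format("parquet")
          .option("path", "s3://example-bucket/streaming/orders/")
          .option("checkpointLocation", "s3://example-bucket/checkpoints/orders/")
          .start()
)
query.awaitTermination()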
Nice to Have
Certifications in cloud or big data technologies
Experience with containerization tools (Docker, Kubernetes)
Knowledge of workflow orchestration tools (Airflow, Prefect, Oozie); a minimal Airflow sketch follows this list
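Since workflow orchestration comes up above, this last sketch shows a minimal Airflow DAG (Airflow 2.4+ style) that schedules a daily spark-submit run; the DAG id, schedule, and script path are invented for illustration.

# Minimal Airflow DAG sketch: schedule a daily spark-submit. The DAG id,
# schedule, and script path are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="orders_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+ parameter name
    catchup=False,
) as dag:
    run_etl = BashOperator(
        task_id="spark_submit_etl",
        bash_command="spark-submit /opt/jobs/orders_daily_etl.py",
    )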