Job Title: Data Engineer (Spark | Java / Python / Scala)
Job Summary
We are seeking a skilled Data Engineer to design, build, and optimize scalable data pipelines and platforms using Apache Spark. The ideal candidate has strong programming experience in Java, Python, or Scala, and is passionate about processing large-scale data efficiently to support analytics, reporting, and machine learning initiatives.
Key Responsibilities
Design, develop, and maintain batch and streaming data pipelines using Apache Spark (a minimal pipeline sketch follows this list)
Build scalable ETL/ELT solutions to process large volumes of structured and unstructured data
Write clean, efficient, and maintainable code in Java, Python, or Scala
Optimize Spark jobs for performance, scalability, and cost efficiency
Integrate data from multiple sources (databases, APIs, files, message queues)
Collaborate with data scientists, analysts, and software engineers to support their data needs
Ensure data quality, reliability, and governance across pipelines
Monitor, troubleshoot, and resolve data pipeline issues in production
Work with cloud platforms (AWS, Azure, or Google Cloud Platform) and distributed systems
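For concreteness, the following is a minimal PySpark sketch of the kind of batch pipeline work described above, not a prescribed implementation; the bucket paths, column names, and app name are hypothetical placeholders.

# Minimal batch ETL sketch in PySpark. All paths, column names, and
# the app name are hypothetical placeholders chosen for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-daily-etl").getOrCreate()

# Extract: read raw structured data (here, JSON files in a data lake).
orders = spark.read.json("s3://example-bucket/raw/orders/")

# Transform: filter, derive a date column, and aggregate daily revenue.
daily_revenue = (
    orders.filter(F.col("status") == "completed")
          .withColumn("order_date", F.to_date("created_at"))
          .groupBy("order_date")
          .agg(F.sum("amount").alias("revenue"),
               F.count("*").alias("order_count"))
)

# Load: write partitioned Parquet for downstream analytics and reporting.
(daily_revenue.write.mode("overwrite")
              .partitionBy("order_date")
              .parquet("s3://example-bucket/curated/daily_revenue/"))

spark.stop()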
Required Skills & Qualifications
Strong experience with Apache Spark (Spark SQL, DataFrames, Spark Streaming); see the example after this list
Proficiency in at least one of the following: Java, Python, or Scala
Solid understanding of distributed systems and big data concepts
Experience with SQL and relational databases
Hands-on experience with data modeling and ETL/ELT processes
Familiarity with Hadoop ecosystem (HDFS, Hive, YARN)
Experience with version control systems (Git)
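As a point of reference for the Spark SQL and DataFrame skills above, the short example below shows the two interchangeable query styles; the "events" table and its columns are made up for illustration.

# DataFrame API and Spark SQL expressing the same query. The events
# table and its columns are hypothetical, used only for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sql-vs-dataframe").getOrCreate()

events = spark.read.parquet("s3://example-bucket/curated/events/")
events.createOrReplaceTempView("events")

# DataFrame API style.
clicks_df = (
    events.where(F.col("event_type") == "click")
          .groupBy("user_id")
          .agg(F.count("*").alias("clicks"))
)

# Spark SQL style: same logical plan, same optimizer, same result.
clicks_sql = spark.sql("""
    SELECT user_id, COUNT(*) AS clicks
    FROM events
    WHERE event_type = 'click'
    GROUP BY user_id
""")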
Preferred Qualifications
Experience with cloud-based data platforms (Databricks, EMR, BigQuery, Synapse)
Knowledge of Kafka, Flink, or other streaming technologies (a streaming sketch follows this list)
Experience with NoSQL databases (Cassandra, MongoDB, HBase)
Familiarity with CI/CD pipelines and DevOps practices
Understanding of data security, governance, and compliance
Experience supporting machine learning or advanced analytics workloads
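To give a concrete flavor of the streaming technologies mentioned above, here is a minimal Spark Structured Streaming sketch that consumes a Kafka topic; the broker address, topic name, and storage paths are hypothetical, and the job assumes the spark-sql-kafka connector package is available at submit time.

# Structured Streaming sketch: read a Kafka topic and land it as Parquet.
# Broker, topic, and paths are hypothetical; assumes the spark-sql-kafka
# connector package is on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

stream = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "orders")
         .load()
)

# Kafka delivers the payload as binary; cast it to a string before any
# downstream parsing or schema enforcement.
parsed = stream.select(F.col("value").cast("string").alias("payload"))

query = (
    parsed.writeStream.format("parquet")
          .option("path", "s3://example-bucket/streaming/orders/")
          .option("checkpointLocation", "s3://example-bucket/checkpoints/orders/")
          .start()
)
query.awaitTermination()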
Nice to Have
Certifications in cloud or big data technologies
Experience with containerization tools (Docker, Kubernetes)
Knowledge of workflow orchestration tools (Airflow, Prefect, Oozie); a minimal Airflow sketch follows this list
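Since workflow orchestration comes up above, this last sketch shows a minimal Airflow DAG (Airflow 2.4+ style) that schedules a daily spark-submit run; the DAG id, schedule, and script path are invented for illustration.

# Minimal Airflow DAG sketch: schedule a daily spark-submit. The DAG id,
# schedule, and script path are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="orders_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+ parameter name
    catchup=False,
) as dag:
    run_etl = BashOperator(
        task_id="spark_submit_etl",
        bash_command="spark-submit /opt/jobs/orders_daily_etl.py",
    )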