Data Engineer with Scala and Spark - San Francisco, CA (In-Person Interview - Hybrid)

Overview

On Site
$55 - $60 per hour
Contract - W2
Contract - Independent

Skills

Data Engineer (Scala / PySpark)

Job Details

Role: Data Engineer (Scala / PySpark)

Location: San Francisco, CA (Hybrid; in-person interview)

Employment Type: Contract

About the Role

We are seeking an experienced Data Engineer with strong expertise in Scala or PySpark to join our team in San Francisco. The ideal candidate will have a proven track record of building scalable data pipelines, processing large datasets, and working with modern big data and cloud technologies. You will play a key role in designing, implementing, and optimizing data solutions that support analytics, machine learning, and business intelligence initiatives.
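
For illustration only, a minimal Scala/Spark batch pipeline of the kind described above might look like the sketch below; the paths, job name, and column names are hypothetical and are not part of the role requirements.

```scala
import org.apache.spark.sql.{SparkSession, functions => F}

object EventsDailyLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("events-daily-load")
      .getOrCreate()

    // Hypothetical source: raw JSON events landed by an upstream ingestion job.
    val raw = spark.read.json("s3://example-bucket/raw/events/")

    // Basic cleaning and typing before the data is considered analytics-ready.
    val cleaned = raw
      .filter(F.col("event_id").isNotNull)
      .withColumn("event_date", F.to_date(F.col("event_ts")))
      .dropDuplicates("event_id")

    // Write partitioned Parquet so downstream queries can prune by date.
    cleaned.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3://example-bucket/curated/events/")

    spark.stop()
  }
}
```

Partitioning curated output by date, as in the sketch, is one common way to support the performance tuning and cost optimization called out in the responsibilities below.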

Key Responsibilities

  • Design, develop, and optimize data pipelines using Scala or PySpark to process structured and unstructured datasets.
  • Work with large-scale data platforms such as Apache Spark, Hadoop, Databricks, and Kafka for real-time and batch processing (a minimal streaming sketch follows this list).
  • Develop and maintain ETL workflows for ingesting, cleaning, and transforming data from multiple sources (databases, APIs, streaming).
  • Implement best practices for data modeling, partitioning, and schema evolution in big data environments.
  • Collaborate with data scientists, analysts, and business stakeholders to deliver analytics-ready datasets.
  • Ensure data quality, lineage, and governance are maintained across the data ecosystem.
  • Work with cloud platforms (AWS, Azure, Google Cloud Platform) to deploy, scale, and monitor data pipelines.
  • Contribute to performance tuning and cost optimization of data jobs in Spark and cloud environments.
  • Participate in code reviews, design discussions, and Agile ceremonies.
  • Support production data pipelines, monitor performance, and troubleshoot issues.
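
As a further illustrative sketch of the real-time side of these responsibilities (the broker address, topic name, and event schema are all hypothetical), a Spark Structured Streaming job reading from Kafka typically has this shape:

```scala
import org.apache.spark.sql.{SparkSession, functions => F}
import org.apache.spark.sql.types.{StructType, StringType, TimestampType}

object ClickstreamStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clickstream-stream")
      .getOrCreate()

    // Hypothetical schema for the JSON payload carried in the Kafka topic.
    val schema = new StructType()
      .add("user_id", StringType)
      .add("url", StringType)
      .add("event_ts", TimestampType)

    // Read the topic as a stream and parse the message value into columns.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // hypothetical broker
      .option("subscribe", "clickstream")                // hypothetical topic
      .load()
      .select(F.from_json(F.col("value").cast("string"), schema).as("e"))
      .select("e.*")

    // Append each micro-batch to a partitioned data lake path with checkpointing.
    val query = events
      .withColumn("event_date", F.to_date(F.col("event_ts")))
      .writeStream
      .format("parquet")
      .option("path", "s3://example-bucket/streaming/clickstream/")
      .option("checkpointLocation", "s3://example-bucket/checkpoints/clickstream/")
      .partitionBy("event_date")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```

Checkpointing lets the job recover from failures without reprocessing already-committed data, which supports the production-support and monitoring responsibility above.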

Required Skills & Qualifications

  • 5+ years of experience as a Data Engineer or Big Data Developer.
  • Strong programming skills in Scala and/or Python (PySpark).
  • Hands-on experience with Apache Spark (batch and streaming) in enterprise-scale data platforms.
  • Experience with Hadoop ecosystem (Hive, HDFS, YARN, Oozie) and/or Databricks.
  • Proficiency in SQL for data querying, analysis, and performance tuning.
  • Experience with data modeling and ETL/ELT processes in large-scale systems.
  • Familiarity with streaming technologies (Kafka, Kinesis, Flink) is a plus.
  • Strong understanding of cloud services (AWS EMR/Glue, Azure Synapse/ADF, Google Cloud Platform Dataproc/BigQuery).
  • Knowledge of data governance, lineage, and compliance best practices.
  • Excellent problem-solving skills and ability to work in fast-paced, collaborative environments.

Preferred Qualifications

  • Experience with Delta Lake, Iceberg, or Hudi for data lakehouse implementations.
  • Knowledge of CI/CD pipelines, Git, and DevOps practices for data engineering.
  • Familiarity with containerization (Docker, Kubernetes).
  • Exposure to ML pipelines and feature engineering with Spark MLlib.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

About Floga Technologies