Overview
Hybrid (Onsite 3 Days)
Depends on Experience
Contract - Independent
Contract - W2
Contract - 12 Month(s)
No Travel Required
Able to Provide Sponsorship
Skills
CI/CD Pipelines
Data Engineering
Big Data Development
Python
PySpark
Spark Architecture
RDDs
DataFrames
Spark SQL
Hadoop
Hive
HDFS
Snowflake
SQL
Redshift
BigQuery
Data Lakes
Cloud Services
AWS EMR
Databricks
Job Details
Job Title: PySpark Developer & Big Data Engineer
About the Role
We are seeking a skilled PySpark Developer & Big Data Engineer to join our data engineering team. The ideal candidate will be responsible for designing, developing, and optimizing large-scale data processing solutions using Apache Spark, Python, and related Big Data technologies. You’ll work closely with data architects, analysts, and business stakeholders to build robust data pipelines that support analytics, reporting, and machine learning workloads.
Key Responsibilities
- Design, develop, and maintain ETL/ELT pipelines using PySpark and Spark SQL (a minimal sketch follows this list).
- Integrate data from various structured and unstructured sources into data lakes or warehouses.
- Optimize Spark jobs for performance and scalability across large datasets.
- Collaborate with data scientists and analysts to prepare clean, reliable data for analytics and ML models.
- Implement data validation, quality checks, change data capture (CDC), slowly changing dimensions (SCD), error handling, and logging mechanisms.
- Work with cloud platforms (e.g., Azure, Google Cloud Platform, AWS) to deploy and manage data processing jobs using both real-time streaming and batch processing.
- Implement serverless compute, virtual machines, and job clusters for big data processing and heavy workloads.
- Evaluate cost optimization options in depth, review Spark internals, and weigh the pros and cons of serverless compute versus fixed-cluster options.
- Participate in code reviews, testing, and performance tuning of data pipelines.
- Document processes, workflows, and data transformations.
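As a rough illustration of the first responsibility above, the sketch below shows a minimal batch ETL job in PySpark: extract from a data lake, transform with Spark SQL, run a simple quality check, and load the result. All paths, table names, and columns are hypothetical placeholders, not part of this role's actual stack.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read raw data from a (hypothetical) data-lake location.
raw = spark.read.parquet("s3://example-bucket/raw/orders/")

# Transform: register a temp view and express the business logic in Spark SQL.
raw.createOrReplaceTempView("orders")
daily_revenue = spark.sql("""
    SELECT order_date,
           SUM(amount) AS total_revenue,
           COUNT(*)    AS order_count
    FROM orders
    WHERE amount IS NOT NULL
    GROUP BY order_date
""")

# Validate: a minimal data-quality check before loading.
if daily_revenue.filter(F.col("total_revenue") < 0).count() > 0:
    raise ValueError("Data quality check failed: negative daily revenue")

# Load: write the curated result back to the lake, partitioned by date.
(daily_revenue.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/daily_revenue/"))

spark.stop()
```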
Required Skills & Qualifications
- Bachelor’s or Master’s degree in Computer Science, Information Technology, Data Engineering, or related field.
- 3–7 years of experience in data engineering or Big Data development.
- Strong proficiency in Python and PySpark.
- Solid understanding of Spark architecture, RDDs, DataFrames, and Spark SQL (compared in the sketch after this list).
- Hands-on experience with distributed data systems (e.g., Hadoop, Hive, HDFS).
- Experience working with data warehouses (e.g., Snowflake, Redshift, BigQuery) and data lakes.
- Familiarity with workflow orchestration tools (e.g., Airflow, Oozie, Luigi).
- Experience with cloud services such as AWS EMR, Databricks, or Azure Synapse.
- Strong SQL skills and understanding of database concepts.
- Excellent problem-solving, debugging, and communication skills.
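To make the Spark API expectations above concrete, here is a small sketch computing the same word count three ways: with the low-level RDD API, the DataFrame API, and Spark SQL. The input path is a hypothetical placeholder.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("api_comparison").getOrCreate()

# spark.read.text yields a DataFrame with one string column named "value".
lines = spark.read.text("s3://example-bucket/sample.txt")

# 1. RDD API: explicit map/reduce over Python objects.
rdd_counts = (
    lines.rdd
    .flatMap(lambda row: row.value.split(" "))
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b)
)

# 2. DataFrame API: declarative column expressions, optimized by Catalyst.
df_counts = (
    lines.select(F.explode(F.split(F.col("value"), " ")).alias("word"))
    .groupBy("word")
    .count()
)

# 3. Spark SQL: the same logic as a query over a temporary view.
lines.createOrReplaceTempView("lines")
sql_counts = spark.sql("""
    SELECT word, COUNT(*) AS word_count
    FROM (SELECT explode(split(value, ' ')) AS word FROM lines) AS words
    GROUP BY word
""")
```

The DataFrame and Spark SQL versions compile to the same optimized plan, while the RDD version bypasses the Catalyst optimizer entirely; this is the kind of trade-off the role is expected to reason about.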
Preferred Qualifications
- Experience with CI/CD pipelines for data workflows.
- Exposure to streaming data frameworks (e.g., Kafka, Spark Streaming); see the streaming sketch after this list.
- Knowledge of containerization (Docker, Kubernetes).
- Understanding of data governance, security, and compliance best practices.
- Experience with FinOps and performance optimization.
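For the streaming qualification above, the sketch below shows a minimal Spark Structured Streaming job that reads JSON events from Kafka and lands them in a data lake with checkpointing. It assumes the spark-sql-kafka connector is on the classpath; the broker address, topic, schema, and paths are hypothetical placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("orders_stream").getOrCreate()

# Hypothetical schema for the incoming JSON events.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read a stream of events from a (hypothetical) Kafka topic.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
    # Kafka delivers bytes; cast to string and parse the JSON payload.
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Write the parsed stream to the lake; the checkpoint enables fault tolerance.
query = (
    events.writeStream.format("parquet")
    .option("path", "s3://example-bucket/streaming/orders/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/orders/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```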