Looking for a Python Developer (Face-to-face interview)

  • Jersey City, NJ
  • Posted 2 days ago | Updated 2 days ago

Overview

On Site
Depends on Experience
Accepts corp to corp applications
Contract - W2

Skills

PySpark
SQL
AWS

Job Details

Role: Python Developer with PySpark

Location: Jersey City, NJ (local candidates only)


We are seeking an experienced Senior Python + PySpark Data Engineer to design, build, and operate large-scale data processing systems. You'll work closely with product, analytics, and ML teams to deliver reliable, performant ETL/ELT pipelines and transform raw data into high-quality, analytics-ready datasets. If you enjoy solving complex data problems, tuning distributed systems, and applying engineering discipline to data delivery, this role is for you.

Key responsibilities

  • Design, implement, and maintain robust, production-grade ETL/ELT pipelines using PySpark and Python (a minimal sketch of such a pipeline follows this list).
  • Author efficient Spark jobs (batch & streaming) and optimize for performance, memory, and cost.
  • Build and maintain data models, data schemas, and documentation to support reporting and ML use cases.
  • Collaborate with data scientists to productionize ML feature pipelines and enable reproducible training.
  • Integrate and orchestrate data workflows using Airflow, Prefect, or similar orchestrators.
  • Work with cloud data platforms (AWS/Google Cloud Platform/Azure) and leverage services such as S3, Glue, EMR, Dataproc, BigQuery, or Synapse.
  • Implement data validation, monitoring, alerting, and SLA-driven operational practices.
  • Apply software engineering best practices: CI/CD for data pipelines, unit/integration tests, code reviews, and modular reusable libraries.
  • Troubleshoot production incidents and perform post-incident root-cause analysis.
  • Mentor junior engineers and contribute to team standards, architecture, and technical roadmaps.
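
To give candidates a concrete flavor of the day-to-day work, here is a minimal sketch of the kind of production-style PySpark batch job the role involves. All bucket paths, column names, and the event schema below are hypothetical and used only for illustration; a real pipeline would be parameterized by run date and triggered from an orchestrator such as Airflow.

    # Minimal PySpark batch ETL sketch (illustrative only).
    # Paths, column names, and schema are hypothetical.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import (StructType, StructField, StringType,
                                   TimestampType, DoubleType)

    spark = SparkSession.builder.appName("events-daily-etl").getOrCreate()

    # Explicit schema: avoids a costly inference pass and catches drift early.
    schema = StructType([
        StructField("event_id", StringType(), nullable=False),
        StructField("user_id", StringType(), nullable=True),
        StructField("event_ts", TimestampType(), nullable=True),
        StructField("amount", DoubleType(), nullable=True),
    ])

    raw = spark.read.schema(schema).json("s3://example-bucket/raw/events/")

    # Basic validation/cleanup: drop malformed rows, deduplicate on the key.
    clean = (
        raw.filter(F.col("event_id").isNotNull() & F.col("event_ts").isNotNull())
           .dropDuplicates(["event_id"])
           .withColumn("event_date", F.to_date("event_ts"))
    )

    # Write analytics-ready Parquet, partitioned by date so downstream
    # readers can prune partitions instead of scanning everything.
    (clean.write
          .mode("overwrite")
          .partitionBy("event_date")
          .parquet("s3://example-bucket/curated/events/"))

    spark.stop()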

Must-have qualifications

  • 10+ years of professional experience in data engineering or related roles (adjustable per seniority).
  • Strong Python skills (3.7+): idiomatic code, async fundamentals, packaging, and testing.
  • 5+ years of hands-on experience with Apache Spark and PySpark (RDD/DataFrame/Dataset APIs).
  • Proven experience designing and optimizing Spark jobs for performance (partitioning, caching, shuffle avoidance, joins, broadcast, windowing); see the sketch after this list.
  • Deep knowledge of SQL and experience with data warehousing concepts.
  • Experience with cloud data platforms (AWS preferred) and services such as S3, EMR, Glue, Redshift, or equivalent.
  • Hands-on experience with structured streaming or stream-processing platforms (Spark Structured Streaming, Kafka, Kinesis).
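
By way of illustration, the tuning techniques named above (broadcast joins, partitioning, caching, windowing) look roughly like this in PySpark. Table paths and column names are hypothetical; this is a sketch of the techniques, not a prescribed solution.

    # Sketch of common PySpark performance techniques (illustrative only).
    # Table paths and column names are hypothetical.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

    orders = spark.read.parquet("s3://example-bucket/curated/orders/")     # large fact table
    countries = spark.read.parquet("s3://example-bucket/dims/countries/")  # small dimension

    # Broadcast join: ship the small dimension to every executor instead of
    # shuffling the large fact table (shuffle avoidance).
    enriched = orders.join(F.broadcast(countries), on="country_code", how="left")

    # Repartition on the downstream grouping key so the aggregation shuffles
    # once, then cache because the result is reused by two actions below.
    by_user = enriched.repartition("user_id").cache()

    daily_spend = (by_user.groupBy("user_id", "order_date")
                          .agg(F.sum("amount").alias("spend")))

    # Window function: rank each user's days by spend without a self-join.
    w = Window.partitionBy("user_id").orderBy(F.col("spend").desc())
    top_days = (daily_spend.withColumn("rank", F.row_number().over(w))
                           .filter("rank <= 3"))

    top_days.show()          # action 1
    print(by_user.count())   # action 2 reuses the cached, partitioned data
    spark.stop()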