Overview
Hybrid (Onsite 3 Days)
Depends on Experience
Contract - Independent
Contract - W2
Contract - 12 Month(s)
No Travel Required
Able to Provide Sponsorship
Skills
CI/CD Pipelines
Data Engineering
Big Data Development
Python
PySpark
Spark Architecture
RDDs
DataFrames
Spark SQL
Hadoop
Hive
HDFS
Snowflake
SQL
Redshift
BigQuery
Data Lakes
Cloud Services
AWS EMR
Databricks
Job Details
Job Title: PySpark Developer & Big Data Engineer
About the Role
We are seeking a skilled PySpark Developer & Big Data Engineer to join our data engineering team. The ideal candidate will be responsible for designing, developing, and optimizing large-scale data processing solutions using Apache Spark, Python, and related Big Data technologies. You’ll work closely with data architects, analysts, and business stakeholders to build robust data pipelines that support analytics, reporting, and machine learning workloads.
Key Responsibilities
- Design, develop, and maintain ETL/ELT pipelines using PySpark and Spark SQL (a minimal sketch follows this list).
- Integrate data from various structured and unstructured sources into data lakes or warehouses.
- Optimize Spark jobs for performance and scalability across large datasets.
- Collaborate with data scientists and analysts to prepare clean, reliable data for analytics and ML models.
- Implement data validation, quality checks, change data capture (CDC), slowly changing dimensions (SCD), error handling, and logging mechanisms.
- Work with cloud platforms (e.g., Azure, Google Cloud Platform, AWS) to deploy and manage data processing jobs using both real-time streaming and batch processing.
- Implement serverless compute, virtual machines, and job clusters for big data processing and heavy workloads.
- Evaluate cost optimization options in depth, review Spark internals, and weigh the pros and cons of serverless compute versus fixed-cluster options.
- Participate in code reviews, testing, and performance tuning of data pipelines.
- Document processes, workflows, and data transformations.
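As a rough illustration of the first responsibility above, the sketch below shows a minimal batch ETL job in PySpark: extract from a data lake, transform with Spark SQL, run a simple quality check, and load the result. All paths, table names, and columns are hypothetical placeholders, not part of this role's actual stack.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read raw data from a (hypothetical) data-lake location.
raw = spark.read.parquet("s3://example-bucket/raw/orders/")

# Transform: register a temp view and express the business logic in Spark SQL.
raw.createOrReplaceTempView("orders")
daily_revenue = spark.sql("""
    SELECT order_date,
           SUM(amount) AS total_revenue,
           COUNT(*)    AS order_count
    FROM orders
    WHERE amount IS NOT NULL
    GROUP BY order_date
""")

# Validate: a minimal data-quality check before loading.
if daily_revenue.filter(F.col("total_revenue") < 0).count() > 0:
    raise ValueError("Data quality check failed: negative daily revenue")

# Load: write the curated result back to the lake, partitioned by date.
(daily_revenue.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/daily_revenue/"))

spark.stop()
```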
Required Skills & Qualifications
- Bachelor’s or Master’s degree in Computer Science, Information Technology, Data Engineering, or related field.
- 3–7 years of experience in data engineering or Big Data development.
- Strong proficiency in Python and PySpark.
- Solid understanding of Spark architecture, RDDs, DataFrames, and Spark SQL (compared in the sketch after this list).
- Hands-on experience with distributed data systems (e.g., Hadoop, Hive, HDFS).
- Experience working with data warehouses (e.g., Snowflake, Redshift, BigQuery) and data lakes.
- Familiarity with workflow orchestration tools (e.g., Airflow, Oozie, Luigi).
- Experience with cloud services such as AWS EMR, Databricks, or Azure Synapse.
- Strong SQL skills and understanding of database concepts.
- Excellent problem-solving, debugging, and communication skills.
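To make the Spark API expectations above concrete, here is a small sketch computing the same word count three ways: with the low-level RDD API, the DataFrame API, and Spark SQL. The input path is a hypothetical placeholder.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("api_comparison").getOrCreate()

# spark.read.text yields a DataFrame with one string column named "value".
lines = spark.read.text("s3://example-bucket/sample.txt")

# 1. RDD API: explicit map/reduce over Python objects.
rdd_counts = (
    lines.rdd
    .flatMap(lambda row: row.value.split(" "))
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b)
)

# 2. DataFrame API: declarative column expressions, optimized by Catalyst.
df_counts = (
    lines.select(F.explode(F.split(F.col("value"), " ")).alias("word"))
    .groupBy("word")
    .count()
)

# 3. Spark SQL: the same logic as a query over a temporary view.
lines.createOrReplaceTempView("lines")
sql_counts = spark.sql("""
    SELECT word, COUNT(*) AS word_count
    FROM (SELECT explode(split(value, ' ')) AS word FROM lines) AS words
    GROUP BY word
""")
```

The DataFrame and Spark SQL versions compile to the same optimized plan, while the RDD version bypasses the Catalyst optimizer entirely; this is the kind of trade-off the role is expected to reason about.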
Preferred Qualifications
- Experience with CI/CD pipelines for data workflows.
- Exposure to streaming data frameworks (e.g., Kafka, Spark Streaming); see the streaming sketch after this list.
- Knowledge of containerization (Docker, Kubernetes).
- Understanding of data governance, security, and compliance best practices.
- Experience with FinOps and performance optimization.
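For the streaming qualification above, the sketch below shows a minimal Spark Structured Streaming job that reads JSON events from Kafka and lands them in a data lake with checkpointing. It assumes the spark-sql-kafka connector is on the classpath; the broker address, topic, schema, and paths are hypothetical placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("orders_stream").getOrCreate()

# Hypothetical schema for the incoming JSON events.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read a stream of events from a (hypothetical) Kafka topic.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
    # Kafka delivers bytes; cast to string and parse the JSON payload.
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Write the parsed stream to the lake; the checkpoint enables fault tolerance.
query = (
    events.writeStream.format("parquet")
    .option("path", "s3://example-bucket/streaming/orders/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/orders/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```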