Job Details
Title: Data Engineer with Python, PySpark, Snowflake, EMR, EKS
Location: Onsite in McLean, VA (local candidates only)
Duration: Contract
Must-Have Skills
Python, PySpark, Snowflake, AWS EMR, Amazon EKS
Job Summary
We are seeking an experienced Data Engineer to design, build, and optimize scalable data pipelines and solutions across cloud and big data ecosystems.
The ideal candidate will have strong expertise in Python, PySpark, Snowflake, AWS EMR, and Amazon EKS, with the ability to transform raw data into actionable insights while ensuring performance, reliability, and security.
Key Responsibilities
Design, build, and maintain scalable and efficient ETL/ELT pipelines using Python and PySpark.
Develop and optimize data workflows across Snowflake and AWS-based big data platforms.
Work with AWS EMR clusters to manage distributed data processing and analytics workloads.
Deploy and manage containerized data applications using Amazon EKS (Kubernetes).
Collaborate with data scientists, analysts, and business stakeholders to enable reliable data delivery for analytics and reporting.
Ensure data quality, governance, and security standards are maintained across pipelines.
Optimize pipeline performance for scalability, cost efficiency, and resilience.
Contribute to the design and architecture of cloud-native data engineering solutions.
Support CI/CD pipelines and infrastructure-as-code practices for data platform deployments.
Required Skills & Qualifications
10 years of experience as a Data Engineer or in a similar role.
Strong programming skills in Python with experience in PySpark for large-scale data processing.
Expertise in Snowflake (warehousing, performance tuning, query optimization, data modeling).
Hands-on experience with AWS EMR for distributed data processing.
Experience with Amazon EKS (Kubernetes) for containerized workloads.
Solid understanding of SQL and relational data modeling.
Familiarity with CI/CD, Git, and DevOps practices.
Strong problem-solving, communication, and collaboration skills.
Preferred Qualifications
Experience with data lake architectures and streaming technologies (e.g., Kafka, Kinesis).
Knowledge of infrastructure as code (Terraform, CloudFormation).
Exposure to Airflow, dbt, or other orchestration tools.
Background in financial services, healthcare, or large-scale enterprise data environments is a plus.