Job Details
Job Title: Data Engineer (Python / Spark / AWS)
Location: Richmond, VA / McLean, VA / Dallas, TX (Onsite; local candidates only)
Client: Confidential
Multiple Positions
Rate: $60 C2C
Interview Process:
Rounds 1 and 2: Remote (virtual)
Final round: In person (face-to-face) in VA or TX
Role Summary:
We are seeking talented and experienced Data Engineers with strong expertise in Python, Spark (PySpark), and AWS to work on enterprise-scale data modernization initiatives. The selected candidates will be responsible for building, optimizing, and maintaining robust data pipelines that support analytics, machine learning, and reporting platforms.
This role offers the opportunity to work with modern AWS cloud technologies and deliver high-impact data engineering solutions in a dynamic enterprise environment.
Key Responsibilities:
Design, build, and maintain ETL/ELT data pipelines and data ingestion workflows using Python and Spark (PySpark).
Develop and manage data processing solutions leveraging AWS cloud services such as S3, Glue, EMR, Redshift, Lambda, and Athena.
Create and optimize data models, schemas, and partitioning strategies for data lakes and warehouses.
Improve pipeline performance and scalability through Spark optimization and resource tuning.
Collaborate closely with data science, analytics, and application teams to deliver clean, reliable, and accessible data.
Implement data quality validation, logging, and observability for production systems.
Ensure compliance with data governance, lineage, and security standards.
Participate in code reviews, documentation, and continuous improvement initiatives.
Required Skills:
7+ years of hands-on experience as a Data Engineer.
Strong programming experience in Python (including Pandas and PySpark).
Proficiency in Apache Spark (PySpark) for data transformation and large-scale processing.
Solid experience with AWS data services: S3, Glue, EMR, Lambda, Redshift, and Athena.
Strong SQL skills and understanding of data modeling and schema design.
Experience with workflow orchestration tools (Airflow, Step Functions, or similar).
Hands-on experience with ETL optimization, pipeline monitoring, and performance tuning.
Understanding of data governance, lineage, and best practices for enterprise data systems.
Excellent communication and analytical skills, with the ability to work cross-functionally.
Preferred Skills:
Familiarity with streaming frameworks (Kafka, Kinesis, or Flink).
Experience with Snowflake, BigQuery, or Redshift Spectrum.
Knowledge of infrastructure automation (Terraform, CloudFormation).
Exposure to machine learning pipelines or feature store integration.
Experience working in financial services or other regulated industries.
If interested, please share your details to: