Senior Data Engineer
Location: Reading, PA (Locals Only)
Please share strong profiles of candidates from Tier-I/II companies. Candidates must have excellent communication skills.
1) Hands-on person who can work independently with minimal guidance, as there are no formal architects involved in this project.
2) Python, Spark, microservices, and AWS are a must. Equivalent skills can be considered if no profiles are found with the same skill set.
3) The candidate must be local. Working from the office 3 days a week is mandatory.
The Role
As a Senior Data Engineer, you will play a hands-on role in designing, building, and operating high-performance batch and streaming data platforms. You will:
Design, develop, and maintain large-scale batch and streaming pipelines using PySpark and Python.
Build real-time and near real-time streaming applications with stateful processing, windowing, and checkpointing.
Develop production-grade Python microservices for complex data transformations and business logic.
Design and manage modern data lake architectures using Apache Iceberg on AWS S3, implementing schema evolution, partitioning, compaction, and time travel.
Develop and deploy pipelines across AWS services including S3, EMR, Glue, Lambda, Athena, Redshift, and Aurora.
Optimize Spark workloads for performance, scalability, and cost efficiency.
Implement monitoring, logging, alerting, and recovery mechanisms for robust production operations.
Contribute to CI/CD pipelines, participate in architecture discussions, and uphold engineering best practices.
What You'll Bring
Bachelor's or Master's degree in Computer Science, Engineering, or a related discipline.
10+ years of experience in IT, with strong hands-on expertise in PySpark, Spark SQL, and distributed data processing.
Advanced proficiency in Python for building scalable, production-grade data solutions and microservices.
Proven experience building and running Kafka-based streaming applications in production environments.
Deep understanding of streaming fundamentals, including stateful processing and fault tolerance.
Hands-on experience with Apache Iceberg in production data lake environments.
Solid experience with AWS data services (S3, EMR, Glue, Lambda, Redshift, Aurora).
Advanced SQL skills and strong knowledge of data modeling and modern data lake architectures.
Strong troubleshooting skills in distributed data systems with a focus on reliability and performance.