Job Details
Position: Python Developer with PySpark, AWS, and Java
Location: Pasadena, CA (local candidates only; face-to-face interview)
Job type: 1+ year contract
Experience level: 10 years
Job Summary:
We are seeking a highly skilled Python Developer with strong PySpark expertise and hands-on experience in AWS cloud services to join our data engineering team. The ideal candidate will focus on designing and developing scalable data processing solutions using Apache Spark on the AWS platform. Java development is a secondary skill used for legacy system integration and occasional support.
Key Responsibilities:
- Design, build, and optimize scalable ETL pipelines using PySpark
- Work with large datasets to perform data transformation, cleansing, and aggregation
- Develop and deploy data processing applications on AWS (e.g., EMR, S3, Lambda, Glue)
- Develop reusable, efficient Python code following best practices
- Collaborate with data engineers, data scientists, and product teams
- Integrate data processing workflows with other services and systems, using Java where needed
- Monitor, troubleshoot, and tune the performance of data jobs in distributed environments
- Participate in code reviews and contribute to a culture of continuous improvement
Required Skills & Qualifications:
- 8+ years of professional experience in Python development
- 5+ years of hands-on experience with Apache Spark / PySpark
- Strong understanding of distributed data processing
- Proven experience with AWS services, including EMR, Glue, S3, Lambda, IAM, and CloudWatch
- Experience with data structures, algorithms, and performance tuning in Spark
- Familiarity with SQL for querying large datasets
- Working knowledge of Java for backend or integration purposes
- Experience with version control systems like Git
- Familiarity with Agile/Scrum development methodology