Job Details
DATA ENGINEER / AWS Glue Python Developer
Location: Jersey City
Duration: 1+ Year
Requirements:
6+ years of experience in data engineering, including at least 3 years working with AWS Glue.
Strong proficiency in Python, with real-world PySpark development experience.
Hands-on experience with AWS services: Glue, S3, Lambda, Athena, Redshift, and Step Functions.
Familiarity with data lake and lakehouse architectures.
Experience with schema evolution, partitioning, and data versioning (illustrated in the sketch after this list).
Proficient in SQL, with experience working with both relational and NoSQL databases.
Experience with Git, CI/CD pipelines, and infrastructure-as-code tools (e.g., CloudFormation, Terraform) is a plus.
Strong analytical and problem-solving skills.
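
For illustration, a minimal PySpark sketch of the partitioning and schema-evolution handling named above; the bucket paths and column names (order_ts, order_date) are hypothetical placeholders, not details from this posting:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-partitioned-load").getOrCreate()

# Let each run replace only the partitions it touches, not the whole dataset.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

# mergeSchema tolerates columns added over time by upstream producers,
# one common way of coping with schema evolution in Parquet datasets.
orders = (
    spark.read
    .option("mergeSchema", "true")
    .parquet("s3://example-raw-bucket/orders/")
)

# Derive partition columns so Athena / Redshift Spectrum queries can prune.
curated = (
    orders
    .withColumn("order_date", F.to_date("order_ts"))
    .withColumn("year", F.year("order_date"))
    .withColumn("month", F.month("order_date"))
)

(
    curated.write
    .mode("overwrite")
    .partitionBy("year", "month")
    .parquet("s3://example-curated-bucket/orders/")
)
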
Responsibilities:
Design, develop, and maintain ETL pipelines using AWS Glue (PySpark); a sketch of a typical job appears after this list.
Write efficient, scalable Python code for data extraction, transformation, and loading.
Work with structured and semi-structured data sources (CSV, Parquet, JSON, S3, RDS, Redshift).
Integrate AWS Glue jobs with event-based triggers, Step Functions, or Airflow workflows.
Optimize Spark-based transformations for performance and cost efficiency.
Build reusable ETL frameworks and metadata-driven pipelines.
Collaborate with data architects, analysts, and platform engineers to ensure end-to-end solution delivery.
Develop and manage AWS Glue crawlers, job bookmarks, and Data Catalog tables (see the crawler sketch after this list).
Monitor, debug, and tune ETL pipelines for performance, reliability, and scalability.
Ensure data quality and implement data validation frameworks.
Document design and technical specifications, and contribute to code reviews and knowledge sharing.
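
By way of example, a compact sketch of a Glue PySpark job touching several of the duties above (catalog reads, job bookmarks, a declarative mapping, partitioned S3 writes); the database, table, path, and column names are assumed placeholders rather than details from this posting:

import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)  # init/commit enable job bookmarks

# Read incrementally from the Data Catalog table populated by a crawler.
source = glue_context.create_dynamic_frame.from_catalog(
    database="example_raw_db",
    table_name="orders",
    transformation_ctx="source",  # transformation_ctx is what bookmarks track
)

# Rename and cast columns with a declarative mapping.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("order_id", "string", "order_id", "string"),
        ("amount", "double", "amount", "double"),
        ("order_ts", "string", "order_date", "date"),
    ],
    transformation_ctx="mapped",
)

# Write partitioned Parquet to the curated zone.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={
        "path": "s3://example-curated-bucket/orders/",
        "partitionKeys": ["order_date"],
    },
    format="parquet",
    transformation_ctx="sink",
)

job.commit()  # records the bookmark state for the next run
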
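And a short boto3 sketch of provisioning a Glue crawler that keeps the Data Catalog in sync with the raw zone; the crawler name, IAM role ARN, database, and path are assumptions for illustration only:

import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Register a crawler that scans the raw zone and keeps the catalog table current.
glue.create_crawler(
    Name="orders-raw-crawler",
    Role="arn:aws:iam::123456789012:role/example-glue-crawler-role",
    DatabaseName="example_raw_db",
    Targets={"S3Targets": [{"Path": "s3://example-raw-bucket/orders/"}]},
    SchemaChangePolicy={
        "UpdateBehavior": "UPDATE_IN_DATABASE",  # pick up added columns
        "DeleteBehavior": "LOG",                 # never drop tables silently
    },
)
glue.start_crawler(Name="orders-raw-crawler")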