Sr. Data Engineer – AWS Glue & Snowflake, Healthcare Payer
Location – Midtown Manhattan, NYC – 4 Days Onsite
Position Summary
We are seeking a highly skilled Senior Data Engineer with deep expertise in AWS Glue and cloud-based data engineering to design, build, and optimize scalable data integration solutions. This role will be responsible for developing enterprise-grade ETL/ELT pipelines, managing large-scale data movement, and supporting modern analytics platforms built on AWS and Snowflake.
The ideal candidate combines strong software engineering practices with hands-on experience in AWS data services, PySpark development, and cloud-native architecture.
Key Responsibilities
Data Pipeline Development
- Design, develop, and maintain scalable ETL/ELT pipelines using AWS Glue.
- Build and optimize Glue Jobs, Crawlers, Data Catalogs, and Glue Workflows.
- Develop PySpark and Python-based transformation logic for large-scale data processing.
- Create reusable data ingestion frameworks and automation capabilities.
- Implement robust error handling, monitoring, and recovery mechanisms.
AWS Data Platform Engineering
- Design and maintain data solutions leveraging AWS services including:
  - AWS Glue
  - Amazon S3
  - IAM
  - Lake Formation
  - CloudWatch
  - Lambda
  - Step Functions
  - EventBridge
- Ensure secure, scalable, and cost-effective data platform operations.
- Optimize AWS resource utilization and processing performance.
Snowflake Integration
- Build and maintain data pipelines that ingest, transform, and load data into Snowflake.
- Optimize Snowflake loading patterns, staging strategies, and transformation workflows.
- Collaborate with data modelers and analytics teams to support reporting and business intelligence initiatives.
Data Quality & Governance
- Implement data quality controls, validation routines, and monitoring processes.
- Support data lineage, metadata management, and governance requirements.
- Troubleshoot data anomalies and performance bottlenecks across the platform.
Engineering & DevOps
- Develop solutions using Infrastructure as Code and CI/CD best practices.
- Utilize Git-based source control and automated deployment pipelines.
- Participate in architecture reviews and technical design discussions.
- Mentor junior engineers and promote engineering standards.
Required Qualifications
- Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field.
- 5+ years of experience in Data Engineering or ETL development.
- 3+ years of hands-on AWS Glue development experience.
- Strong Python and PySpark programming skills.
- Advanced SQL development skills.
- Experience building cloud-native data pipelines on AWS.
- Experience integrating with Snowflake or similar cloud data warehouses.
- Knowledge of data warehousing concepts and dimensional modeling.
- Experience working with large datasets and distributed processing frameworks.
Preferred Qualifications
- AWS Certified Data Engineer – Associate or a relevant AWS Specialty certification.
- Experience with Apache Spark optimization and tuning.
- Experience with dbt.
- Experience with Airflow, Step Functions, or other orchestration tools.
- Experience with Kafka, Kinesis, or streaming data platforms.
- Experience with healthcare, insurance, financial services, or other regulated industries.
- Familiarity with Kimball dimensional modeling methodologies.
Technical Skills
Required
- AWS Glue
- PySpark
- Python
- SQL
- Amazon S3
- Snowflake
- Data Warehousing
- ETL/ELT Development
- Git
Preferred
- dbt
- Airflow
- Step Functions
- Lambda
- Lake Formation
- Terraform
- CloudFormation
- Kafka/Kinesis
- Docker
Success Factors
- Strong software engineering mindset.
- Expertise in building reliable and scalable data pipelines.
- Ability to troubleshoot complex data integration issues.
- Strong communication and collaboration skills.
- Focus on automation, performance optimization, and operational excellence.
Ideal Candidate Profile
An engineer who can independently design a data ingestion framework, write PySpark transformations in AWS Glue, optimize Snowflake loads, automate deployments through CI/CD, and serve as the technical expert for enterprise data integration initiatives.