Position :: AWS Data Engineer
Location :: 100% Remote
Duration :: 12-18+ months
Interview :: Video
Job Description:
Must Have
· Strong hands-on PySpark experience
· Demonstrable hands-on AWS Glue job experience
· Dimensional modeling
Senior AWS Data Engineer
Core Responsibilities
· Develop and maintain PySpark-based ETL pipelines for batch and incremental data processing
· Build and operate AWS Glue Spark jobs (batch and event-driven; see the illustrative sketch after this list), including:
o Job configuration, scaling, retries, and cost optimization
o Glue Catalog and schema management
· Design and maintain event-driven data workflows triggered by S3, EventBridge, or streaming sources
· Load and transform data into Amazon Redshift, optimizing for:
o Distribution and sort keys
o Incremental loads and upserts
o Query performance and concurrency
· Design and implement dimensional data models (star/snowflake schemas), including:
o Fact and dimension tables
o Slowly Changing Dimensions (SCDs)
o Grain definition and data quality controls
· Collaborate with analytics and reporting teams to ensure the warehouse is BI-ready
· Monitor, troubleshoot, and optimize data pipelines for reliability and performance
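To illustrate the kind of Glue Spark job this role owns, here is a minimal batch/incremental sketch. The job arguments, S3 paths, and column names (`last_updated`, `order_id`) are hypothetical placeholders; a production job would add bookmarks or state tracking, retries, and error handling.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Hypothetical job arguments: source/target S3 paths and an incremental watermark.
args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path", "target_path", "last_run_ts"])

sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw batch from S3 (Parquet assumed here).
raw = spark.read.parquet(args["source_path"])

# Incremental filter on a hypothetical `last_updated` column,
# keeping only rows newer than the previous successful run.
incremental = raw.filter(F.col("last_updated") > F.lit(args["last_run_ts"]))

# Light transformation: drop exact duplicates and derive a partition column.
cleaned = (
    incremental
    .dropDuplicates(["order_id", "last_updated"])
    .withColumn("load_date", F.to_date(F.col("last_updated")))
)

# Write partitioned Parquet back to the curated S3 zone.
cleaned.write.mode("append").partitionBy("load_date").parquet(args["target_path"])

job.commit()
```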
Required Technical Experience
· Strong PySpark experience (Spark SQL, DataFrames, performance tuning)
· Hands-on experience with AWS Glue (Spark jobs, not just crawlers)
· Experience loading and optimizing data in Amazon Redshift (see the upsert sketch at the end of this posting)
· Proven experience designing dimensional data warehouse schemas
· Familiarity with AWS-native data services (S3, IAM, CloudWatch)
· Production ownership mindset (debugging, failure recovery, reprocessing)
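The incremental loads and upserts mentioned above are commonly handled in Redshift with a staging table followed by delete-and-insert, since Redshift does not enforce primary keys. The sketch below assumes an already-open DB-API connection (e.g. psycopg2 or redshift_connector); the table names, key column, S3 path, and IAM role are hypothetical placeholders.

```python
# Minimal staging-table upsert into Redshift: COPY the batch into a temp
# staging table, delete matching rows from the target, insert the stage,
# and commit once so readers never see a partial load.
UPSERT_STATEMENTS = [
    "CREATE TEMP TABLE sales_fact_staging (LIKE sales_fact);",
    """
    COPY sales_fact_staging
    FROM 's3://example-bucket/curated/sales/load_date=2024-01-01/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy-role'
    FORMAT AS PARQUET;
    """,
    """
    DELETE FROM sales_fact
    USING sales_fact_staging
    WHERE sales_fact.order_id = sales_fact_staging.order_id;
    """,
    "INSERT INTO sales_fact SELECT * FROM sales_fact_staging;",
]

def run_upsert(connection) -> None:
    """Run the staged upsert on an open DB-API connection so the whole
    batch commits atomically."""
    with connection.cursor() as cursor:
        for statement in UPSERT_STATEMENTS:
            cursor.execute(statement)
    connection.commit()
```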