Overview
Remote
Depends on Experience
Full Time
Skills
AWS
S3
Glue
PySpark
SQL
Airflow
Jupyter
ETL
Job Details
AWS Data Engineer
Location: Remote (travel to Houston, TX once a month)
Duration: Full-Time Position
The AWS Data Lake Engineer will design and work with the project team to develop the Data Lake and its data integrations. This role will be recognized as an expert in the AWS Data Lake and associated AWS tools and services. Our stack spans AWS S3, Glue, EMR, EKS, Python, PySpark, SQL, Airflow, and Jupyter Notebooks.
Job Skills:
- 10+ years of experience working as a Data Engineer.
- 3 years of experience designing and implementing scalable data solutions on AWS, with a focus on ETL processes, data integration, and data warehousing.
- Strong proficiency with AWS services and tools such as S3, EMR, EKS, Python, PySpark, SQL, Airflow, and Jupyter Notebooks.
- Experience developing and maintaining ETL jobs using PySpark.
- Must have strong skills in SQL, Python, PySpark, and AWS.
- Hands-on experience with ETL tools and techniques.
- Experience designing and interacting with relational and non-relational data stores.
- Experience with data modeling, data warehousing, and data lake architectures.
- Excellent analytical and problem-solving skills.
- Strong communication and collaboration skills, with the ability to work effectively in a team environment.
- AWS certification(s) is a plus.
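To illustrate the kind of SQL-driven ETL skill the list above describes, here is a minimal, self-contained sketch in plain Python using sqlite3 (a stand-in for the PySpark/Glue jobs this role runs at scale); the table and column names (`raw_orders`, `orders_clean`) are illustrative only, not from this posting.

```python
import sqlite3

# Illustrative only: extract raw records, transform with SQL, load a clean table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "10.50", "TX"), (2, "n/a", "TX"), (3, "7.25", "CA")],
)

# Transform: cast amounts to numbers, drop malformed rows, aggregate by region.
conn.execute(
    """
    CREATE TABLE orders_clean AS
    SELECT region, SUM(CAST(amount AS REAL)) AS total
    FROM raw_orders
    WHERE amount GLOB '[0-9]*'
    GROUP BY region
    """
)
print(conn.execute("SELECT region, total FROM orders_clean ORDER BY region").fetchall())
# → [('CA', 7.25), ('TX', 10.5)]
```

The same extract-filter-aggregate shape carries over directly to PySpark DataFrames or a Glue job; only the engine changes.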
Job Responsibilities:
- Work with Analysts and Business Users to translate functional specifications into data processes and models.
- Design and implement data pipelines using AWS services and tools such as S3, Glue, EMR, EKS, Python, PySpark, SQL, Airflow, and Jupyter Notebooks.
- Build and maintain data pipelines from a variety of data sources, including streaming datasets, APIs, and various data stores, leveraging PySpark and AWS Glue.
- Develop and maintain ETL processes to extract, transform, and load data from various sources into AWS data lakes and data warehouses.
- Collaborate with cross-functional teams to understand data requirements, identify data sources, and implement solutions for data ingestion and integration.
- Optimize and tune ETL workflows for performance, scalability, and reliability.
- Develop custom transformations and data pipelines using Spark and Python as needed.
- Ensure data integrity, quality, and security throughout the data lifecycle.
- Implement appropriate error handling and monitoring mechanisms.
- Optimize data storage, processing, and retrieval for performance and cost-effectiveness.
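The pipeline responsibilities above, including the error handling and monitoring points, can be sketched as a small plain-Python pipeline (a PySpark or Glue job would follow the same shape); all function and field names here are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")

def extract():
    # Hypothetical stand-in for reading from S3, an API, or a streaming source.
    return [{"id": 1, "value": "42"}, {"id": 2, "value": "bad"}, {"id": 3, "value": "7"}]

def transform(record):
    # Cast the raw string field to an integer; malformed rows raise ValueError.
    return {"id": record["id"], "value": int(record["value"])}

def run_pipeline(sink):
    loaded, failed = 0, 0
    for record in extract():
        try:
            sink.append(transform(record))
            loaded += 1
        except (KeyError, ValueError) as exc:
            # Error handling: skip bad rows (a real pipeline might route them
            # to a dead-letter store) instead of failing the whole run.
            failed += 1
            log.warning("skipping record %s: %s", record.get("id"), exc)
    # Monitoring: emit simple counters a scheduler or alerting tool can watch.
    log.info("loaded=%d failed=%d", loaded, failed)
    return loaded, failed

warehouse = []
print(run_pipeline(warehouse))  # → (2, 1)
```

Under orchestration, a run like this would typically be one Airflow task, with the loaded/failed counts surfaced as task metrics.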