Overview
Remote
Depends on Experience
Full Time
Skills
AWS
S3
Glue
PySpark
SQL
Airflow
Jupyter
ETL
Job Details
AWS Data Engineer
Location: Remote (travel to Houston, TX once a month)
Duration: Full-Time Position
The AWS Data Lake Engineer will design and work with the project team to develop the Data Lake and its data integrations. This role will be recognized as an expert in the AWS Data Lake and associated AWS tools and services. Our stack spans AWS S3, Glue, EMR, EKS, Python, PySpark, SQL, Airflow, and Jupyter Notebooks.
Job Skills:
- 10+ years of experience working as a Data Engineer.
- 3 years of experience designing and implementing scalable data solutions on AWS, with a focus on ETL processes, data integration, and data warehousing.
- Strong proficiency with AWS services and tools such as S3, EMR, EKS, Python, PySpark, SQL, Airflow, and Jupyter Notebooks.
- Experience developing and maintaining ETL jobs using PySpark.
- Must have strong skills in SQL, Python, PySpark, and AWS.
- Hands-on experience with ETL tools and techniques.
- Experience designing and interacting with relational and non-relational data stores.
- Experience with data modeling, data warehousing, and data lake architectures.
- Excellent analytical and problem-solving skills.
- Strong communication and collaboration skills, with the ability to work effectively in a team environment.
- AWS certification(s) is a plus.
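To illustrate the kind of SQL-driven ETL skill the list above describes, here is a minimal, self-contained sketch in plain Python using sqlite3 (a stand-in for the PySpark/Glue jobs this role runs at scale); the table and column names (`raw_orders`, `orders_clean`) are illustrative only, not from this posting.

```python
import sqlite3

# Illustrative only: extract raw records, transform with SQL, load a clean table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "10.50", "TX"), (2, "n/a", "TX"), (3, "7.25", "CA")],
)

# Transform: cast amounts to numbers, drop malformed rows, aggregate by region.
conn.execute(
    """
    CREATE TABLE orders_clean AS
    SELECT region, SUM(CAST(amount AS REAL)) AS total
    FROM raw_orders
    WHERE amount GLOB '[0-9]*'
    GROUP BY region
    """
)
print(conn.execute("SELECT region, total FROM orders_clean ORDER BY region").fetchall())
# → [('CA', 7.25), ('TX', 10.5)]
```

The same extract-filter-aggregate shape carries over directly to PySpark DataFrames or a Glue job; only the engine changes.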
Job Responsibilities:
- Work with Analysts and Business Users to translate functional specifications into data processes and models.
- Design and implement data pipelines using AWS services and tools such as S3, Glue, EMR, EKS, Python, PySpark, SQL, Airflow, and Jupyter Notebooks.
- Build and maintain data pipelines from a variety of data sources, including streaming datasets, APIs, and various data stores, leveraging PySpark and AWS Glue.
- Develop and maintain ETL processes to extract, transform, and load data from various sources into AWS data lakes and data warehouses.
- Collaborate with cross-functional teams to understand data requirements, identify data sources, and implement solutions for data ingestion and integration.
- Optimize and tune ETL workflows for performance, scalability, and reliability.
- Develop custom transformations and data pipelines using Spark and Python as needed.
- Ensure data integrity, quality, and security throughout the data lifecycle.
- Implement appropriate error handling and monitoring mechanisms.
- Optimize data storage, processing, and retrieval for performance and cost-effectiveness.
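The pipeline responsibilities above, including the error handling and monitoring points, can be sketched as a small plain-Python pipeline (a PySpark or Glue job would follow the same shape); all function and field names here are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")

def extract():
    # Hypothetical stand-in for reading from S3, an API, or a streaming source.
    return [{"id": 1, "value": "42"}, {"id": 2, "value": "bad"}, {"id": 3, "value": "7"}]

def transform(record):
    # Cast the raw string field to an integer; malformed rows raise ValueError.
    return {"id": record["id"], "value": int(record["value"])}

def run_pipeline(sink):
    loaded, failed = 0, 0
    for record in extract():
        try:
            sink.append(transform(record))
            loaded += 1
        except (KeyError, ValueError) as exc:
            # Error handling: skip bad rows (a real pipeline might route them
            # to a dead-letter store) instead of failing the whole run.
            failed += 1
            log.warning("skipping record %s: %s", record.get("id"), exc)
    # Monitoring: emit simple counters a scheduler or alerting tool can watch.
    log.info("loaded=%d failed=%d", loaded, failed)
    return loaded, failed

warehouse = []
print(run_pipeline(warehouse))  # → (2, 1)
```

Under orchestration, a run like this would typically be one Airflow task, with the loaded/failed counts surfaced as task metrics.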