AWS Data Engineer

Overview

Remote
Depends on Experience
Full Time

Skills

AWS
S3
Glue
PySpark
SQL
Airflow
Jupyter
ETL

Job Details

AWS Data Engineer

Location: Remote (travel to Houston, TX once a month)

Duration: Full-Time Position

The AWS Data Lake Engineer will design and work with the project team to develop the Data Lake and its data integrations. This role will be recognized as an expert in the AWS Data Lake and associated AWS tools and services. Our stack spans AWS S3, Glue, EMR, EKS, Python, PySpark, SQL, Airflow, and Jupyter Notebooks.
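As an illustration of the kind of pipeline work this role involves, here is a minimal PySpark ETL sketch (the bucket names, paths, and columns are hypothetical placeholders, not details from this posting):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal sketch of an S3-to-S3 ETL step with PySpark.
# Bucket names, paths, and columns are illustrative placeholders.
spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read raw CSV landed in the data lake's raw zone.
raw = spark.read.option("header", True).csv("s3://example-raw-zone/orders/")

# Transform: type the columns, drop bad rows, stamp a load date.
cleaned = (
    raw.withColumn("order_total", F.col("order_total").cast("double"))
       .filter(F.col("order_id").isNotNull())
       .withColumn("load_date", F.current_date())
)

# Load: write partitioned Parquet to the curated zone.
cleaned.write.mode("overwrite").partitionBy("load_date").parquet(
    "s3://example-curated-zone/orders/"
)
```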

Job Skills:

  • 10+ years of experience working as a Data Engineer.
  • 3 years of experience working as a Data Engineer designing and implementing scalable data solutions on AWS with a focus on ETL processes, data integration, and data warehousing.
  • Strong proficiency in AWS services and tools such as S3, EMR, EKS, Python, PySpark, SQL, Airflow, and Jupyter Notebooks.
  • Experience developing and maintaining ETL jobs using PySpark.
  • Must have strong skills in SQL, Python, PySpark, and AWS.
  • Hands-on experience with ETL tools and techniques.
  • Experience designing and interacting with relational and non-relational data stores.
  • Experience with data modeling, data warehousing, and data lake architectures.
  • Excellent analytical and problem-solving skills.
  • Strong communication and collaboration skills, with the ability to work effectively in a team environment.
  • AWS certification(s) is a plus.

Job Responsibilities:

  • Work with Analysts and Business Users to translate functional specifications into data processes and models.
  • Design and implement data pipelines using AWS services and tools such as S3, Glue, EMR, EKS, Python, PySpark, SQL, Airflow, and Jupyter Notebooks (see the orchestration sketch after this list).
  • Build and maintain data pipelines from a variety of data sources, including streaming datasets, APIs, and various data stores, leveraging PySpark and AWS Glue.
  • Develop and maintain ETL processes to extract, transform, and load data from various sources into AWS data lakes and data warehouses.
  • Collaborate with cross-functional teams to understand data requirements, identify data sources, and implement solutions for data ingestion and integration.
  • Optimize and tune ETL workflows for performance, scalability, and reliability.
  • Develop custom transformations and data pipelines using Spark and Python as needed.
  • Ensure data integrity, quality, and security throughout the data lifecycle.
  • Implement appropriate error handling and monitoring mechanisms.
  • Optimize data storage, processing, and retrieval for performance and cost-effectiveness.
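
For orchestration, ETL jobs like these are commonly scheduled with Airflow. Below is a minimal sketch of a daily DAG wrapping a spark-submit call; the DAG id, schedule, and script path are hypothetical, and the example assumes Airflow 2.x.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Minimal Airflow DAG sketch that runs a PySpark ETL job once a day.
# The DAG id, schedule, and spark-submit path are illustrative placeholders.
with DAG(
    dag_id="orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_etl = BashOperator(
        task_id="run_orders_etl",
        bash_command="spark-submit /opt/jobs/orders_etl.py",
    )
```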