Role: AWS Data Engineer
Location: Richardson, TX (hybrid)
Direct client
Description
- Design and develop data architecture: Create scalable, reliable, and efficient data lakehouse solutions on AWS, leveraging Apache Iceberg as the table format alongside native AWS services.
- Build and maintain data pipelines: Design, construct, and automate ETL/ELT processes to ingest data from diverse sources into the AWS ecosystem.
- Create and manage data APIs: Design, develop, and maintain secure and scalable RESTful and other APIs to facilitate data access for internal teams and applications, typically leveraging AWS services (see the second sketch after this list).
- Implement AWS services: Utilize a wide array of AWS tools for data processing, storage, and analytics, such as Amazon S3, Amazon EMR, and AWS Lake Formation, with native Iceberg support.
- Manage Iceberg tables: Build and manage Apache Iceberg tables on Amazon S3 to enable data lakehouse features such as ACID transactions, time travel, and schema evolution (see the first sketch after this list).
- Optimize data performance: Implement partitioning strategies, data compaction, and fine-tuning techniques for Iceberg tables to enhance query performance.
- Ensure data quality and integrity: Implement data validation and error-handling processes, leveraging Iceberg's transactional capabilities for consistent data.
- Ensure security and compliance: Implement robust data security measures, access controls, and compliance with data protection regulations, including using AWS Lake Formation with Iceberg and implementing authorization on APIs via IAM or Amazon Cognito.
- Collaborate with stakeholders: Work closely with data scientists, analysts, software engineers, and business teams to understand their data needs and deliver effective solutions.
- Provide technical support: Offer technical expertise and troubleshooting for data-related issues across pipelines and API endpoints.
- Maintain documentation: Create and maintain technical documentation for data workflows, processes, and API specifications.
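As a concrete (if simplified) illustration of the Iceberg duties above, the following PySpark sketch shows table creation on S3, schema evolution, time travel, and compaction. The catalog name (lake), warehouse bucket (s3://example-bucket/warehouse), and table schema are all hypothetical, and the snippet assumes a Spark environment with the Iceberg runtime and SQL extensions available (EMR releases with native Iceberg support provide these).

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-lakehouse-sketch")
    # Iceberg SQL extensions enable the ALTER and CALL procedure syntax below.
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")  # AWS Glue catalog is a common alternative
    .config("spark.sql.catalog.lake.warehouse", "s3://example-bucket/warehouse")
    .getOrCreate()
)

spark.sql("CREATE NAMESPACE IF NOT EXISTS lake.sales")

# Create a day-partitioned Iceberg table on S3; partitioning choices like this
# drive the query-performance tuning called out above.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.sales.orders (
        order_id BIGINT,
        amount   DECIMAL(10, 2),
        order_ts TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(order_ts))
""")

# Schema evolution: a metadata-only change, no data rewrite required.
spark.sql("ALTER TABLE lake.sales.orders ADD COLUMN channel STRING")

# Time travel: read the table as of an earlier point in time (Spark 3.3+ syntax).
spark.sql(
    "SELECT * FROM lake.sales.orders TIMESTAMP AS OF '2024-01-01 00:00:00'"
).show()

# Compaction: merge small data files via Iceberg's rewrite_data_files procedure.
spark.sql("CALL lake.system.rewrite_data_files(table => 'sales.orders')")
```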
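For the data-API duties, here is a minimal sketch of a Lambda handler behind API Gateway using proxy integration. The route, field names, and stub response are hypothetical; authorization via IAM or a Cognito authorizer is assumed to be enforced at the API Gateway layer rather than inside the handler.

```python
import json


def handler(event, context):
    """Handle a hypothetical GET /orders/{order_id} request from API Gateway."""
    order_id = (event.get("pathParameters") or {}).get("order_id")
    if order_id is None:
        return {"statusCode": 400,
                "body": json.dumps({"error": "order_id is required"})}

    # In a real service this step would query the lakehouse (e.g., Athena over
    # Iceberg) or a serving store; a stub keeps the sketch self-contained.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"order_id": order_id, "status": "shipped"}),
    }
```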
Additional Skills & Qualifications
- Programming: Proficiency in programming languages such as Python, Java, or Scala.
- SQL: Strong SQL skills for querying, data modeling, and database design.
- AWS Services: Expertise in relevant AWS services such as S3, EMR, Lambda, API Gateway, SageMaker, and IAM.
- Apache Iceberg: Hands-on experience building and managing Apache Iceberg tables.
- Big Data: Experience with big data technologies such as Apache Spark and Hadoop.
- API Development: Experience creating and deploying RESTful APIs, with knowledge of best practices for performance and security.
- ETL/Workflow: Experience with ETL tools and workflow orchestration tools such as Apache Airflow (see the sketch after this list).
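For the orchestration requirement, a minimal Apache Airflow sketch of a daily ETL DAG is shown below. The DAG id, task name, and schedule are hypothetical; the `schedule` argument applies to Airflow 2.4+ (older releases use `schedule_interval`).

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_and_load():
    # Hypothetical ETL step: extract from a source system and land the data in
    # S3 for a downstream Spark/Iceberg job. Stubbed to stay self-contained.
    print("extracting and loading...")


with DAG(
    dag_id="daily_orders_etl",     # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="extract_and_load",
                   python_callable=extract_and_load)
```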