Cloud Data Engineer (Apache Iceberg)
Westlake, TX, US • Posted 11 hours ago • Updated 11 hours ago

TrueHire Staffing LLC
Job Details
Skills
- Data Engineer
- Data Governance
- Data Modeling
- Data Lake
- ELT
- Amazon S3
- Apache Spark
- Data Pipelines & Distributed Processing
- Streaming Data Engineering (Kafka)
- Data Quality and Observability
- AWS Architecture & Operations
- Apache Iceberg
Summary
Title: Cloud Data Engineer (Apache Iceberg)
Location: Westlake, TX (Local only)
Duration: 12+ Months Contract
Video Interview
MUST:
Must have a genuine LinkedIn profile (no recently created or multiple LinkedIn profiles).
Resume must include full educational details, with university name and year of completion.
Job Description
Apache Iceberg is the key skill for this role and is required. Must haves:
- Strong hands-on experience with Apache Iceberg (table design, evolution, metadata, partitioning).
- Deep experience with the AWS data stack: S3, EMR, Lambda, Glue, IAM, Step Functions, CloudWatch.
- Fluency in Python for data pipelines, automation, and APIs.
- Experience with distributed engines such as Spark, Flink, or PySpark.
- Expertise in scalable ETL/ELT pipelines and real-time streaming architectures.
- Strong SQL and data modeling expertise.
Kafka is a nice-to-have skill.
Description:
Role Summary
We are seeking a highly skilled Data Engineer to design, build, and optimize our modern data platform leveraging Apache Iceberg on AWS, with strong expertise in Spark, Kafka, and Python. The ideal candidate has deep experience building scalable, high-quality data pipelines, distributed data processing systems, and table-format-based lakehouse architectures.
This role is ideal for engineers who love building robust data foundations, enabling fast and reliable analytics, and working with cutting-edge open data lake technologies.
Key Responsibilities
1. Lakehouse Architecture (Apache Iceberg)
- Design and build Iceberg-based data lakes with ACID-compliant, versioned datasets.
- Implement Iceberg table evolution (schema evolution, partition spec evolution, snapshot management).
- Develop best practices for Iceberg governance, metadata compaction, and performance tuning.
2. Data Pipelines & Distributed Processing
- Build scalable batch and streaming pipelines using AWS services (S3, EMR, Glue, Lambda, Step Functions).
- Develop ingestion and transformation workflows using Python, Spark, or Flink.
- Implement CDC pipelines using Kafka Connect or equivalent tooling.
- Ensure robust CI/CD integration with GitHub Actions or similar.
3. Streaming Data Engineering (Kafka)
- Design and operate Kafka-based streaming pipelines (Kafka/MSK).
- Build producers/consumers using Python or JVM languages.
- Implement patterns such as topic partitioning, compaction, schema registry, and event versioning.
4. Data Modeling, Quality, and Observability
- Design data models for analytical and operational use cases using Iceberg tables.
- Implement automated data quality checks, validation rules, and anomaly detection.
- Build lineage, monitoring, alerting, and pipeline observability.
5. AWS Architecture & Operations
- Apply best practices for AWS security, cost optimization, and data governance.
- Manage IAM, KMS, S3 object lifecycle management, networking, and data encryption.
- Operationalize EMR/Glue jobs, containerized workloads, or serverless workloads.
6. Cross-Functional Collaboration
- Partner with analytics, platform, and product teams to deliver high-quality data products.
- Participate in design reviews, architecture discussions, and roadmap planning.
- Mentor junior engineers and contribute to engineering standards.
Required Qualifications
- 4–10+ years of experience in Data Engineering or similar roles.
- Strong hands-on experience with Apache Iceberg (table design, evolution, metadata, partitioning).
- Deep experience with AWS data stack:
- S3, EMR, Lambda, Glue, IAM, Step Functions, CloudWatch
- Strong proficiency in Kafka (producers/consumers, schema registry, partitioning strategies).
- Fluency in Python for data pipelines, automation, and APIs.
- Experience with distributed engines such as Spark, Flink, or PySpark.
- Expertise in scalable ETL/ELT pipelines and real-time streaming architectures.
- Strong SQL and data modeling expertise.
- Dice Id: 91173234
- Position Id: 8896430
Company Info
About TrueHire Staffing LLC
TrueHire is a leading provider of Recruitment Process Outsourcing (RPO) and staffing services, supporting organizations across various industries and sizes. The company delivers customized hiring solutions designed to streamline recruitment operations and improve the quality of talent acquisition. At TrueHire, the focus is on combining advanced technology with data-driven insights to achieve superior hiring outcomes. The team comprises experienced recruiters with strong domain expertise, enabling them to understand diverse client needs and deliver the right talent quickly and efficiently. One of TrueHire's core strengths is its ability to offer a seamless and consistent recruitment experience, regardless of client scale or geography. By working closely with clients, TrueHire builds tailored recruitment strategies that align with their business goals and unique hiring requirements.
