Lead Data Lakehouse Engineer (Iceberg)

Remote • Posted 60+ days ago • Updated 8 days ago
Full Time
Occasional Travel Required
Compensation: Depends on Experience

Job Details

Skills

  • Amazon Web Services
  • Apache Airflow
  • Apache HTTP Server
  • Apache Kafka
  • Apache Parquet
  • Apache Ranger
  • Business Intelligence
  • Cloud Computing
  • Data Engineering
  • Data Governance
  • DevOps
  • Extract, Transform, Load (ETL)
  • Google Cloud Platform
  • Metadata Management
  • Open Source
  • Orchestration
  • Query Optimization
  • Real-time
  • Storage
  • Streaming

Summary

We're seeking a Lead Data Lakehouse Architect to design and lead the implementation of our next-generation, petabyte-scale data lakehouse built on Apache Iceberg. This role will set the foundation for our open-source data ecosystem, integrating Trino as the query layer and driving platform strategy for ingestion, orchestration, and governance.

Key Responsibilities

  • Architect and implement a modern Iceberg-based data lakehouse to support large-scale, real-time analytics and batch workloads.

  • Design high-performance query layers using Trino for federated and interactive querying.

  • Define architecture for metadata catalogs, storage tiers, partitioning strategies, schema evolution, and time travel.

  • Collaborate with data engineering, DevOps, and analytics teams to ensure seamless integration with streaming, ETL, and BI tools.

  • Establish platform standards, security models (e.g., Apache Ranger), data governance policies, and reliability SLAs.

  • Act as the technical authority on running Iceberg and Trino in production at petabyte scale.

Required Skills

  • 7+ years in data platform or architecture roles, with 2+ years hands-on with Apache Iceberg at scale.

  • Strong expertise in Trino (or Presto), query optimization, and federated query architectures.

  • Deep knowledge of orchestration (Apache Airflow), streaming (Apache Kafka), and open-source governance tools.

  • Strong foundation in distributed systems, storage formats (Parquet, ORC), and schema management.

  • Experience deploying data platforms on-premises, in the cloud (AWS, Azure, Google Cloud Platform), or in hybrid configurations.

  • Excellent communication and leadership skills.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 10120234
  • Position Id: 8786525
