Overview
We're seeking a Lead Data Lakehouse Architect to design and lead the implementation of our next-generation, petabyte-scale data lakehouse built on Apache Iceberg. This role will set the foundation for our open-source data ecosystem, integrating Trino as the query layer and driving platform strategy for ingestion, orchestration, and governance.
Key Responsibilities
Architect and implement a modern Iceberg-based data lakehouse to support large-scale, real-time analytics and batch workloads.
Design high-performance query layers using Trino for federated and interactive querying.
Define the architecture for metadata catalogs, storage tiers, partitioning strategies, schema evolution, and time travel (illustrated in the sketch following this list).
Collaborate with data engineering, DevOps, and analytics teams to ensure seamless integration with streaming, ETL, and BI tools.
Establish platform standards, security models (e.g., Apache Ranger), data governance policies, and reliability SLAs.
Act as the technical authority on scaling Iceberg and Trino in production at petabyte scale.
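The responsibility covering partitioning strategies, schema evolution, and time travel refers to core Iceberg table features. The short Python sketch below, written with the open-source PyIceberg library, is purely illustrative: the catalog name, endpoint, warehouse path, and table/column names are assumptions for the example, not part of this posting.

from pyiceberg.catalog import load_catalog
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, LongType, StringType, TimestamptzType
from pyiceberg.partitioning import PartitionSpec, PartitionField
from pyiceberg.transforms import DayTransform

# Connect to an Iceberg REST catalog (URI and warehouse path are placeholders).
catalog = load_catalog(
    "lakehouse",
    **{"uri": "https://catalog.example.com", "warehouse": "s3://example-warehouse/"},
)

# Iceberg tracks columns by field ID, which is what makes schema evolution safe.
schema = Schema(
    NestedField(field_id=1, name="event_id", field_type=LongType(), required=True),
    NestedField(field_id=2, name="event_ts", field_type=TimestamptzType(), required=True),
    NestedField(field_id=3, name="payload", field_type=StringType(), required=False),
)

# Hidden partitioning: day(event_ts) is derived, so queries filter on event_ts directly.
spec = PartitionSpec(
    PartitionField(source_id=2, field_id=1000, transform=DayTransform(), name="event_day")
)

table = catalog.create_table("analytics.events", schema=schema, partition_spec=spec)

# Schema evolution: adding a column is a metadata-only change; no data files are rewritten.
with table.update_schema() as update:
    update.add_column("region", StringType())

# Time travel: once data has been committed, read the table as of an earlier snapshot.
if table.history():
    earliest = table.history()[0].snapshot_id
    snapshot_df = table.scan(snapshot_id=earliest).to_arrow()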
Required Skills
7+ years in data platform or architecture roles, with 2+ years hands-on with Apache Iceberg at scale.
Strong expertise in Trino (or Presto), query optimization, and federated query architectures (see the illustrative query after this list).
Deep knowledge of orchestration (Apache Airflow), streaming (Apache Kafka), and open-source governance tools.
Strong foundation in distributed systems, storage formats (Parquet, ORC), and schema management.
Experience deploying data platforms on-premises, in the cloud (AWS, Azure, Google Cloud Platform), or in hybrid environments.
Excellent communication and leadership skills.
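As an illustration of the federated querying named in the Trino requirement, here is a brief sketch using the open-source trino Python client; the coordinator host, user, and the iceberg/postgresql catalog, schema, and table names are hypothetical placeholders.

import trino

# Connect to a Trino coordinator (host, port, user, and catalogs are placeholders).
conn = trino.dbapi.connect(
    host="trino.example.com",
    port=443,
    user="analyst",
    http_scheme="https",
    catalog="iceberg",
    schema="analytics",
)
cur = conn.cursor()

# Federated query: join an Iceberg fact table with a dimension table that lives
# in PostgreSQL, all within a single interactive Trino query.
cur.execute("""
    SELECT d.region_name, count(*) AS events
    FROM iceberg.analytics.events e
    JOIN postgresql.public.dim_region d
      ON e.region = d.region_code
    WHERE e.event_ts >= current_timestamp - INTERVAL '7' DAY
    GROUP BY d.region_name
    ORDER BY events DESC
""")

for region_name, events in cur.fetchall():
    print(region_name, events)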