Senior Data Engineer
Introduction
We're hiring a Senior Data Engineer to lead data platform delivery across our client engagements. This role involves architecting lakehouses, building production pipelines, modeling for analytics and AI workloads, and handing off platforms clients can actually operate.
This is a hands-on role with architecture leadership. You'll spend most of your time in code — building Lakeflow pipelines, dbt models, and ingestion patterns — but you'll also make the calls that shape the platform: Bronze/Silver/Gold structure, medallion vs. hub-and-spoke, catalog design, governance model.
You'll walk into messy client environments — legacy core systems, undocumented schemas, competing definitions of "customer" across three departments — and figure out what to actually build. The tooling is the easy part; the hard part is the reasoning.
You'll work directly with clients. You'll scope, design, build, and present.
Responsibilities
- Own data platform delivery on concurrent client engagements
- Design lakehouse and warehouse architectures across Databricks, Snowflake, and Microsoft Fabric — and decide which platform fits which problem
- Build production pipelines: Lakeflow SDP (formerly DLT), Spark SQL, PySpark, dbt, Snowpark
- Design Unity Catalog and governance structures that hold up at enterprise scale
- Model for both analytics and AI workloads — dimensional models for BI, feature-ready data for retrieval and agents
- Reverse-engineer legacy sources (SAP, Salesforce, Oracle, proprietary core systems) with incomplete documentation and build ingestion that doesn't break
- Make the buy-vs-build call on ingestion: Fivetran, Airbyte, Lakeflow Connect, custom
- Establish production rigor: data quality expectations, lineage, observability, cost controls, CI/CD (see the pipeline sketch after this list)
- Present architecture and tradeoffs directly to client technical leaders and executives
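To make the pipeline and data-quality bullets concrete, here is a minimal sketch of a Bronze-to-Silver flow written against the DLT Python decorators that Lakeflow SDP inherits. The table names, landing path, and expectation rules are hypothetical placeholders, not a prescribed design:

```python
# Minimal Bronze -> Silver sketch using the DLT Python decorators
# (the API Lakeflow SDP inherits). Table names, the landing path,
# and the expectation rules below are hypothetical.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders landed as-is from the source extract.")
def bronze_orders():
    # Auto Loader picks up new files incrementally; `spark` is
    # provided by the pipeline runtime.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/landing/orders/")
    )

@dlt.table(comment="Cleansed orders, ready for downstream modeling.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
@dlt.expect_or_drop("positive_amount", "amount > 0")
@dlt.expect("plausible_date", "order_ts >= '2020-01-01'")  # warn only
def silver_orders():
    return (
        dlt.read_stream("bronze_orders")
        .withColumn("order_ts", F.to_timestamp("order_ts"))
        .dropDuplicates(["order_id"])
    )
```

Expectations like these surface as per-run quality metrics in the pipeline UI — the kind of observability this role is expected to bake in from day one.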
How you approach work
- You reason from first principles. When a client says "we need a lakehouse," you ask what decisions they're trying to make before you pick a platform.
- You're platform-pragmatic. Databricks, Snowflake, and Fabric all have a right answer somewhere — you know which is which and why.
- You default to the simplest thing that works. You know when to reach for a streaming pipeline and when a nightly batch is the right answer.
- You can hold the full system in your head — sources, ingestion, storage, transformation, semantic layer, consumption — and reason across all of them simultaneously.
- You think about cost. A technically elegant solution that burns $40K/month in compute is not a win.
Required Skills
- 5-8 years of data engineering experience with a focus on production platforms that serve real users
- Deep experience across all three major lakehouse/warehouse platforms: Databricks, Snowflake, and Microsoft Fabric. Not "I've touched them" — you've shipped production workloads on each and can defend platform selection decisions
- Strong fluency in SQL, Python, and PySpark — all three, not two out of three
- Production experience with Lakeflow SDP (or DLT), Unity Catalog, and medallion architecture patterns
- dbt at scale — not just models, but macros, testing strategy, and deployment patterns
- Ingestion architecture — you've built pipelines from Salesforce, SAP, and at least one legacy core system that didn't want to be ingested (see the ingestion sketch after this list)
- Orchestration — Lakeflow Jobs, Airflow, Prefect, Dagster, or equivalent
- Cloud fluency — AWS, Azure, or Google Cloud Platform (we work across all three)
- Data modeling — dimensional, Data Vault, or OBT — you have opinions and can defend them
- Production mindset: data quality, lineage, observability, cost, CI/CD
- Client-facing presence. You can run a working session with a CDO, explain a tradeoff to a skeptical analyst, and write a clean architecture doc
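As a concrete example of ingestion that doesn't break: one pattern we look for is enforcing an explicit schema on a messy extract and quarantining failures instead of failing the load. Everything below — paths, schema, table names, validity rules — is a hypothetical sketch, not a house standard:

```python
# Defensive ingestion sketch: enforce a schema, capture unparseable
# rows, and quarantine invalid records rather than failing the load.
# Paths, schema, table names, and rules are all hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import (
    StructType, StructField, StringType, DecimalType, TimestampType,
)

spark = SparkSession.builder.appName("ingest_accounts").getOrCreate()

schema = StructType([
    StructField("account_id", StringType()),
    StructField("balance", DecimalType(18, 2)),
    StructField("updated_at", TimestampType()),
    StructField("_corrupt", StringType()),  # holds rows that fail parsing
])

raw = (
    spark.read.schema(schema)
    .option("mode", "PERMISSIVE")  # keep bad rows instead of aborting
    .option("columnNameOfCorruptRecord", "_corrupt")
    .csv("/landing/core_system/accounts/")
    .cache()  # read once, write twice below
)

is_valid = (
    F.col("_corrupt").isNull()
    & F.col("account_id").isNotNull()
    & F.col("updated_at").isNotNull()
)

raw.filter(is_valid).drop("_corrupt").write.mode("append").saveAsTable("silver.accounts")
raw.filter(~is_valid).write.mode("append").saveAsTable("quarantine.accounts")
```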
Preferred Skills
- Experience preparing data estates for AI workloads — RAG-ready curation, semantic layers, feature stores
- Streaming experience — Kafka, Kinesis, Structured Streaming, or Fabric Real-Time Intelligence (see the streaming sketch at the end of this list)
- Power BI, Tableau, or Looker semantic layer design
- Prior consulting, agency, or multi-client experience
- Contributions to open source in the data tooling space (dbt, Spark, Airflow, Dagster)
- Databricks, Snowflake, or Microsoft certifications
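For the streaming item above, the representative shape is a checkpointed Kafka-to-Delta Structured Streaming job. The broker address, topic, checkpoint path, and target table below are placeholders:

```python
# Kafka -> Delta with Structured Streaming. Broker, topic, checkpoint
# path, and target table are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_stream").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .option("startingOffsets", "latest")
    .load()
    .select(
        F.col("value").cast("string").alias("payload"),
        F.col("timestamp").alias("event_ts"),
    )
)

(
    events.writeStream.format("delta")
    .option("checkpointLocation", "/chk/orders")  # enables restart + exactly-once sink
    .outputMode("append")
    .trigger(availableNow=True)  # drain available data, then stop; omit for continuous
    .toTable("bronze.orders_events")
)
```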