ABOUT THE ROLE:
We are looking for a proactive, ownership-driven Data Engineer to take full responsibility for the creation, health, performance, and reliability of our AWS-, dbt-, and Snowflake-based data pipeline and warehouse. This pipeline powers both client-facing and internal dashboards, making data freshness and accuracy mission-critical. You will own the end-to-end pipeline, from CDC ingestion through dimensional modeling of customer-facing metrics to their dashboard-ready delivery, and be the first line of defense when something breaks or degrades.
The ideal candidate will proactively monitor system health, anticipate issues before they surface in dashboards, drive continuous improvements in pipeline reliability, and communicate clearly with technical and non-technical stakeholders. Strong data modeling skills are required to create the analytics ERD and the customer-facing dashboard metrics.
KEY RESPONSIBILITIES:
Pipeline Health & Reliability
- Own the stability and availability of the Snowflake datamart and all upstream ingestion processes. There are two ingestion streams, Postgres > CDC (Openflow) and event logs > Firehose, both landing in Snowflake.
- Monitor dynamic table refresh cycles, lag metrics, and failure states; resolve issues proactively
- Implement and maintain alerting for pipeline delays, data quality anomalies, and ingestion failures
- Establish, monitor, alert on, and enforce SLAs for data pipeline health, including freshness, in alignment with downstream consumer expectations.
- Conduct regular pipeline and cost health reviews, and document findings and remediation actions
- Ensure strong data governance, including PII handling, RBAC, naming conventions, schema change testing, etc.
Data Modeling & Architecture
- Design and build Snowflake Dynamic Tables to support near-real-time datamart refresh
- Build and maintain dimensional models (dims and facts) sourced from CDC streams (e.g., Openflow or similar).
- Author and maintain dbt models, tests, and documentation across staging, intermediate, and mart layers.
- Apply SCD Type 2 patterns where history tracking is required for dimension tables.
- Optimize query performance via clustering, materialized views, and warehouse sizing strategies
CDC & Ingestion
- Manage CDC pipelines feeding the Snowflake datamart, including stream configuration and change propagation
- Ensure schema evolution is handled gracefully downstream without disrupting datamart consumers
- Collaborate with source system owners to understand upstream data changes, assess impact, and adapt metric logic
Quality, Testing & Observability
- Implement dbt tests as a first-class part of the build process
- Perform root-cause analysis on data quality issues and implement durable fixes, not workarounds
- Build and maintain data observability dashboards to give internal teams visibility into pipeline state
- Support QA processes for new models and upstream data source changes
Stakeholder & Cross-Functional Collaboration
- Partner with the product engineering team to define datamart interface formats
- Communicate pipeline incidents, root causes, and resolution timelines clearly to non-technical stakeholders
- Document data models, pipeline logic, and runbooks to support team knowledge sharing
AWS & Infrastructure (not primary owner)
- Partner with AWS developers to ensure the health of the AWS infrastructure supporting the data pipeline (Firehose, Lambda, CloudWatch, etc.)
- Participate in managing monitoring and alerting using CloudWatch or equivalent tooling
- Participate in cost governance: monitor Snowflake credit consumption and AWS spend, flag anomalies
REQUIRED SKILLS & EXPERIENCE:
Must Have
- Snowflake
- dbt (Core or Cloud)
- SQL (Advanced)
- CDC Concepts
- Openflow or equivalent
- AWS (Firehose / CloudWatch / S3)
- Dynamic Tables
- Dimensional Modeling
- Git Workflow
- Snowflake - Dynamic Tables, Streams, Tasks, clustering, query profiling, credit monitoring.
- dbt - model layering (staging / intermediate / mart), tests, macros, documentation, incremental strategies
- CDC - working knowledge of change data capture patterns; experience with Openflow, Debezium, Fivetran CDC, or similar
- SQL - advanced window functions, CTEs, performance tuning, consistent metric logic across multiple consumers
- Dimensional modeling - building and maintaining dims and facts from operational/CDC sources, SCD Type 2 awareness
- Monitoring mindset - you build observability and alerts for your own pipelines as a matter of course
- Experience supporting client-facing or product-embedded analytics