Role: Sr. AI Data Platform Engineer - Azure
Work Location: Bellevue, Frisco, Kansas, Atlanta
Local candidates only (TX, WA, KS, GA).
Job Description:
About the role
You'll join a working squad of senior engineers and architects to build the first production pilots of the client's serving layer. The team has already proven a sub-millisecond serving pattern (an embedded DuckDB cache fronted by FastAPI, kept fresh by Delta Change Data Feed sync from the Lakehouse) and is now scaling it across pilot use cases such as ad suppression, subscriber lookups, and customer profile serving.
This is hands-on build work. You will write the sync pipelines, the serving stores, the APIs, the tests, and the CI/CD that take pilots from prototype to production.
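For a sense of the shape of the work, here is a minimal sketch of the serving pattern described above, not the team's actual code: FastAPI fronting an embedded, read-only DuckDB cache that a separate CDF sync job (not shown) keeps fresh. The file name serve.duckdb and the table customer_profiles are hypothetical.

```python
# Sketch only: FastAPI serving point lookups from an embedded DuckDB cache.
# A separate sync job (not shown) applies Delta CDF changes to serve.duckdb.
import duckdb
from fastapi import FastAPI, HTTPException

app = FastAPI()
con = duckdb.connect("serve.duckdb", read_only=True)  # sync job owns all writes

@app.get("/profiles/{customer_id}")
def get_profile(customer_id: str):
    cur = con.cursor()  # per-request cursor; a shared connection isn't thread-safe
    cur.execute(
        "SELECT * FROM customer_profiles WHERE customer_id = ?",  # bound parameter
        [customer_id],
    )
    row = cur.fetchone()
    if row is None:
        raise HTTPException(status_code=404, detail="profile not found")
    cols = [d[0] for d in cur.description]
    return dict(zip(cols, row))
```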
What you'll do
Build serving stores, sync pipelines, and API layers for pilot use cases.
Configure each pilot end-to-end: source table binding, key schema, sync schedule, and consumer integration.
Set up CI/CD pipelines with automated tests covering sync correctness, API contract validation, and latency benchmarks (a toy sync-correctness sketch follows this list).
Operate to defined SLAs for latency, freshness, and availability.
Partner with domain teams (Consumer & Marketing, Commercial & Revenue, Growth, Pricing & Analytics) to onboard their use cases onto the patterns we build.
Contribute to reference implementations, blueprints, and documentation that future teams will reuse.
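Purely as illustration of the sync-correctness testing mentioned above (not the team's actual suite), a toy pytest case in which a trivial full copy stands in for the real CDF-driven sync; the table names are placeholders.

```python
# Toy sync-correctness test: after a sync run, the serving cache must
# match the source table. A full copy stands in for the real CDF sync.
import duckdb

def test_cache_matches_source():
    con = duckdb.connect()  # in-memory database for the test
    con.execute("CREATE TABLE source(id INTEGER, name TEXT)")
    con.execute("INSERT INTO source VALUES (1, 'a'), (2, 'b')")

    # Stand-in for the real sync step: copy source into the cache table
    con.execute("CREATE TABLE cache AS SELECT * FROM source")

    # Correctness check: no rows differ between source and cache
    diff = con.execute(
        "SELECT count(*) FROM (SELECT * FROM source EXCEPT SELECT * FROM cache)"
    ).fetchone()[0]
    assert diff == 0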
Required experience
API development: production Python with FastAPI or comparable; versioned REST APIs, contracts, governance.
Batch and real-time data pipelines: Kafka or comparable streaming, plus CDC or incremental batch; built and operated end-to-end.
Caching and key-value serving: production Redis or Valkey; cache invalidation, TTL strategies, hot-path serving.
Vector databases and knowledge graphs: Pinecone, Weaviate, pgvector, Neo4j, or comparable; embeddings and retrieval patterns.
AI software engineering: hands-on building data infrastructure for AI and ML use cases (RAG, agent tooling, feature serving).
Azure Databricks, Delta Lake, Unity Catalog: hands-on production experience.
Delta Lake internals: transaction log, time travel, and Change Data Feed (CDF); a brief CDF read sketch follows this list.
SQL and data modeling: comfortable with point-lookup vs analytical query patterns.
CI/CD: GitLab CI/CD or GitHub Actions; automated tests for data pipelines.
Communication: works directly with senior architects, product managers, and domain stakeholders.
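As a hedged sketch of the CDF requirement above: an incremental read using Delta's Change Data Feed on Databricks/PySpark. It assumes CDF is enabled on the source table; the table name and starting version are placeholders.

```python
# Illustrative incremental read via Delta Change Data Feed (PySpark).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 42)  # resume from the last version already synced
    .table("catalog.schema.customer_profiles")  # placeholder table name
)

# _change_type is insert / update_preimage / update_postimage / delete;
# a typical sync applies post-images and deletes to the serving store.
upserts = changes.filter(F.col("_change_type").isin("insert", "update_postimage"))
deletes = changes.filter(F.col("_change_type") == "delete")
```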
Nice to have
Embedded analytical engines: DuckDB or comparable.
Microsoft Fabric / OneLake / Power BI Semantic Models: production experience.
SLAs and SLOs: defining and operating for data products or APIs.
MCP-style tooling: data access for AI agents.
Enterprise-scale data serving: prior work on serving infrastructure at a large enterprise.