Position: Data Platform Architect with Databricks & Supply Chain domain expertise
Location: Remote
Role Summary:
We are looking for a senior Data Platform Architect who can envision and build an Enterprise Data Warehouse / Lakehouse platform for the supply chain domain from scratch. The role demands strong AI and ML acumen, with the explicit mandate to use AI as an accelerator across every phase of the SDLC — from discovery and design through build, test, deploy, and operate.
Hands-on experience with the Databricks Lakehouse Platform — building Bronze/Silver/Gold layers using Delta Lake and Delta Live Tables (DLT). Strong in PySpark, Spark SQL, and Databricks Workflows for orchestration. Proficient with Unity Catalog for governance, lineage, and access control, and comfortable using Databricks Asset Bundles for CI/CD. Working knowledge of Mosaic AI, Vector Search, and Genie spaces to deliver GenAI and natural-language BI on curated data products. Familiar with Databricks Assistant / Copilot to accelerate pipeline development across the SDLC.
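The Bronze/Silver/Gold layering described above can be sketched as a minimal Delta Live Tables pipeline. This is an illustrative, hedged example only: the source path, table names, and columns are hypothetical, and the code runs only inside a Databricks DLT pipeline (where `dlt` and `spark` are provided by the runtime), not as a standalone script.

```python
# Illustrative DLT medallion sketch; executable only within a Databricks
# Delta Live Tables pipeline. All paths and names are hypothetical.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Bronze: raw shipment events ingested as-is via Auto Loader")
def bronze_shipments():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/supply_chain/raw/shipments")  # hypothetical path
    )

@dlt.table(comment="Silver: cleaned, deduplicated shipment records")
@dlt.expect_or_drop("valid_shipment_id", "shipment_id IS NOT NULL")
def silver_shipments():
    return (
        dlt.read_stream("bronze_shipments")
        .withColumn("event_ts", F.to_timestamp("event_ts"))
        .dropDuplicates(["shipment_id", "event_ts"])
    )

@dlt.table(comment="Gold: daily shipment counts by carrier for BI consumption")
def gold_daily_carrier_volume():
    return (
        dlt.read("silver_shipments")
        .groupBy(F.to_date("event_ts").alias("ship_date"), "carrier")
        .agg(F.count("*").alias("shipments"))
    )
```

Because a DLT pipeline definition is declarative, platform-bound configuration rather than locally runnable code, no standalone test is given.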
Key Responsibilities:
Define the end-to-end vision, architecture, and roadmap for the supply chain Enterprise Data Platform.
Design and implement the EDW / Lakehouse from scratch — data modeling, ingestion, medallion architecture, governance, and consumption layers.
Integrate data from supply chain source systems including ERP, WMS, TMS, MES, procurement, and logistics platforms.
Embed AI, GenAI, and ML capabilities into the platform — including LLM-driven schema mapping, vector search, forecasting, and natural-language BI.
Drive AI-accelerated delivery across the SDLC using copilots, agents, and automation in discovery, design, build, test, deploy, and operate.
Establish standards for data quality, security, lineage, and governance.
Lead and mentor data engineers, modelers, and analytics teams; collaborate with business stakeholders to translate supply chain priorities into data products.
AI as an Accelerator Across the SDLC:
The candidate is expected to apply AI not as an add-on but as a delivery accelerator across every phase of the project lifecycle:
Discovery & Design - AI-assisted source profiling across ERP / WMS / TMS systems; auto-generated data dictionaries, schema inference, and conceptual models.
Data Modeling - LLMs propose conformed dimensions, supply-chain facts, business-glossary alignment and lineage drafts.
Pipeline Build - PySpark / SQL / DLT pipelines, IaC templates and reusable ingestion frameworks.
Testing & QA - Auto-generated unit tests, data-quality expectations, and synthetic test data via GenAI.
Deploy & Operate - Automated CI/CD and Asset Bundles, AI-driven job tuning, anomaly detection, and self-healing pipelines.
Consume & Insights - Natural-language BI, GenAI assistants, and forecasting on curated supply chain data products.
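The "auto-generated data-quality expectations" step in Testing & QA can be sketched as a small profiler that inspects sample rows and proposes DLT-style expectations as SQL expressions. The function name and heuristics below are illustrative assumptions, not a prescribed implementation; in practice an LLM or profiling tool would propose richer rules for human review.

```python
def infer_expectations(rows, null_threshold=0.0):
    """Profile sample rows (list of dicts) and propose data-quality
    expectations as {name: SQL expression} pairs, in the shape that
    DLT's expect_all decorators accept. Heuristics are illustrative:
    not-null checks and numeric range checks only."""
    if not rows:
        return {}
    expectations = {}
    n = len(rows)
    for col in rows[0].keys():
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v is not None]
        # Propose NOT NULL when the observed null rate is within threshold
        if non_null and (n - len(non_null)) / n <= null_threshold:
            expectations[f"{col}_not_null"] = f"{col} IS NOT NULL"
        # Propose a range check when every observed value is numeric
        if non_null and all(isinstance(v, (int, float)) for v in non_null):
            lo, hi = min(non_null), max(non_null)
            expectations[f"{col}_in_range"] = f"{col} BETWEEN {lo} AND {hi}"
    return expectations
```

For example, profiling two sample order rows with a numeric `qty` column would yield a `qty IS NOT NULL` check and a `qty BETWEEN ... AND ...` range check, ready to feed into a DLT expectation decorator.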
Must-Have Skills & Experience:
Experience: 10+ years in data engineering / architecture, with hands-on enterprise-scale platform delivery.
Domain: Strong supply chain understanding — plan, source, make, deliver, return.
Platforms: Databricks, Lakehouse / cloud data platforms (Azure, AWS, or Google Cloud Platform).
Modeling: Dimensional modeling, Data Vault, medallion architecture, and data product design.
Engineering: PySpark, SQL, Delta Live Tables, streaming and batch ingestion, CI/CD, and IaC (Terraform / Asset Bundles).
AI / ML: Practical experience with GenAI, LLMs, RAG, vector search, MLOps, and feature stores.
AI-Accelerated Delivery: Demonstrated use of copilots and agents to compress discovery, build, test, and operate cycles.
Governance: Unity Catalog or equivalent — lineage, access control, PII handling.
Leadership: Sets architectural vision, mentors teams, and engages confidently with business stakeholders.
Nice to Have:
Prior experience building a greenfield enterprise data platform end to end.
Exposure to SAP, Oracle EBS, or similar large-scale ERPs in a supply chain context.
Experience deploying AI agents or assistants on top of curated data products.
Expected Outcome:
A unified, AI-native supply chain data platform — designed from scratch, delivered faster through AI-accelerated SDLC practices, and governed by default.