Job Title: Data & AI Solution Architect (Databricks)
Location: Philadelphia, PA
Work mode: Hybrid (3 days onsite required)
Skills & Experience:
Overall 10-15 years of experience in solution architecture, data management, and data lake and lakehouse design and development.
Databricks (expert): Delta Lake, Unity Catalog, Lakeflow / Delta Live Tables, Databricks SQL, Photon, Serverless, Auto Loader, Databricks Apps, Vector Search
Apache Spark (expert): PySpark and Scala; internals - DAG execution, shuffle optimisation, memory tuning, adaptive query execution, Structured Streaming
AI / ML stack (advanced): MLflow (tracking, registry, serving, tracing), Feature Store, Model Serving, AutoML; production ML lifecycle end-to-end
GenAI & agents (proficient): RAG pipeline design, Databricks Agent Bricks and Agent Framework, Vector Search, LangChain, MLflow agent tracing, LLM integration (Claude, GPT)
Data engineering (advanced): dbt on Databricks, Lakeflow Jobs, Kafka / Structured Streaming, Fivetran, Airbyte - batch and real-time ingestion at enterprise scale
Cloud (advanced in one, working in others): AWS (S3, Glue, EMR, Step Functions), Azure (ADLS Gen2, ADF, Event Hubs), Google Cloud Platform (GCS, Dataflow, BigQuery)
Data modelling (advanced): Medallion architecture (Bronze / Silver / Gold), Data Vault 2.0, Kimball dimensional modelling; open table formats (Delta Lake, Apache Iceberg, Apache Hudi)
Security & governance: Unity Catalog RBAC, column masking, row-level security, audit logs, private endpoints, SOX / GDPR / HIPAA compliance patterns
DevOps & IaC: Git, CI/CD for Databricks (Databricks Asset Bundles, GitHub Actions), Terraform Databricks provider, Databricks CLI
Orchestration: Lakeflow Jobs, Apache Airflow with Databricks operator, Prefect - dependency management, multi-task job design, retry and alerting patterns