Databricks Architect

Overview

On Site
Depends on Experience
Full Time

Skills

Databricks
Python
Azure
ML
SQL
Snowflake

Job Details

Job Title: Databricks Architect

Location: Milpitas, CA (On-site)

Requirements

  • We are seeking a hands-on Databricks Architect with deep experience designing, implementing, and operating large-scale data platforms (work experience in semiconductor manufacturing is a big plus).
  • The ideal candidate has worked with the latest Databricks capabilities, such as Unity Catalog and Delta Live Tables (DLT).
  • The candidate should have experience with data ingestion tools such as Fivetran and HVR, and familiarity with real-time/streaming, batch, and sensor/OT/ET data flows.
  • The candidate will architect end-to-end data solutions that support yield improvement, process control, predictive maintenance, and operational efficiency, while meeting strict governance, security, and scalability requirements.

Experience and Qualifications:

  • 10+ years of overall experience, including 5+ years in data architecture/data engineering roles, with hands-on work on major enterprise data platforms.
  • Proven hands-on experience with Databricks, especially modern features such as:
      • Unity Catalog: implementing catalogs, schemas, permissions, external/managed tables, security, and lineage (see the permissioning sketch after this list).
      • Delta Live Tables (DLT): building reliable pipelines, CDC, transformations, data quality, and scaling/performance tuning (see the pipeline sketch after this list).
  • Experience with data ingestion tools such as Fivetran for SaaS/ERP/relational sources, plus experience integrating HVR or an equivalent for high-velocity change data capture and replication.
  • Strong working knowledge of cloud infrastructure (AWS or Azure), storage (object stores, data lakes), compute scaling, and cluster management within Databricks.
  • Proficiency in Python/PySpark and Spark SQL, with a good understanding of streaming vs. batch processing.
  • Deep understanding of data governance, security, and compliance: role-based access control (RBAC), attribute-based access control, encryption, audit logs, data privacy handling, and regulatory compliance requirements.
  • Operational excellence: reliability, monitoring, observability, and metrics; experience with failover, backup, and DR strategies.
  • Strong communication skills: able to work with domain experts and engineering teams, translate business requirements into technical solutions, and document architecture and trade-offs.
  • Experience with performance tuning of Spark jobs, optimizing data storage formats, partitioning, and schema design to support high-throughput, low-latency workloads.
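
As a rough illustration of the Unity Catalog work described above, the sketch below creates a catalog, schema, and table and grants group-level access. It assumes a Databricks notebook where spark is in scope; all object and group names (fab_analytics, yield_analysis, lot_metrics, data-engineers, analysts) are hypothetical.

    # Create a hypothetical catalog/schema/table hierarchy in Unity Catalog.
    spark.sql("CREATE CATALOG IF NOT EXISTS fab_analytics")
    spark.sql("CREATE SCHEMA IF NOT EXISTS fab_analytics.yield_analysis")
    spark.sql("""
        CREATE TABLE IF NOT EXISTS fab_analytics.yield_analysis.lot_metrics (
            lot_id STRING, step STRING, yield_pct DOUBLE, event_ts TIMESTAMP
        )
    """)
    # Grants follow Unity Catalog's securable hierarchy; group names are hypothetical.
    spark.sql("GRANT USE CATALOG ON CATALOG fab_analytics TO `data-engineers`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA fab_analytics.yield_analysis TO `data-engineers`")
    spark.sql("GRANT SELECT ON TABLE fab_analytics.yield_analysis.lot_metrics TO `analysts`")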
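
In the same vein, a minimal Delta Live Tables sketch showing a streaming ingest table plus a cleaned table with data-quality expectations. It assumes a DLT pipeline notebook (where spark and the dlt module are available); the landing path and column names are hypothetical.

    import dlt
    from pyspark.sql import functions as F

    @dlt.table(comment="Raw sensor readings ingested via Auto Loader")
    def raw_sensor_readings():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/landing/sensors/")  # hypothetical landing path
        )

    @dlt.table(comment="Validated readings; rows failing expectations are dropped")
    @dlt.expect_or_drop("valid_timestamp", "event_ts IS NOT NULL")
    @dlt.expect_or_drop("valid_reading", "reading_value BETWEEN 0 AND 10000")
    def clean_sensor_readings():
        return (
            dlt.read_stream("raw_sensor_readings")
            .withColumn("ingest_date", F.to_date("event_ts"))
        )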

Nice to have

  • Experience with machine learning/predictive analytics, especially using MLflow on Databricks or integrating ML pipelines.
  • Experience with infrastructure as code (IaC) tools for provisioning data platform components, cluster policies, configurations (Terraform, Azure ARM / Bicep, AWS CloudFormation).
  • Knowledge of additional tools/frameworks: real-time streaming platforms (e.g., Kafka, Event Hubs), BI/dashboards, data catalog/lineage tools beyond Unity Catalog.
  • Experience with cost optimization in large data platforms: storage, compute, and housekeeping (e.g., vacuuming, compaction, deleted-file cleanup); see the housekeeping sketch below.
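
For the housekeeping item above, a minimal sketch of routine Delta Lake maintenance run from a notebook; the table name and Z-ORDER column are hypothetical, and the 168-hour retention matches Delta's default 7-day window.

    # Compact small files and co-locate rows for faster lot_id lookups.
    spark.sql("OPTIMIZE fab_analytics.yield_analysis.lot_metrics ZORDER BY (lot_id)")
    # Remove data files no longer referenced by the table (older than 7 days).
    spark.sql("VACUUM fab_analytics.yield_analysis.lot_metrics RETAIN 168 HOURS")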
