The Databricks Architect is responsible for designing, implementing, and optimizing scalable data analytics and data engineering solutions on the Databricks Lakehouse Platform. The role requires deep expertise in cloud platforms (Azure, AWS, or Google Cloud Platform), distributed data processing, Delta Lake architectures, and modern data engineering practices. The architect will collaborate with cross-functional teams to define data strategies, ensure platform reliability, and enable advanced analytics, machine learning (ML), and business intelligence (BI) use cases.
Key Responsibilities
Architecture & Design
- Design end-to-end Databricks Lakehouse architectures for data ingestion, processing, storage, and consumption.
- Define and implement Delta Lake patterns, including the medallion architecture (Bronze/Silver/Gold layers); a minimal sketch follows this list.
- Develop scalable data pipelines using PySpark, Spark SQL, and Databricks Workflows.
- Architect solutions for structured, semi-structured, and unstructured data.
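The following is a minimal sketch of the Bronze/Silver/Gold medallion pattern referenced above, expressed in PySpark with Delta Lake. The paths, schemas, and table names (orders, customer_ltv, etc.) are hypothetical placeholders, not a prescribed implementation.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

# Bronze: land raw JSON as-is to preserve source fidelity
raw = spark.read.json("/mnt/landing/orders/")  # hypothetical landing path
raw.write.format("delta").mode("append").saveAsTable("bronze.orders_raw")

# Silver: cleanse and conform (typed columns, deduplication)
silver = (
    spark.read.table("bronze.orders_raw")
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .dropDuplicates(["order_id"])
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.orders")

# Gold: business-level aggregate for BI consumption
gold = silver.groupBy("customer_id").agg(F.sum("amount").alias("lifetime_value"))
gold.write.format("delta").mode("overwrite").saveAsTable("gold.customer_ltv")
```

In practice each layer would typically run as a separate task in a Databricks Workflow rather than as one linear script.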
Engineering & Implementation
- Build robust ETL/ELT pipelines with Databricks notebooks, jobs, and workflows.
- Design and implement high-performance streaming solutions using Spark Structured Streaming (see the Auto Loader sketch after this list).
- Optimize Spark jobs for cost, performance, and scalability.
- Implement CI/CD and automation using Databricks Repos, Git, and DevOps pipelines.
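As an illustration of the streaming responsibilities above, here is a minimal Structured Streaming sketch that ingests files incrementally with Auto Loader (cloudFiles). The source path, checkpoint locations, and target table are hypothetical, and the cloudFiles source is Databricks-specific.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

stream = (
    spark.readStream
    .format("cloudFiles")                                        # Auto Loader source
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/orders/schema")
    .load("/mnt/landing/orders/")                                # hypothetical landing path
)

(
    stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/orders/bronze")
    .trigger(availableNow=True)                                  # process new files, then stop
    .toTable("bronze.orders_raw")
)
```

Swapping trigger(availableNow=True) for a processing-time trigger turns the same job into a continuously running stream.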
Cloud & Platform Expertise
- Architect solutions across Azure, AWS, or Google Cloud Platform, leveraging native cloud services (e.g., Azure Data Factory, AWS Glue, Google Cloud Dataflow).
- Ensure security, governance, and compliance through Unity Catalog, role-based access control (RBAC), and encryption (see the sketch after this list).
- Monitor workloads and optimize cluster configurations for performance and cost.
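As a small example of Unity Catalog-based governance, the sketch below grants tiered access with SQL executed from a notebook. It assumes a Unity Catalog-enabled workspace; the catalog, schema, table, and group names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Allow engineers to browse the catalog and schema; restrict analysts to read-only access on curated tables
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_engineers`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.silver TO `data_engineers`")
spark.sql("GRANT SELECT ON TABLE main.silver.orders TO `bi_analysts`")
```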
Collaboration & Leadership
- Work closely with data engineers, data scientists, BI teams, and business stakeholders.
- Act as a subject matter expert (SME) for Databricks best practices, standards, and patterns.
- Conduct architectural reviews and guide teams on design decisions.
- Lead proofs of concept (PoCs), evaluate new platform features, and drive platform adoption.
Quality, Governance & Observability
- Define standards for data quality, lineage, observability, and governance.
- Implement automated testing frameworks for pipelines and notebooks (see the sketch after this list).
- Establish performance baselines and monitoring dashboards.
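A hedged sketch of what automated pipeline testing might look like: the transformation is factored into a plain function (clean_orders here is a hypothetical example) so it can be exercised with pytest against a local Spark session.

```python
import pytest
from pyspark.sql import SparkSession, functions as F


def clean_orders(df):
    """Transformation under test: cast amounts and drop duplicate orders."""
    return (
        df.withColumn("amount", F.col("amount").cast("double"))
          .dropDuplicates(["order_id"])
    )


@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").appName("pipeline-tests").getOrCreate()


def test_clean_orders_deduplicates_and_casts(spark):
    df = spark.createDataFrame(
        [("o1", "10.5"), ("o1", "10.5"), ("o2", "3.0")],
        ["order_id", "amount"],
    )
    result = clean_orders(df)
    assert result.count() == 2
    assert dict(result.dtypes)["amount"] == "double"
```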
Required Skills & Experience
Technical Skills
- 7+ years of experience in data engineering/architecture.
- 3+ years of hands-on experience with Databricks.
- Strong expertise in Spark, PySpark, SQL, and distributed data processing.
- Deep understanding of Delta Lake features: ACID transactions, OPTIMIZE, ZORDER, and Auto Loader (see the maintenance sketch after this list).
- Experience with workflow orchestration, jobs, and Databricks REST APIs.
- Hands-on expertise with at least one cloud platform:
  - Azure (preferred): ADF, ADLS, Key Vault, Event Hubs, Azure DevOps
  - AWS: S3, Glue, Lambda, Kinesis
  - Google Cloud Platform: GCS, Dataflow, Pub/Sub
- Familiarity with CI/CD, Git, DevOps, and Infrastructure-as-Code (Terraform preferred).
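To make the Delta Lake items above concrete, here is a minimal maintenance sketch using OPTIMIZE with ZORDER and VACUUM. The table and column names are hypothetical, and retention settings should follow your own recovery requirements.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact small files and co-locate frequently filtered columns
spark.sql("OPTIMIZE silver.orders ZORDER BY (customer_id, order_ts)")

# Remove files no longer referenced by the table (168 hours matches the default 7-day retention)
spark.sql("VACUUM silver.orders RETAIN 168 HOURS")
```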
Soft Skills
- Strong analytical and problem-solving skills.
- Excellent communication and stakeholder management.
- Ability to lead design discussions and guide technical teams.
- Strong documentation and architectural blueprinting skills.
Preferred Qualifications
- Databricks certifications, such as:
  - Databricks Certified Data Engineer Professional
  - Databricks Certified Machine Learning Professional
  - Databricks Lakehouse Fundamentals
- Experience with MLflow, Feature Store, or MLOps workflows.
- Experience working in regulated industries such as banking/financial services/insurance (BFSI) and healthcare.