Key Responsibilities
Design and implement end-to-end data architecture on Databricks (Spark, Delta Lake, MLflow).
Develop and optimize large-scale ETL/ELT pipelines using PySpark and SQL (an illustrative sketch follows this list).
Architect data lakes and lakehouses that integrate with cloud object storage (e.g., ADLS, S3, GCS).
Define and enforce best practices around security, data governance, and cost optimization.
Lead technical workshops and collaborate with data engineers, data scientists, and DevOps engineers.
Implement data orchestration workflows (e.g., Databricks Workflows, Airflow, Azure Data Factory).
Integrate Databricks with BI tools (e.g., Power BI, Tableau) and manage metadata through data catalogs (Unity Catalog, Hive Metastore).
Provide architectural recommendations during pre-sales and stakeholder discussions (for consulting engagements).
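
To illustrate the kind of ETL/ELT and lakehouse work described above, here is a minimal PySpark and Delta Lake sketch. The storage path, column names, and the target table name are hypothetical placeholders, not specifics of this role's environment.

from pyspark.sql import SparkSession, functions as F

# Minimal batch ETL sketch: raw JSON landed in cloud storage -> curated Delta table.
# The ADLS path, column names, and target table below are hypothetical examples.
spark = SparkSession.builder.appName("orders-etl").getOrCreate()

raw = spark.read.json("abfss://raw@examplestorage.dfs.core.windows.net/orders/")

cleaned = (
    raw.dropDuplicates(["order_id"])                    # de-duplicate on the business key
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("amount") > 0)                     # drop obviously invalid rows
)

(cleaned.write.format("delta")
        .mode("overwrite")
        .partitionBy("order_date")
        .saveAsTable("main.sales.orders"))              # Unity Catalog: catalog.schema.table
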
Required Skills and Qualifications
Strong experience with the Databricks platform and Apache Spark.
Deep knowledge of Delta Lake, Unity Catalog, and Databricks Workflows.
Proficiency in Python and SQL; Scala is a plus.
Hands-on experience with at least one major cloud platform: Azure, AWS, or Google Cloud Platform.
Experience with data security, identity and access management (IAM), role-based access control (RBAC), and audit logging.
Proven expertise in data lakehouse design, ETL/ELT workflows, and data modeling.
Experience with CI/CD and infrastructure as code (Terraform, GitOps).
Strong understanding of performance tuning and cost management (an illustrative maintenance sketch follows this list).
Excellent communication and leadership skills.
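
To illustrate the performance-tuning and cost-management work referenced above, the following is a small maintenance sketch for a Delta table; the table and column names are hypothetical and the retention window is only an example.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # on Databricks, a SparkSession named `spark` is already provided

# Compact small files and co-locate rows on a frequently filtered column (hypothetical table and column).
spark.sql("OPTIMIZE main.sales.orders ZORDER BY (customer_id)")

# Remove data files no longer referenced by the table to keep storage costs in check (example: 7-day retention).
spark.sql("VACUUM main.sales.orders RETAIN 168 HOURS")
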
Preferred Qualifications
Databricks Certified Data Engineer Professional or Architect-level certification.
Cloud certifications (e.g., Azure Solutions Architect, AWS Certified Data Analytics).
Experience with ML/AI on Databricks (MLflow, Feature Store, AutoML).
Exposure to data mesh, data products, or modern data stack concepts.