Job Details
Role: We are seeking a seasoned Data Architect with deep
expertise in Databricks, Lakehouse architecture, and AI/ML/GenAI enablement to
lead a critical modernization initiative. The role involves transforming a
legacy platform into a future-ready, scalable, cloud-native Databricks-based
architecture. You will drive design and implementation of high-performance data
pipelines, orchestrate data workflows, and integrate AI/ML capabilities across
the stack to unlock real-time intelligence and innovation.
Key Responsibilities:
Lead the architectural modernization from an
on-prem/legacy platform to a unified Databricks Lakehouse ecosystem.
Architect and optimize data pipelines (batch and streaming) to support AI/ML and GenAI workloads on Databricks (a brief streaming sketch follows this list).
Migrate and re-engineer existing Spark workloads to
leverage Delta Lake, Unity Catalog, and advanced performance tuning in
Databricks.
Drive integration of AI/ML models (including GenAI use
cases) into operational data pipelines for real-time decision-making.
Design and implement robust orchestration using Apache
Airflow or Databricks Workflows, with CI/CD integration.
Establish data governance, security, and quality
frameworks aligned with Unity Catalog and enterprise standards.
Collaborate with data scientists, ML engineers,
DevOps, and business teams to enable scalable and governed AI solutions.
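For context, a minimal sketch of the kind of streaming pipeline described above, assuming a Kafka source feeding a Unity Catalog-managed Delta table; the broker address, topic, event schema, checkpoint path, and table name are illustrative placeholders, not this platform's actual configuration:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

# Hypothetical event schema; a real pipeline would derive this from a data contract.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("payload", StringType()),
    StructField("event_ts", TimestampType()),
])

# Read the raw stream from Kafka (broker and topic are placeholders).
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Parse the JSON payload into typed columns.
parsed = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Write to a Unity Catalog-managed Delta table; the checkpoint gives the
# stream exactly-once recovery semantics across restarts.
(
    parsed.writeStream.format("delta")
    .option("checkpointLocation", "/Volumes/main/default/checkpoints/events")
    .toTable("main.default.events")  # placeholder catalog.schema.table
)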
Required Skills:
12+ years in data engineering or architecture, with a strong focus on Databricks (at least 4-5 years) and AI/ML enablement.
Deep hands-on experience with Apache Spark, Databricks (Azure/AWS), and Delta Lake.
Proficiency in AI/ML pipeline integration using Databricks MLflow or custom model deployment strategies (a brief MLflow sketch follows this list).
Strong knowledge of Apache Airflow, Databricks Jobs, and cloud-native orchestration patterns.
Experience with Structured Streaming, Kafka, and real-time analytics frameworks.
Proven ability to design and implement cloud-native data architectures.
Solid understanding of data modeling, Lakehouse design principles, and lineage tracking with Unity Catalog.
Excellent communication and stakeholder engagement skills.
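To illustrate the MLflow requirement above, a hedged sketch of logging a model during training and loading it back for scoring in a downstream pipeline step; the scikit-learn model and synthetic data are stand-ins for illustration only:

import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a toy model and log it to MLflow (model and data are illustrative).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
with mlflow.start_run() as run:
    model = LogisticRegression().fit(X, y)
    mlflow.sklearn.log_model(model, artifact_path="model")

# Downstream pipeline step: load the logged model by run URI and score records.
scorer = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/model")
print(scorer.predict(X[:5]))

Loading by URI rather than importing the training code is what lets a batch or streaming pipeline consume the model without coupling to how it was trained.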
Preferred Qualifications:
Databricks Data Engineering Professional certification is highly desirable.
Experience transitioning from in-house data platforms to Databricks or cloud-native environments.
Hands-on experience with Delta Lake, Unity Catalog, and performance tuning in Databricks.
Expertise in Apache Airflow DAG design, dynamic workflows, and production troubleshooting (an illustrative DAG follows this list).
Experience with CI/CD pipelines, Infrastructure-as-Code (Terraform, ARM templates), and DevOps practices.
Exposure to AI/ML model integration within real-time or batch data pipelines.
Exposure to MLOps, MLflow, Feature Store, and model monitoring in production environments.
Experience with LLM/GenAI enablement, vectorized data, embedding storage, and integration with Databricks is an added advantage.
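As a concrete reference for the Airflow expertise above, an illustrative DAG that triggers a Databricks job via the Databricks provider operator; the DAG name, connection id, job id, and daily schedule are placeholder assumptions, not a prescribed setup:

from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG(
    dag_id="daily_lakehouse_refresh",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                 # placeholder cadence
    catchup=False,
) as dag:
    DatabricksRunNowOperator(
        task_id="run_databricks_job",
        databricks_conn_id="databricks_default",  # provider's default connection id
        job_id=12345,                             # placeholder Databricks job id
    )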