We are seeking a seasoned Data Architect with deep expertise in Databricks, Lakehouse architecture, and building robust data platforms for AI/ML/GenAI enablement to lead a critical data modernization initiative. The role involves transforming a legacy data platform into a future-ready, scalable, cloud-native, Databricks-based architecture. You will drive the design and implementation of high-performance data pipelines, orchestrate data workflows, and integrate AI/ML capabilities across the stack to unlock real-time intelligence and innovation.
Key Responsibilities
Lead the architectural modernization from an on-prem/legacy platform to a unified Databricks Lakehouse ecosystem
Architect and optimize data pipelines (batch and streaming) to support AI/ML and GenAI workloads on Databricks
Migrate and re-engineer existing Spark workloads to leverage Delta Lake, Unity Catalog, and advanced performance tuning in Databricks
Drive integration of AI/ML models (including GenAI use cases) into operational data pipelines for real-time decision-making
Design and implement robust orchestration using Apache Airflow or Databricks Workflows, with CI/CD integration
Establish data governance, security, and quality frameworks aligned with Unity Catalog and enterprise standards
Collaborate with data scientists, ML engineers, DevOps, and business teams to enable scalable and governed AI solutions
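As a concrete illustration of the orchestration responsibilities above, a minimal Databricks Workflows job definition (all names, paths, cluster sizing, and the cron schedule here are placeholder assumptions, not a prescribed setup) might look like:

```json
{
  "name": "daily-bronze-to-silver",
  "tasks": [
    {
      "task_key": "ingest_bronze",
      "notebook_task": {
        "notebook_path": "/Repos/data-platform/pipelines/ingest_bronze"
      },
      "new_cluster": {
        "spark_version": "14.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 2
      }
    },
    {
      "task_key": "transform_silver",
      "depends_on": [{ "task_key": "ingest_bronze" }],
      "notebook_task": {
        "notebook_path": "/Repos/data-platform/pipelines/transform_silver"
      }
    }
  ],
  "schedule": {
    "quartz_cron_expression": "0 0 2 * * ?",
    "timezone_id": "UTC"
  }
}
```

A definition like this can be version-controlled and deployed through a CI/CD pipeline, which is the pattern the role calls for.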
Required Skills
12+ years in data engineering or architecture, with a strong focus on Databricks (at least 4-5 years) and AI/ML enablement
Deep hands-on experience with Apache Spark, Databricks (Azure/AWS), and Delta Lake
Strong knowledge of Apache Airflow, Databricks Jobs, and cloud-native orchestration patterns
Experience with structured streaming, Kafka, and real-time analytics frameworks
Proven ability to design and implement cloud-native data architectures
Solid understanding of data modeling, Lakehouse design principles, and lineage/tracking with Unity Catalog
Excellent communication and stakeholder engagement skills
Preferred Qualifications
Databricks Certified Data Engineer Professional certification is highly desirable
Experience transitioning from in-house data platforms to Databricks or cloud-native environments
Hands-on experience with Delta Lake, Unity Catalog, and performance tuning in Databricks
Expertise in Apache Airflow DAG design, dynamic workflows, and production troubleshooting
Experience with CI/CD pipelines, Infrastructure-as-Code (Terraform, ARM templates), and DevOps practices
Exposure to AI/ML model integration within real-time or batch data pipelines
Experience with LLM/GenAI enablement, vectorized data, embedding storage, and integration with Databricks is an added advantage
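To illustrate the Infrastructure-as-Code expectation above, a minimal sketch using the Databricks Terraform provider follows (the workspace host, cluster name, node type, and sizing are hypothetical placeholders, not recommended values):

```hcl
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
}

provider "databricks" {
  host = "https://example.cloud.databricks.com" # placeholder workspace URL
}

# Hypothetical all-purpose cluster for data engineering workloads
resource "databricks_cluster" "etl" {
  cluster_name            = "etl-cluster"
  spark_version           = "14.3.x-scala2.12"
  node_type_id            = "Standard_DS3_v2"
  num_workers             = 2
  autotermination_minutes = 30
}
```

Managing workspace resources this way keeps cluster and job definitions reviewable and reproducible alongside the CI/CD practices the role requires.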