Key Responsibilities
• Champion Best Practices: Establish, document, and promote best-in-class approaches for data architecture, integration, and modeling.
• Pipeline Ownership: Oversee the design, development, and maintenance of robust data pipelines and data architectures that support large-scale, enterprise data needs.
• Drive Excellence: Initiate and manage efforts to improve data quality, operational efficiency, and process scalability.
• Data Governance: Consult on, design, and implement governance, security, and compliance strategies tailored to modern cloud data ecosystems.
• Communication: Communicate technical concepts and business value to diverse stakeholders, including executives, business leads, and technology teams.
• DevOps and Automation: Oversee the implementation of CI/CD practices with tools such as Azure DevOps, AWS CodePipeline, Jenkins, TFS, or PowerShell for streamlined deployments and operations.
Qualifications
• Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field (Master’s preferred).
• 5+ years of hands-on experience in data engineering with a strong focus on Databricks, deployed on any major cloud (AWS, Azure, Google Cloud Platform).
• Technical Proficiency (minimum of 5 years):
o Expertise with Databricks and other cloud-native databases, storage solutions, and distributed compute platforms.
o Deep understanding of Lakehouse architecture, Apache Spark, Delta Lake, and related big data technologies.
o Advanced skills in data warehousing and implementation experience with 3NF, dimensional modeling, and enterprise-level data lakes.
o Experience with Databricks components including Delta Live Tables, Autoloader, Structured Streaming, Databricks Workflows, and orchestration tools (e.g., Apache Airflow).
o Expertise in designing and supporting incremental data loads and building metadata-driven ingestion/data quality frameworks using PySpark.
o Hands-on experience with Databricks Unity Catalog and implementing fine-grained security and access control.
• Proven track record in deploying code and solutions via automated CI/CD pipelines.
• A minimum of 1 year of leadership experience managing complex, cross-functional data projects and technical teams.
• Experience with performance optimization of data engineering pipelines, code, and compute resources.
Preferred:
• Comprehensive knowledge of one or more cloud ecosystems (AWS, Azure, Google Cloud Platform) and their associated big data stacks is strongly preferred.
• Demonstrated skill in performance tuning and optimization within Databricks/Apache Spark environments.
• Stays current with the latest Databricks feature releases and platform enhancements.
• Exceptional communication and client-interaction abilities.
• Experience with Databricks Lakeflow is a plus.
• Experience in AI/ML is a plus.