Job Description:
We are seeking an experienced Databricks Architect to help customers migrate from their existing Databricks Runtime version to 10.4 LTS, and then from 10.4 to the latest release.
Databricks runtime environment migration combines the responsibilities of a Data Migration Specialist and a Databricks Engineer: transferring data, applications, and configurations to a new environment, and upgrading the Databricks Runtime (DBR) version.
Core Responsibilities:
Assessment and Planning: Conduct comprehensive assessments of existing data solutions and infrastructure (on-premises or legacy cloud) to define the scope and requirements for migration to the Databricks Lakehouse platform.
Environment Configuration: Design and configure target Databricks workspaces, ensuring proper setup of cluster sizes, DBR versions (preferably LTS versions), autoscaling policies, and security controls, including network isolation and access management using Unity Catalog.
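By way of illustration, a minimal sketch of a target cluster definition as it might be submitted to the Databricks Clusters API; the cluster name, node type, and autoscaling bounds are hypothetical placeholders, not prescribed values:

```python
# Illustrative target cluster spec for the migration workspace.
# All names and sizes below are assumptions for the sketch.
cluster_spec = {
    "cluster_name": "migration-target",       # hypothetical name
    "spark_version": "10.4.x-scala2.12",      # DBR 10.4 LTS
    "node_type_id": "Standard_DS3_v2",        # example Azure node type
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "data_security_mode": "USER_ISOLATION",   # shared access mode for Unity Catalog
}
```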
Pipeline Development and Migration: Design, develop, and maintain scalable data pipelines (ETL/ELT processes) using Databricks tools and Apache Spark. This includes adapting existing Spark code, migrating notebooks, jobs, and MLflow artifacts to the new environment.
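A minimal sketch of the kind of PySpark pipeline this responsibility covers; the source path and target table name are hypothetical, and `spark` is the session Databricks provides in a notebook:

```python
from pyspark.sql import functions as F

# Read raw source data (hypothetical landing path).
raw = spark.read.format("json").load("/mnt/raw/orders/")

# Basic transformation step: deduplicate and stamp the ingest date.
cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("ingest_date", F.current_date())
)

# Write as a Delta table registered in Unity Catalog (hypothetical name).
cleaned.write.format("delta").mode("overwrite").saveAsTable("main.sales.orders")
```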
Data Migration and Integrity: Manage the process of moving data between source and target systems, ensuring data accuracy, integrity, and security throughout the transfer. This may involve converting data formats (e.g., Parquet to Delta Lake) and managing data loading processes like Auto Loader.
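Two common steps in this kind of migration are sketched below; the paths and table name are hypothetical placeholders:

```python
# 1. Convert an existing Parquet directory to Delta Lake in place.
spark.sql("CONVERT TO DELTA parquet.`/mnt/datalake/events`")

# 2. Incrementally ingest newly arriving files with Auto Loader (cloudFiles).
stream = (
    spark.readStream.format("cloudFiles")
         .option("cloudFiles.format", "json")
         .option("cloudFiles.schemaLocation", "/mnt/checkpoints/events_schema")
         .load("/mnt/landing/events")
)
(stream.writeStream
       .option("checkpointLocation", "/mnt/checkpoints/events")
       .toTable("main.analytics.events"))  # hypothetical target table
```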
Testing and Validation: Plan and execute thorough testing of the migrated workloads and data in the new environment (development, testing, and production workspaces). This includes performance testing and validating data quality and consistency.
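A minimal sketch of one such validation check, comparing row counts and an order-independent content checksum between source and target tables (table names are hypothetical):

```python
from pyspark.sql import functions as F

src = spark.table("hive_metastore.legacy.orders")   # hypothetical source
tgt = spark.table("main.sales.orders")              # hypothetical target

# Row counts must match after migration.
assert src.count() == tgt.count(), "row count mismatch"

def table_checksum(df):
    # Sum a 64-bit hash over every column; order-independent consistency check.
    return df.select(F.sum(F.xxhash64(*df.columns)).alias("chk")).first()["chk"]

assert table_checksum(src) == table_checksum(tgt), "content checksum mismatch"
```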
Troubleshooting and Optimization: Identify and resolve technical issues during the migration process. Tune Spark jobs, partitioning, and storage layouts for optimal performance and cost efficiency on the new DBR.
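A sketch of typical Delta Lake tuning steps after cutover; the table and column names are hypothetical, and the shuffle-partition value is an assumption to be sized against the actual cluster:

```python
# Compact small files and co-locate data on a common filter column.
spark.sql("OPTIMIZE main.sales.orders ZORDER BY (customer_id)")

# Align shuffle parallelism with the target cluster size (illustrative value).
spark.conf.set("spark.sql.shuffle.partitions", "200")

# Remove files no longer referenced by the table (default 7-day retention).
spark.sql("VACUUM main.sales.orders")
```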
Documentation and Support: Create detailed migration plans, process documents, and provide support to end-users and application teams during the cutover phase.
Key Skills and Qualifications:
Technical Expertise: Strong experience with Databricks, Apache Spark, Delta Lake, SQL, and Python; experience with Scala or Java is a plus.
Cloud Platform (Hyperscaler) Knowledge: Proficiency with cloud providers such as AWS, Azure, or Google Cloud Platform, and their data and infrastructure services. Relevant cloud certifications are a plus.
Data Governance: Experience defining and rolling out data governance processes, policies, roles, and standards that ensure an organization's data is secure, accessible, usable, and compliant, and that establish accountability for data quality and lifecycle management (creation, storage, use, disposal).
Migration Experience: Proven experience in data or cloud migration projects, specifically involving big data platforms or data warehouses.
Problem-Solving: Strong analytical and troubleshooting skills to address complex migration challenges and ensure data integrity.
Collaboration: Ability to work effectively within a team, collaborating with data owners, application teams, and program management.
Qualifications:
Education: Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
Experience:
Minimum of 6 years of experience with Databricks implementations.
Proven experience in Databricks migration projects.
Soft Skills:
Excellent problem-solving and analytical skills.
Strong communication and leadership abilities.
Ability to work collaboratively in a fast-paced environment.
Preferred Qualifications:
Databricks Data Engineer and Architect certifications.
Experience with DevOps practices and tools.