Senior Databricks Data Engineer

Hybrid in Dallas, TX, US • Posted 29 days ago • Updated 3 hours ago
Contract W2
No Travel Required
Able to Sponsor
Hybrid
Depends on Experience

Job Details

Skills

  • Databricks
  • SQL
  • Data Lake
  • Apache Spark
  • Scala
  • Python
  • PySpark

Summary

Responsibilities:

  • Responsible for leading the design, development, optimization, and deployment of large-scale, robust data solutions and data pipelines on the Databricks Lakehouse platform. This role requires technical leadership, advanced performance-tuning expertise, and a strong understanding of cloud architecture and DevOps practices.
  • Lead the end-to-end design and implementation of scalable data pipelines and a medallion architecture (bronze, silver, gold layers) for structured and unstructured data.
  • Architect and implement complex ETL/ELT processes to process and transform large-scale datasets from diverse sources using Databricks notebooks, Delta Lake, and Apache Spark (PySpark, Scala, or SQL).
  • Optimize Spark jobs for performance and cost-efficiency using advanced techniques such as partitioning, caching, cluster configuration tuning, and troubleshooting bottlenecks.
  • Implement data security strategies, compliance standards, and governance frameworks like Unity Catalog, including managing access controls, encryption, and data lineage.
  • Champion code quality and implement DevOps (CI/CD) pipelines using tools like Azure DevOps, GitHub Actions, or Jenkins to automate deployment and management of data assets and infrastructure.
  • Develop and implement robust monitoring, logging, and alerting systems for production-grade pipelines to proactively identify and resolve issues.

Requirements:

  • 10+ years of experience in data engineering with 5+ years specifically in Databricks development and leadership roles.
  • 5 years of experience in optimizing Spark jobs for performance and cost efficiency using advanced techniques such as partitioning, caching, cluster configuration tuning, and troubleshooting bottlenecks.
  • Advanced, deep knowledge of Scala, Python, and SQL for optimizing code and building high-performance data pipelines on Databricks.
  • 5+ years of experience in Scala, Spark, and Databricks programming.
  • 5-10 years of experience implementing complex ETL/ELT processes to process and transform large-scale datasets from diverse sources using Databricks notebooks, Delta Lake, and Apache Spark (PySpark, Scala, or SQL).
  • Must be able to design and develop CI/CD pipelines that deploy code to development and production environments, preferably on Azure using GitHub Actions.
  • Deep knowledge of the Apache Spark ecosystem, Delta Lake, and Lakehouse architecture.
  • Hands-on experience with Azure and its relevant data services (Azure Data Lake Storage, Azure Data Factory).
  • Databricks Certified Data Engineer Professional certification preferred.

Education:

  • Bachelor's degree in Computer Science/Engineering or a related field
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 10120357
  • Position Id: Databricks0226