Job Details
Databricks Data Engineer / Developer (US Citizens & Green Card Holders Only)
6+ Months
Pleasanton, CA (Onsite 4 days a week)
Position Overview:
We are seeking a Data Engineer with strong Databricks expertise to play a key role in building and optimizing large-scale data pipelines and analytics architectures. This role involves collaborating with teams across the organization to enable efficient data processing on the Databricks Lakehouse Platform hosted on Azure.
Essential Duties:
Design and implement ETL/ELT pipelines using Databricks, Delta Lake, and Apache Spark
Collaborate with data engineers, analysts, and business stakeholders to ensure data quality, consistency, and sound data modeling
Build and maintain data orchestration using Databricks features such as Jobs, Notebooks, and Workflows
Tune Spark jobs for performance, cost-efficiency, and reliability
Monitor data pipelines to ensure high availability and data quality
Implement and maintain CI/CD practices for Databricks notebooks and infrastructure as code (e.g., Terraform, Databricks CLI)
Document data pipelines, data assets, and operational procedures
Ensure compliance with data handling, privacy, and security requirements
Required Qualifications:
BS or MS degree in Computer Science, Data Engineering, or a related field
5+ years of experience in data engineering or a related role
Strong hands-on experience with Databricks and Apache Spark using Python, Scala, or SQL
Solid understanding of Delta Lake, Unity Catalog, and modern data lake architectures
Experience with cloud platforms (Azure, AWS, or Google Cloud Platform), particularly their data services (e.g., Amazon S3, Azure Data Lake Storage, Google BigQuery)
Familiarity with CI/CD pipelines, version control (Git), and orchestration tools (Airflow, Databricks Workflows)
Deep understanding of data storage concepts, query performance optimization, and large-scale data processing
Preferred Qualifications:
Hands-on experience with machine learning tooling in Databricks, such as MLflow and Feature Store
Experience with data governance tools such as Unity Catalog or Azure Purview
Experience integrating BI tools (Power BI, Tableau) with Databricks
Databricks certifications (Data Engineer Associate/Professional, Machine Learning, etc.)