Job Title: Databricks Data Engineer with DevOps Skills
Location: Los Angeles, CA (Hybrid)
Contract
Implementation Partner - ********
End Client - Confidential
Experience - 10+ years
Job Summary
We are looking for an experienced Databricks Data Engineer with strong DevOps expertise to join our data engineering team. The ideal candidate will design, build, and optimize large-scale pipelines on the Databricks Lakehouse Platform on AWS, while driving automated CI/CD and deployment practices. This role requires strong skills in PySpark, SQL, AWS cloud services, and modern DevOps tooling. You will collaborate closely with cross-functional teams to deliver scalable, secure, and high-performance data solutions.
Must Demonstrate (Critical Skills & Architectural Competencies)
- Designing and implementing Databricks-based Lakehouse architectures on AWS
- Clear separation of compute vs. serving layers
- Ability to design low-latency data/API access strategies (beyond Spark-only patterns)
- Strong understanding of caching strategies for performance and cost optimization
- Data partitioning, storage optimization, and file layout strategy
- Ability to handle multi-terabyte structured or time-series datasets
- Skill in probing requirements to identify what matters architecturally
- A player-coach mindset: hands-on engineering + technical leadership
Key Responsibilities
1. Data Pipeline Development
- Design, build, and maintain scalable ETL/ELT pipelines using Databricks on AWS.
- Develop high-performance data processing workflows using PySpark/Spark and SQL.
- Integrate data from Amazon S3, relational databases, and semi-structured or unstructured sources.
- Implement Delta Lake best practices, including schema evolution, ACID transactions, OPTIMIZE, Z-ordering (ZORDER BY), partitioning, and file-size tuning.
- Ensure architectures support high-volume, multi-terabyte workloads.
2. DevOps & CI/CD
- Implement CI/CD pipelines for Databricks using Git, GitLab, GitHub Actions, or AWS-native tools.
- Build and manage automated deployments using Databricks Asset Bundles.
- Manage version control for notebooks, workflows, libraries, and environment configuration.
- Automate cluster policies, job creation, environment provisioning, and configuration management.
- Support infrastructure-as-code via Terraform (preferred) or CloudFormation.
3. Collaboration & Business Support
- Work with data analysts and BI teams to prepare curated datasets for reporting and analytics.
- Collaborate closely with product owners, engineering teams, and business partners to translate requirements into scalable implementations.
- Document data flows, technical architecture, and DevOps/deployment workflows.
4. Performance & Optimization
- Tune Spark clusters, workflows, and queries for cost efficiency and compute performance.
- Monitor pipelines, troubleshoot failures, and maintain high reliability.
- Implement logging, monitoring, and observability across workflows and jobs.
- Apply caching strategies and workload optimization techniques to support low-latency consumption patterns.
5. Governance & Security
- Implement and maintain data governance using Unity Catalog.
- Enforce access controls, security policies, and data compliance requirements.
- Ensure lineage, quality checks, and auditability across data flows.
Technical Skills
- Strong hands-on experience with Databricks, including:
  - Delta Lake
  - Unity Catalog
  - Lakehouse Architecture
  - Delta Live Tables (DLT) pipelines
  - Databricks Runtime
  - Table Triggers
  - Databricks Workflows
- Proficiency in PySpark, Spark, and advanced SQL.
- Expertise with AWS cloud services, including:
  - S3
  - IAM
  - Glue / Glue Catalog
  - Lambda
  - Kinesis (optional but beneficial)
  - Secrets Manager
- Strong understanding of DevOps tools:
  - Git / GitLab
  - CI/CD pipelines
  - Databricks Asset Bundles
- Familiarity with Terraform is a plus.
- Experience with relational databases and data warehouse concepts.
Preferred Experience
- Knowledge of streaming technologies such as Spark Structured Streaming.
- Experience building real-time or near real-time pipelines.
- Exposure to advanced Databricks runtime configurations and performance tuning.
Certifications (Optional)
- Databricks Certified Data Engineer Associate / Professional
- AWS Certified Data Engineer - Associate or AWS Certified Solutions Architect certification