Job Title: AWS Databricks Data Engineer
Job Location: Los Angeles, CA (Hybrid)
Hire Type: FTE / CTH
Note: Only candidates local to California will be considered.
Job Description
We are seeking a highly skilled AWS Data Engineer with strong expertise in SQL, Python, PySpark, Data Warehousing, and Cloud-based ETL to join our data engineering team. The ideal candidate will design, implement, and optimize large-scale data pipelines, ensuring scalability, reliability, and high performance. This role requires close collaboration with cross-functional teams and business stakeholders to deliver modern, efficient data solutions.
Key Responsibilities
1. Data Pipeline Development
- Build and maintain scalable ETL/ELT pipelines using Databricks on AWS.
- Leverage PySpark/Spark and SQL to transform and process large, complex datasets.
- Integrate data from multiple sources including S3, relational/non-relational databases, and AWS-native services.
2. Collaboration & Analysis
- Partner with downstream teams to prepare data for dashboards, analytics, and BI tools.
- Work closely with business stakeholders to understand requirements and deliver tailored, high-quality data solutions.
3. Performance & Optimization
- Optimize Databricks workloads for cost, performance, and efficient compute utilization.
- Monitor and troubleshoot pipelines to ensure reliability, accuracy, and SLA adherence.
- Apply query optimization, Spark tuning, and shuffle minimization best practices when handling tens of millions of rows.
4. Governance & Security
- Implement and manage data governance, access control, and security policies using Unity Catalog.
- Ensure compliance with organizational and regulatory data handling standards.
5. Deployment & DevOps
- Use Databricks Asset Bundles for deployment of jobs, notebooks, and configuration across environments.
- Maintain effective version control of Databricks artifacts using GitLab or similar tools.
- Use CI/CD pipelines to support automated deployments and environment setups.
Technical Skills (Required)
- Strong expertise in Databricks (Delta Lake, Unity Catalog, Lakehouse Architecture, table triggers, Workflows, Delta Live Tables pipelines, Databricks Runtime, etc.).
- Proven ability to implement robust PySpark solutions.
- Hands-on experience with Databricks Workflows & orchestration.
- Solid knowledge of Medallion Architecture (Bronze/Silver/Gold).
- Significant experience designing or rebuilding batch-heavy data pipelines.
- Strong background in query optimization, performance tuning, and Spark shuffle optimization.
- Ability to handle and process tens of millions of records efficiently.
- Familiarity with Genie enablement concepts (understanding required; deep experience optional).
- Experience with CI/CD, environment setup, and Git-based development workflows.
- Solid understanding of the AWS cloud, including:
  - IAM
  - Networking fundamentals
  - Storage integration (S3, Glue Catalog, etc.)
Preferred Experience
- Experience with Databricks Runtime configurations and advanced features.
- Knowledge of streaming frameworks such as Spark Structured Streaming.
- Experience developing real-time or near real-time data solutions.
- Exposure to GitLab pipelines or similar CI/CD systems.
Certifications (Optional)
- Databricks Certified Data Engineer Associate / Professional
- AWS Data Engineer or AWS Solutions Architect certification
Thanks & Regards
Akhil