Senior Data Engineer
Introduction:
As a Senior Data Engineer, you will design and implement scalable data pipelines and lakehouse architectures using Databricks. You will lead the deployment and configuration of Databricks solutions on Azure and AWS, and optimize Spark jobs for efficient performance. You will also implement Delta Lake, role-based access controls, and audit mechanisms to keep data lakes reliable and performant. In addition, you will evaluate emerging technologies and tools to enhance the Databricks ecosystem, and continuously improve the performance, cost-efficiency, and scalability of data solutions.
Responsibilities:
- Design and implement scalable data pipelines and lakehouse architectures using Databricks
- Define best practices for data ingestion, transformation, and storage
- Architect Databricks solutions on Azure and AWS
- Lead deployment and configuration of Databricks, including Unity Catalog
- Optimize Spark jobs and cluster performance
- Implement Delta Lake for reliable and performant data lakes
- Implement role-based access controls and audit mechanisms
- Evaluate emerging technologies and tools to enhance the Databricks ecosystem
- Continuously improve performance, cost-efficiency, and scalability of data solutions
Requirements:
Required Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or related field
- 7+ years of experience in data engineering or architecture roles
- 3+ years of hands-on experience with Databricks and Apache Spark
- Strong proficiency in Python, SQL, and Spark
- Experience with Delta Lake, MLflow, and Unity Catalog
- Familiarity with CI/CD pipelines and DevOps practices in data environments
- Certifications in Databricks and in Azure, AWS, or Google Cloud Platform
- Experience with real-time data processing (Kafka, Structured Streaming)
- Knowledge of machine learning workflows and MLOps
Preferred Skills:
- Lakehouse Architecture Design
- Real-Time Media Content Analytics
- MLOps Pipeline Orchestration
- Delta Lake
- Unity Catalog
- MLflow
- Databricks SQL
- Apache Kafka
- Terraform