Job Details
Position: Sr. Databricks Lead Engineer
Duration: 6+ months
Location: 100% Remote
Job Summary:
We are looking for a highly skilled and motivated Senior Data Engineer to lead the design and development of scalable, enterprise-grade data pipelines and frameworks within the Azure ecosystem. This role requires deep expertise in Databricks (PySpark/Scala), Delta Lake, real-time data streaming, and Azure integration tools. The ideal candidate will have hands-on experience implementing the Medallion architecture, ensuring data quality, lineage, and availability across multiple business systems, including Dynamics 365 CE & F&O.
Key Responsibilities:
1. Data Engineering & Pipeline Development
- Design, develop, and optimize robust batch and real-time data pipelines using Databricks (PySpark/Scala) and Delta Lake.
- Ingest structured and semi-structured data from diverse sources: APIs, FTP/SFTP, Dropbox, Event Hub, and Azure Data Factory.
- Build reusable ingestion and transformation frameworks to support scalable and modular pipeline architectures.
2. Medallion Architecture Implementation
- Architect and manage data flows using Databricks' Medallion architecture:
  - Bronze Layer: Raw ingestion from APIs and external systems.
  - Silver Layer: Cleaned, enriched, and validated data.
  - Gold Layer: Curated datasets for analytics, reporting, and operational use.
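The Bronze → Silver → Gold flow above can be sketched as follows. This is a minimal illustration in plain Python (in practice the role uses PySpark DataFrames and Delta tables); the field names (customer_id, amount) are illustrative assumptions, not part of the actual data model.

```python
# Bronze: raw records kept exactly as ingested, including bad values.
raw_events = [
    {"customer_id": "C1", "amount": "120.50"},
    {"customer_id": "C2", "amount": "bad-value"},
    {"customer_id": "C1", "amount": "79.50"},
]

def to_silver(records):
    """Silver: validate and type-cast; drop rows that fail checks."""
    silver = []
    for r in records:
        try:
            silver.append({"customer_id": r["customer_id"],
                           "amount": float(r["amount"])})
        except (KeyError, ValueError):
            continue  # a real pipeline would quarantine and log the row
    return silver

def to_gold(records):
    """Gold: curated aggregate -- total spend per customer."""
    totals = {}
    for r in records:
        totals[r["customer_id"]] = totals.get(r["customer_id"], 0.0) + r["amount"]
    return totals

gold = to_gold(to_silver(raw_events))
print(gold)  # {'C1': 200.0} -- the invalid C2 row is filtered at the Silver layer
```

Each layer only reads from the one before it, so data quality issues are isolated early and the Gold layer stays consumable by analytics and reporting.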
3. Streaming & Real-Time Enablement
- Design and implement real-time data ingestion and processing pipelines using Structured Streaming, Event Hub, and Delta Live Tables.
- Enable near real-time data availability to Power BI, Dynamics 365 CE & F&O, and other downstream platforms.
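The streaming responsibility above amounts to maintaining incrementally updated state over micro-batches. Below is a hedged stdlib-Python sketch of that semantics; the actual implementation would use Spark Structured Streaming reading from Event Hub, with results written to a Delta table for Power BI and Dynamics 365 consumers. The event keys here are made up for illustration.

```python
def process_stream(batches, sink):
    """Consume micro-batches and keep a running count per event key,
    mimicking a stateful streaming aggregation in update mode."""
    state = {}
    for batch in batches:
        for key in batch:
            state[key] = state.get(key, 0) + 1
        # Emit a snapshot after each micro-batch, the way a streaming
        # sink would expose near real-time results downstream.
        sink.append(dict(state))
    return state

sink = []
final = process_stream([["orders", "orders"], ["refunds"]], sink)
print(final)  # {'orders': 2, 'refunds': 1}
```

The key property, also true of Structured Streaming, is that downstream consumers see a fresh snapshot after every micro-batch rather than waiting for a full batch job to finish.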
4. Collaboration & System Integration
- Collaborate with data architects, API teams, and integration engineers to align pipelines with business logic and data models.
- Support integrations with third-party and internal systems such as Shelter LOBs, PetProPortal, Vetco, and SFCC.
5. Monitoring, Optimization & Governance
- Monitor pipeline performance and implement best practices for cost-efficiency, scalability, and fault tolerance.
- Establish observability, logging, and alerting using Azure Monitor and Databricks-native tools.
- Ensure adherence to data privacy, security, and governance policies through Unity Catalog, Azure Purview, and role-based access controls.
Required Skills & Experience:
- 8+ years of experience in data engineering and distributed data processing.
- 3+ years of hands-on development experience with Databricks, PySpark, and Delta Lake.
- Deep knowledge of:
  - Structured Streaming, Azure Event Hub, Delta Live Tables
  - Azure Data Factory, Logic Apps
  - Lakehouse design patterns, especially the Medallion architecture
Preferred Qualifications:
- Databricks Certified Data Engineer Associate/Professional
- Microsoft Certified: Azure Data Engineer Associate
- Experience with Unity Catalog, Azure Purview, and data governance best practices.
- Familiarity with DevOps practices, including deployment pipelines using Azure DevOps or similar tools.
Expected Outcomes:
- Implementation of scalable and reliable batch and streaming data pipelines to support enterprise-wide analytics and reporting needs.
- Delivery of high-quality, curated datasets aligned with business domains and operational KPIs.
- Enablement of real-time data consumption across enterprise applications and reporting systems.
- Establishment of a governed, observable, and efficient data infrastructure using best-in-class tools and practices.
Best Regards,
Chetna