Overview
Hybrid (2 days per week)
Depends on Experience
Contract - W2
Contract - Independent
Unable to Provide Sponsorship
Skills
Data Engineer
Databricks
Python
ETL Testing
Job Details
Job Summary:
We are seeking a Data Engineer with strong hands-on experience in Databricks, Python, and ETL Testing to support our enterprise data initiatives. The ideal candidate will be responsible for designing, developing, and validating data pipelines and analytics workflows, ensuring data integrity, accuracy, and performance across large-scale distributed environments.
This role blends data engineering and data quality automation, leveraging Python (Pandas, PySpark) to perform functional, regression, and data validation testing of ETL workflows built in Databricks.
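To make the testing side of the role concrete, a data validation check in this stack might look like the minimal sketch below. It is illustrative only: the output path, column names, and the load_customers helper are hypothetical examples, not artifacts of this position.

```python
# Minimal sketch of Pandas-based data validation run under PyTest.
# The output path and column names are hypothetical examples.
import pandas as pd


def load_customers() -> pd.DataFrame:
    # Stand-in for reading an ETL output (e.g., a Parquet extract of a Delta table).
    return pd.read_parquet("output/customers.parquet")


def test_customer_id_is_unique():
    df = load_customers()
    assert df["customer_id"].is_unique, "duplicate customer_id values found"


def test_required_columns_not_null():
    df = load_customers()
    for col in ["customer_id", "email", "created_at"]:
        assert df[col].notna().all(), f"nulls found in required column: {col}"
```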
Required Skills and Experience:
- 3–6 years of hands-on experience in Data Engineering or ETL Testing roles.
- Strong proficiency in Python, including libraries such as Pandas, NumPy, and PyTest.
- Hands-on experience with Databricks (Azure or AWS) and PySpark for ETL development and validation.
- Solid understanding of data transformation, schema validation, and data quality assurance.
- Experience writing complex SQL queries for data validation and reconciliation (see the sketch after this list).
- Working knowledge of Azure Data Factory or other orchestration tools.
- Familiarity with Delta Lake, Parquet, and distributed data storage concepts.
- Experience in version control and CI/CD practices (Git, Azure DevOps, Jenkins).
- Strong analytical and problem-solving skills with attention to detail.
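As an illustration of the SQL validation and reconciliation work referenced above, a source-to-target check in Databricks might follow the sketch below. The table names (raw_orders, curated_orders) and key column are hypothetical assumptions, and a live SparkSession (as provided in a Databricks notebook) is assumed.

```python
# Sketch of source-to-target reconciliation in Spark SQL.
# Table names (raw_orders, curated_orders) are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compare row counts between the raw (source) and curated (target) layers.
spark.sql("""
    SELECT
        (SELECT COUNT(*) FROM raw_orders)     AS source_rows,
        (SELECT COUNT(*) FROM curated_orders) AS target_rows
""").show()

# Flag source keys that never reached the target.
missing = spark.sql("""
    SELECT r.order_id
    FROM raw_orders r
    LEFT ANTI JOIN curated_orders c ON r.order_id = c.order_id
""")
assert missing.count() == 0, "source rows missing from the curated layer"
```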
Key Responsibilities:
- Design, build, and maintain scalable ETL pipelines using Azure Databricks and PySpark for ingestion, transformation, and loading of structured and semi-structured data (a skeletal example follows this list).
- Develop and execute ETL test cases to validate data accuracy, transformation logic, and end-to-end data flow.
- Implement automated data validation frameworks using Python (Pandas, PyTest, Great Expectations, or similar tools).
- Collaborate with data architects, analysts, and business users to ensure high-quality data delivery for analytics and reporting.
- Perform data reconciliation and source-to-target validation between raw data and transformed layers.
- Optimize Databricks notebooks and Spark jobs for performance and cost efficiency.
- Implement CI/CD integration for ETL testing using Azure DevOps, GitHub Actions, or Jenkins.
- Maintain data quality metrics and monitor ETL job performance and reliability.
- Work with Azure Data Factory, Delta Lake, and Azure Blob Storage for pipeline orchestration and data lake management.
- Document test cases, test results, and data lineage for audit and compliance.
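For a sense of the pipeline work described in this list, a skeletal Databricks/PySpark job might take the following shape. The paths, columns, and table names are illustrative assumptions only; a production pipeline would add configuration, error handling, and orchestration (e.g., via Azure Data Factory).

```python
# Skeletal PySpark ETL job: ingest raw JSON, cleanse, write a Delta table.
# All paths, columns, and table names below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Extract: semi-structured source files landed in Azure Blob Storage / ADLS.
raw = spark.read.json("abfss://landing@account.dfs.core.windows.net/orders/")

# Transform: deduplicate, type the timestamp, drop invalid amounts.
orders = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .filter(F.col("amount") > 0)
)

# Load: append to a partitioned Delta table in the curated layer.
(orders.write
       .format("delta")
       .mode("append")
       .partitionBy("order_date")
       .saveAsTable("curated.orders"))
```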