Data Engineer – Data Quality & Validation

Hybrid in Dallas, TX, US • Posted 9 hours ago • Updated 8 hours ago
Contract W2
Contract Independent
24 Months
No Travel Required

Job Details

Skills

  • AWS
  • Kafka
  • Databricks
  • SQL
  • Python

Summary

Location: Dallas, TX (Hybrid – 3 Days Onsite)
Job Type: Long-Term Contract
Employment Type: W2 Only
Interview Process: In-Person Client Interview (Mandatory)

Position Overview

We are seeking an experienced Data Engineer – Data Quality & Validation to support enterprise-scale data platforms and pipelines by ensuring the accuracy, completeness, reliability, and performance of data assets across the organization. This role will focus on validating both batch and real-time data processing solutions built on Databricks, Apache Spark, Kafka, AWS, SQL, and Python.

The ideal candidate will have a strong background in data engineering, ETL/ELT validation, data quality assurance, automation, and testing of distributed data systems. The candidate will work closely with data engineers, architects, business stakeholders, and platform teams to establish robust validation frameworks and maintain high data quality standards.


Key Responsibilities

Data Quality & Validation

  • Validate data pipelines to ensure accuracy, completeness, consistency, and timeliness of data.
  • Perform source-to-target reconciliation across multiple systems and platforms.
  • Develop and execute SQL-based data validation checks and business rule validations.
  • Ensure data lineage, traceability, and auditability throughout the data lifecycle.
  • Identify, investigate, and resolve data quality issues and anomalies.
  • Define and monitor data quality metrics, KPIs, SLAs, and SLOs.

ETL / ELT Pipeline Validation

  • Validate data ingestion, transformation, aggregation, and consumption layers.
  • Test batch and real-time streaming data pipelines.
  • Verify business transformation logic using SQL, PySpark, and Python.
  • Validate historical data loads, backfills, and reprocessing activities.
  • Conduct end-to-end testing of data movement across enterprise systems.
  • Ensure data consistency across upstream and downstream platforms.

Databricks & Apache Spark Testing

  • Validate data processing workflows running on Databricks.
  • Test Spark-based workloads developed using PySpark and Spark SQL.
  • Verify large-scale data transformations, aggregations, and calculations.
  • Support testing and validation of distributed processing environments.
  • Analyze Spark execution behavior and data processing outcomes.

Kafka & Streaming Data Validation

  • Validate Kafka-based streaming architectures and data pipelines.
  • Test producer and consumer workflows across distributed systems.
  • Verify message ordering, delivery guarantees, and data integrity.
  • Validate schema evolution, retention policies, partitions, and offset management.
  • Test serialization formats including Avro, JSON, and Protobuf.
  • Simulate and validate duplicate records, late-arriving events, and failure scenarios.
  • Ensure resiliency and reliability of event-driven processing pipelines.

Automation & Test Framework Development

  • Design and develop Python-based automation frameworks for data validation.
  • Build reusable testing utilities and validation components.
  • Create synthetic datasets and test scenarios to support validation efforts.
  • Integrate automated testing into CI/CD pipelines.
  • Develop automated monitoring and alerting solutions for data quality issues.
  • Improve testing efficiency through automation and reusable frameworks.

Performance, Reliability & Observability

  • Validate throughput, scalability, latency, concurrency, and overall system performance.
  • Test retry mechanisms, recovery processes, and idempotent workflows.
  • Conduct regression, failover, resilience, and performance testing.
  • Validate monitoring, logging, metrics, and observability solutions.
  • Support incident investigations, root cause analysis, and remediation efforts.
  • Ensure compliance with operational and data governance standards.

Required Qualifications

  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field.
  • 7+ years of experience in Data Engineering, Data Quality Engineering, QA Engineering, SDET, or related disciplines.
  • 4+ years of hands-on experience with enterprise data platforms and large-scale data pipelines.
  • 3+ years of hands-on experience with Databricks and Apache Spark.
  • Strong SQL expertise for data validation, reconciliation, profiling, and analysis.
  • Strong Python programming skills for automation and data validation frameworks.
  • Experience testing ETL/ELT pipelines in both batch and streaming environments.
  • Hands-on experience with Kafka or similar event-streaming platforms.
  • Experience working with AWS data services, including:
    • Amazon S3
    • AWS Glue
    • AWS Lambda
    • Amazon EMR
    • Amazon Redshift
    • Amazon Athena
  • Experience working with distributed data processing systems and cloud-based data platforms.
  • Strong analytical, troubleshooting, and problem-solving abilities.
  • Excellent verbal and written communication skills.
  • Ability to collaborate effectively with cross-functional teams.

Preferred Qualifications

  • Experience with data quality and observability tools such as:
    • Great Expectations
    • Monte Carlo
    • Similar data quality platforms
  • Knowledge of schema registries, metadata management, and data contracts.
  • Experience integrating automated testing into CI/CD pipelines using:
    • GitHub Actions
    • Jenkins
    • Similar DevOps platforms
  • Experience supporting modern cloud-native data engineering ecosystems.
  • Understanding of Data Lakehouse architectures and distributed computing frameworks.
  • Familiarity with data governance, lineage, and compliance best practices.
  • Experience with Agile/Scrum delivery methodologies.

Dice Id: 91174352 • Position Id: 8986941
Contact the job poster

Madhava Krishna Naini

Recruiter @ Plugins Inc