Data Pipeline Testing

Overview

On Site
Depends on Experience
Accepts corp to corp applications
Contract - Independent
Contract - W2
Contract - 12 Month(s)

Skills

Data Pipelines
ETL
BigQuery

Job Details

We are seeking a highly skilled and motivated Data Pipeline Testing Lead Engineer to join our team. This role is crucial to ensuring the accuracy and reliability of our data solutions on Google Cloud. The candidate will be responsible for testing data pipelines that involve technologies such as BigQuery, Kafka, Hive, Parquet files, and Snowflake. This position requires a deep understanding of both batch and streaming data processing and a proven ability to automate tests using Python.

Key Responsibilities:

  • Design and execute tests on data pipelines that integrate various technologies such as BigQuery, Kafka, Hive, Parquet files, and Snowflake.
  • Develop automated tests for batch and streaming data systems to validate the functionality and performance of data pipelines.
  • Implement Data Quality Testing frameworks to ensure the integrity and accuracy of data stored and processed.
  • Collaborate with development teams to understand business requirements and translate them into test scenarios.
  • Verify the correctness of metrics calculations based on predefined rules and ensure compliance with data governance standards.
  • Collaborate with onsite and offshore engineering teams and product managers to ensure seamless integration and alignment with project objectives.
  • Troubleshoot and resolve issues within the data pipelines and related infrastructure.
  • Continuously improve testing strategies and automation frameworks to enhance test coverage and efficiency.
  • Document test results and collaborate with engineering teams to refine data solutions based on feedback.

Required Skills and Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • Minimum of 10 years of experience in data pipeline testing, preferably in a cloud environment.
  • Strong experience with Google Cloud Platform services, especially BigQuery.
  • Expertise in test data modelling and quality assurance for ETL processes.
  • Proficiency with Kafka, Hive, Parquet files, and Snowflake.
  • Expertise in Data Quality Testing and metrics calculations for both batch and streaming data.
  • Excellent programming skills in Python and experience with test automation.
  • Strong analytical and problem-solving abilities.
  • Excellent communication and teamwork skills.

Preferred Skills:

  • Experience with CI/CD pipelines in a cloud environment.
  • Knowledge of additional programming languages such as Java or Scala.