Automated Data Quality Engineer

Overview

Remote
Depends on Experience
Full Time
No Travel Required

Skills

Automated Testing
Data Quality
Python
PyTest
AWS
ETL
SQL

Job Details

Habemco is a shared services company wholly owned and operated by the Habematolel Pomo of Upper Lake, a federally recognized Native American tribe located in Northern California. Habemco's support services, such as product development and technology, are needed for business growth; they ultimately power the Tribe's economy and enable the delivery of education, health care, and elder support programs for the Tribal community. Our talented team provides cross-functional support services to various tribal business and government entities. The Habemco team plays a critical role in ensuring a successful future for our customers, our employees, and the Tribe.

Headquartered in a beautiful yet remote part of California, the Tribe recognizes that to compete in highly competitive industries such as FinTech, it must access expertise throughout the nation. In addition to employees who work remotely, the Tribe has employees located at its headquarters in Upper Lake, California, and at a campus in Lenexa, Kansas.

Employees receive competitive pay and benefits, quarterly performance bonuses, and a 401(k) with a 4% match. Our team is creative, forward-thinking, passionate, and moves fast! Are you ready to grow with us?

 

Purpose of the Position:

We are seeking an experienced, detail-oriented Automated Data QA Engineer to join our dynamic data engineering team. In this role, you will play a critical part in ensuring the accuracy, consistency, and reliability of our data systems by designing and maintaining automated data validation scripts, executing data quality test cases, and identifying anomalies early in the data pipeline lifecycle. Your expertise in data testing, scripting languages, and cloud data technologies will be essential to optimizing the data quality assurance process and ensuring our data products meet the highest standards of integrity and usability. The incumbent performs work that is generally independent yet collaborative and contributes to moderately complex aspects of a project.

 

Key Responsibilities:

  • Develop and Maintain Automated Data Test Frameworks - Design and sustain automated testing frameworks for AWS Glue ETL processes, Lambda functions, and PySpark SQL workflows, leveraging PyTest and Spark's in-memory testing capabilities to ensure data reliability.
  • Design and execute test cases to ensure data accuracy, completeness, and consistency across AWS ETL pipelines and Data Lake components.
  • Ensure pipeline stability, low data latency, and robust error handling by testing and monitoring performance bottlenecks, collaborating with data engineers to enhance pipeline efficiency.
  • Collaborate with Stakeholders - Partner with data engineers, data scientists, and business teams to understand data pipeline requirements, ensuring test coverage supports data workflows, quality benchmarks, and analytical goals.
  • Implement Unit and Integration Testing - Create and execute unit and integration tests for Glue ETL jobs, Lambda handlers, and PySpark data transformations, validating correctness, edge cases, and alignment with data processing needs.
  • Validate SQL Queries in PySpark - Test PySpark-embedded SQL logic by generating in-memory datasets, confirming query accuracy, performance, and resilience to edge cases like nulls or malformed data (see the PyTest sketch following this list).
  • Integrate Testing into CI/CD Pipelines - Working with Cloud Operations, incorporate automated test suites into CI/CD systems (e.g., AWS CodePipeline, GitHub Actions, Enterprise GitHub, AWS CodeDeploy), enforcing data validation checks and halting deployments if quality thresholds are not met.
  • Manage Test Data and Environments - Produce and maintain synthetic datasets for testing Glue ETL, Lambda, and PySpark pipelines, replicating production-like conditions while ensuring repeatability across test environments.
  • Leverage Data Quality Rules in AWS Glue and DQDL - Define and implement AWS Glue Data Quality rules using the Data Quality Definition Language (DQDL) to validate ETL outputs, ensuring data completeness, consistency, and adherence to predefined standards within test suites (a DQDL sketch follows this list).
  • Monitor and Optimize Test Coverage - Leverage tools like pytest-cov to assess and enhance test coverage, pinpointing untested data pipeline segments and strengthening validation of critical logic.
  • Troubleshoot and Debug Test Failures - Investigate and resolve test failures in CI/CD pipelines, debugging issues in Glue ETL processes, Lambda executions, or PySpark queries to ensure data accuracy and performance.
  • Collaborate with Development Teams - Engage with data engineers to embed test-driven development (TDD) practices, enhance testability in pipeline designs, and provide quality-focused feedback during code reviews.
  • Document and Share Best Practices - Document testing frameworks and data validation strategies, sharing guidelines and training materials to promote consistent, high-quality testing across teams.
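
To make the PyTest and in-memory Spark testing responsibilities above concrete, the following is a minimal sketch of the kind of test this role would write. It spins up a local SparkSession with no cluster or AWS resources, builds a small synthetic dataset that deliberately includes a null, and asserts that the SQL under test handles that edge case. The table and column names (orders, order_id, amount) are illustrative placeholders, not references to any actual Habemco pipeline.

    import pytest
    from pyspark.sql import SparkSession

    @pytest.fixture(scope="session")
    def spark():
        # Local, in-memory SparkSession; no cluster or AWS credentials needed.
        session = (
            SparkSession.builder
            .master("local[2]")
            .appName("data-quality-tests")
            .getOrCreate()
        )
        yield session
        session.stop()

    def test_totals_skip_null_amounts(spark):
        # Synthetic rows deliberately include a null to exercise the edge case.
        rows = [("A-1", 100.0), ("A-2", None), ("A-3", 50.0)]
        spark.createDataFrame(rows, ["order_id", "amount"]) \
            .createOrReplaceTempView("orders")

        # The SQL under test: SUM should skip nulls, not fail or return null.
        result = spark.sql("SELECT SUM(amount) AS total FROM orders").collect()

        assert result[0]["total"] == 150.0

Coverage for suites like this can be measured with pytest-cov (e.g., running pytest --cov with the relevant package name) to surface untested pipeline segments, as the coverage bullet above describes.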

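The AWS Glue Data Quality bullet can be illustrated the same way. The sketch below registers a DQDL ruleset through boto3's Glue client; the rule types (IsComplete, IsUnique, ColumnValues, RowCount) are real DQDL constructs, but the ruleset name, database, and table are hypothetical placeholders that an actual project would replace with its own Data Catalog entries.

    import boto3

    glue = boto3.client("glue")

    # DQDL ruleset: every rule must pass for an evaluation run to report PASS.
    ruleset = """
    Rules = [
        IsComplete "order_id",
        IsUnique "order_id",
        ColumnValues "amount" >= 0,
        RowCount > 0
    ]
    """

    # Attach the ruleset to a (hypothetical) Data Catalog table so it can be
    # evaluated against ETL outputs as part of the automated test suite.
    glue.create_data_quality_ruleset(
        Name="orders-etl-output-checks",
        Ruleset=ruleset,
        TargetTable={
            "DatabaseName": "analytics_db",
            "TableName": "orders_clean",
        },
    )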
 

Education and Experience:

Required:

  • Bachelor of Science degree from an accredited university with a major in Computer Science, Data Engineering, or another Engineering discipline or, in lieu of education, 4 or more years performing Automated Quality Assurance Engineering activities. Plus:
  • Proven experience of 4+ years as an SDET, Data QA Engineer, or similar role with a focus on testing data pipelines and systems.
  • Knowledge of continuous integration tools such as Jenkins, GitHub Actions, or AWS CodePipeline, with experience integrating data testing workflows.
  • Familiarity with version control systems (e.g., Git) and collaborative development practices in data-centric projects.
  • Strong understanding of data testing methodologies, tools (e.g., PyTest), and processes for ensuring data quality and integrity.
  • Ability to write clear, concise, and comprehensive test documentation and reports for data validation and pipeline testing outcomes.
  • Excellent problem-solving and debugging skills, with a focus on identifying data anomalies and pipeline failures.
  • Strong communication skills and the ability to work collaboratively with data engineers, analysts, and business stakeholders in a team environment.
  • Applicants for this position must have work authorization that does not now, or in the future, require sponsorship of a visa for employment authorization in the United States and with Habemco (e.g., H-1B visa, F-1 visa (STEM/OPT), TN visa).
  • All offers are contingent upon signing a confidentiality agreement and satisfactory completion of drug screening and background checks. Employer observes federal standards for controlled substances.

Preferred:

  • Master's degree from an accredited university with a major in Computer Science, Data Engineering, or another Engineering discipline.
  • Experience with cloud-based data services and testing in cloud environments.
  • Familiarity with Agile development methodologies (Scrum, Kanban, etc.).

 

Skills & Abilities:

  • Advanced problem-solving skills and the ability to optimize data processes for the best possible outcome.
  • Ability to prioritize and manage multiple milestones and projects efficiently.
  • A willingness to dig deep, learn from others, share your own skills, and be part of a talented and dedicated team.
  • Superior attention to detail.
  • Highly adaptable, a driver of change, and capable of rallying teams quickly.
  • Effectively prioritizes and executes tasks in a highly productive yet autonomous environment.
  • Strong decision-making and problem-solving skills (i.e., design, debugging, and testing) and experience with software development projects.
  • Ability to present technical ideas in concise, user-friendly, layperson's language.
  • Strong interpersonal and listening skills used in developing effective working relationships.
  • Ability to work in a fast-paced, time-sensitive, and confidential environment.
  • Excellent communication skills; can communicate effectively both orally and in writing with professionalism, excellent grammar, respect, and courteousness.
  • Possess a balance of assertiveness and diplomacy along with adaptability to communicate on all levels.

 

Physical Requirements:

  • Prolonged periods in a stationary seated position, such as working on a computer.
  • Ability to differentiate wire and cable colors as well as various audible tones.