TECHNOGEN, Inc. has been a proven leader in providing full IT services, software development, and solutions for 15 years.
TECHNOGEN is a Small, Woman-Owned Minority Business with GSA Advantage Certification. We have offices in VA and MD, and offshore development centers in India. We have successfully executed 100+ projects for clients ranging from small businesses and non-profits to Fortune 50 companies and federal, state, and local agencies.
Hi,
We are looking to hire a talented professional for the below job opportunity with one of our clients:
Position: Data Tester (Databricks, PySpark, and Big Data)
Location: Remote
Duration: 12+ Months (Long-Term Contract)
Job Description:
- We are seeking an experienced Data Tester with strong expertise in Databricks, PySpark, and Big Data ecosystems.
- The ideal candidate will have a solid background in testing data pipelines, ETL workflows, and analytical data models, ensuring data integrity, accuracy, and performance across large-scale distributed systems.
- This role requires hands-on experience with Databricks, Spark-based data processing, and strong SQL validation skills, along with familiarity in data lake / Delta Lake testing, automation, and cloud environments (AWS, Azure, or Google Cloud Platform).
Required Qualifications:
- 8+ years of overall experience in data testing / QA within large-scale enterprise data environments.
- 5+ years of experience in testing ETL / Big Data pipelines, validating data transformations, and ensuring data integrity.
- 4+ years of hands-on experience with Databricks, including notebook execution, job scheduling, and workspace management.
- 4+ years of experience in PySpark (DataFrame APIs, UDFs, transformations, joins, and data validation logic).
- 5+ years of strong proficiency in SQL (joins, aggregations, window functions, and analytical queries) for validating complex datasets.
- 3+ years of experience with Delta Lake or data lake testing (schema evolution, ACID transactions, time travel, partition validation).
- 3+ years of experience in Python scripting for automation and data validation tasks.
- 3+ years of experience with cloud-based data platforms (Azure Data Lake, AWS S3, or Google Cloud Platform BigQuery).
- 2+ years of experience in test automation for data pipelines using tools like pytest, PySpark test frameworks, or custom Python utilities.
- 4+ years of experience applying data warehousing concepts, data modeling (Star/Snowflake schemas), and data quality frameworks.
- 4+ years of experience with Agile / SAFe methodologies, including story-based QA and sprint deliverables.
- 6+ years of experience applying analytical and debugging skills to identify data mismatches, performance issues, and pipeline failures.
Preferred Qualifications:
- Experience with CI/CD for Databricks or data testing (GitHub Actions, Jenkins, Azure DevOps).
- Exposure to BI validation (Power BI, Tableau, Looker) for verifying downstream reports.
- Knowledge of REST APIs for metadata validation or system integration testing.
- Familiarity with big data tools like Hive, Spark SQL, Snowflake, and Airflow.
- Cloud certifications (e.g., Microsoft Azure Data Engineer Associate or AWS Big Data Specialty) are a plus.
Best Regards,
Mohammad Kashif Ali