Job Details
The AI/ML Test Engineer will serve as both a technical expert and consultative leader within their team, responsible for establishing and executing rigorous testing methodologies for software-based AI and machine learning solutions. These responsibilities include developing best practices, defining test frameworks, validating performance metrics, and implementing and validating human-in-the-loop (HITL) evaluation processes to ensure they function properly within established quality and ethical standards. This team will be the first of its kind within a large enterprise insurance company and will play a key role in establishing foundational AI/ML best practices and procedures, so candidates with experience working in 'greenfield' environments are preferred.
Key Responsibilities
- Define and implement testing frameworks and validation strategies for AI and machine learning applications
- Establish best practices for AI model testing, including performance, bias, accuracy, interpretability, and reliability measures
- Perform hands-on testing, model validation, and results analysis for LLMs and AI-driven systems
- Incorporate and validate human-in-the-loop (HITL) methodologies to ensure proper implementation and alignment with ethical and quality standards
- Develop automated testing workflows for reproducible model evaluation and regression analysis
- Collaborate with data scientists, AI engineers, and DevOps teams to integrate quality validation throughout the AI development lifecycle
- Analyze and document test results, generating actionable insights and improvement recommendations
- Advise leadership and engineering teams on AI testing governance, risk mitigation, and continuous improvement strategies
- Ensure compliance with internal standards and evolving AI governance and ethics practices
Required Skills & Experience
- Bachelor's or Master's degree in Computer Science, Data Science, or a related field
- 8+ years of professional experience in software quality engineering, with at least 3 years focused on AI or ML systems
- Deep understanding of AI/ML model development, evaluation metrics, and data validation techniques
- Hands-on experience with Python testing tools and frameworks (e.g., PyTest) and with model evaluation environments
- Experience testing solutions built on LLMs and generative AI frameworks
- Knowledge of statistical validation, drift detection, A/B testing, and bias analysis
- Strong understanding of AI ethics, governance, and human-centered testing principles
- Excellent analytical and problem-solving skills, with the ability to communicate findings to both technical and business audiences
Preferred Experience
- Familiarity with LangChain, MLflow, TensorFlow Extended (TFX), or MLOps testing pipelines
- Demonstrated success establishing AI/ML testing best practices in enterprise environments