Job Details
Hiring: W2 Candidates Only
Visa: Open to any visa type with valid work authorization in the USA
Job Overview
We are seeking a highly skilled Senior AI-ML LLM Quality Engineer with strong expertise in testing and validating large language models (LLMs) and generative AI products. The ideal candidate will have hands-on experience in Python scripting, automation frameworks, and evaluation of LLMs within enterprise environments. This role involves building test strategies, executing model performance validations, and guiding customers on automation strategies.
Key Responsibilities
Support testing and validation of LLM-powered applications.
Design and implement test strategies, evaluation workflows, and automation frameworks for generative AI systems.
Perform model performance validation across diverse generative AI use cases.
Collaborate with cross-functional teams to ensure reliable, transparent, and scalable AI solutions.
Guide customers on automation strategy, relevant tools, and best practices.
Must-Have Skills
Strong experience with Python scripting, REST APIs, and YAML.
Hands-on experience with testing Generative AI / ML products and evaluating LLMs in enterprise environments.
Experience with LLM testing tools (e.g., LangSmith, Promptfoo).
Strong understanding of LLM behavior and evaluation workflows.
Proficiency with PyTest, Selenium, or similar test automation frameworks.
Strong experience with test automation and the ability to advise customers on relevant technologies.
Nice-to-Have Skills
Experience with advanced testing frameworks.
Experience testing RAG pipelines and LLM agent systems.
Familiarity with LangChain, LlamaIndex, or Haystack.
Knowledge of AI/ML model evaluation metrics.
Experience with red teaming (preferred but not mandatory).
Familiarity with AWS cloud services and MLOps tooling (e.g., MLflow).
Ideal Candidate Profile
6 years of relevant AI/ML and software testing experience.
Deep understanding of LLM testing methodologies and automation strategies.
Strong problem-solving and communication skills to collaborate effectively with product and engineering teams.
Passion for ensuring AI model quality, transparency, and reliability in real-world enterprise applications.