Agentic QA Engineer – Generative AI & Agentic Systems
(Agent & Multi-Agent Testing)
Summary
We are seeking a hands-on Agentic QA Engineer to design and execute end-to-end testing strategies for advanced agentic AI solutions, including multi-agent systems operating in production-grade environments.
In this role, you will partner closely with our Agentic Operations Team to ensure resiliency, reliability, accuracy, latency performance, orchestration correctness, and scalability across complex AI workflows. You will define QA frameworks, build reusable test artifacts, drive macro-level validations across multi-step agent systems, and lead the QA function for Agentic AI from development through production.
This is a high-impact role at the forefront of Generative AI systems engineering.
Key Responsibilities
Quality Strategy & Leadership
Define and own the QA strategy for agentic and multi-agent AI systems across development, staging, and production.
Establish testing standards, frameworks, coding guidelines for test harnesses, and review practices.
Mentor and guide QA engineers as the function scales.
Partner with Agentic Operations, Data Science, MLOps, and Platform teams to embed QA across the SDLC and incident response lifecycle.
Define release criteria and operational readiness standards (performance, security, compliance, cost/latency budgets).
Build post-deployment validation playbooks and incident triage runbooks.
Agentic & Multi-Agent Testing
Reliability, Resiliency & Latency
Build resilience testing suites:
Chaos experiments
Failover validation
Retry/backoff logic
Circuit breaking
Degraded-mode behavior
Establish latency SLOs and measure end-to-end response times across orchestration layers (LLM calls, tool invocations, queues); see the sketch after this list.
Ensure reliability through soak testing, canary verification, and automated rollback mechanisms.
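By way of illustration, a minimal sketch of this kind of latency/retry check, using only the standard library. The run_agent entry point, the 5-second p95 budget, and the retry counts are assumptions for illustration, not project specifics:

```python
# Minimal sketch (not project code): measure end-to-end latency against an SLO
# and exercise retry/backoff behavior. `run_agent` is a hypothetical stand-in
# for whatever client invokes the orchestrated agent workflow.
import random
import statistics
import time

LATENCY_P95_BUDGET_S = 5.0   # assumed SLO; real budgets come from the team
MAX_RETRIES = 3

def run_agent(prompt: str) -> str:
    """Hypothetical agent call; replace with the real orchestrator client."""
    time.sleep(random.uniform(0.05, 0.2))          # simulate LLM + tool latency
    if random.random() < 0.1:                      # simulate a transient failure
        raise TimeoutError("upstream tool timed out")
    return f"answer for: {prompt}"

def call_with_backoff(prompt: str) -> str:
    """Retry with exponential backoff, the behavior a resilience suite should verify."""
    for attempt in range(MAX_RETRIES):
        try:
            return run_agent(prompt)
        except TimeoutError:
            if attempt == MAX_RETRIES - 1:
                raise
            time.sleep(2 ** attempt * 0.1)         # 0.1s, 0.2s, 0.4s ...
    raise RuntimeError("unreachable")

def test_p95_latency_within_slo():
    samples = []
    for i in range(50):
        start = time.perf_counter()
        call_with_backoff(f"probe {i}")
        samples.append(time.perf_counter() - start)
    p95 = statistics.quantiles(samples, n=20)[-1]  # 95th percentile
    assert p95 <= LATENCY_P95_BUDGET_S, f"p95 {p95:.2f}s exceeds SLO"
```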
Accuracy & Macro-Level Validation
Define ground-truth pipelines and evaluation frameworks (see the scoring sketch at the end of this subsection) for:
Exact match
Semantic similarity
Factuality validation
Build macro-validation frameworks for complex, multi-step workflows (e.g., content generation + verification loops, agentic data pipelines).
Instrument guardrail validations for:
Toxicity
PII detection
Hallucination detection
Policy compliance
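For illustration, a minimal scoring sketch covering exact match, a semantic-similarity stand-in, and a toy PII guardrail. The token-overlap function is only a placeholder for an embedding-based scorer, and the regex patterns are illustrative, not production guardrails:

```python
# Minimal sketch (not project code): score agent outputs against ground truth
# and flag basic PII, as one building block of an evaluation framework.
import re

def exact_match(output: str, reference: str) -> bool:
    return output.strip().lower() == reference.strip().lower()

def token_overlap(output: str, reference: str) -> float:
    """Jaccard overlap of word sets; swap in embeddings for real semantic similarity."""
    a, b = set(output.lower().split()), set(reference.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 1.0

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),            # SSN-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),      # email address
]

def violates_pii_guardrail(output: str) -> bool:
    return any(p.search(output) for p in PII_PATTERNS)

def evaluate(case: dict) -> dict:
    """`case` holds 'output' and 'reference'; returns per-metric results."""
    out, ref = case["output"], case["reference"]
    return {
        "exact_match": exact_match(out, ref),
        "similarity": token_overlap(out, ref),
        "pii_flag": violates_pii_guardrail(out),
    }

if __name__ == "__main__":
    print(evaluate({"output": "Paris is the capital of France.",
                    "reference": "The capital of France is Paris."}))
```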
Scale & Orchestration
Design load and stress tests for multi-agent graphs under scale (concurrency, throughput, queue depth, backpressure).
Validate orchestrator correctness (a toy sketch follows this list):
DAG execution
Retries
Branching logic
Timeouts
Compensation paths
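A toy sketch of the kind of correctness assertion meant here: the run_dag runner below is purely illustrative, not the production orchestrator, and the node names are made up. Real tests would drive the production orchestrator's own API:

```python
# Toy sketch (not the real orchestrator): a minimal DAG runner used to show
# assertions about execution order and retry behavior in a multi-step workflow.
from collections import deque

def run_dag(nodes, edges, handlers, max_retries=2):
    """Run handlers in topological order; retry a node up to max_retries on failure."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for src, dst in edges:
        indegree[dst] += 1
        children[src].append(dst)
    ready, order = deque(n for n in nodes if indegree[n] == 0), []
    while ready:
        node = ready.popleft()
        for attempt in range(max_retries + 1):
            try:
                handlers[node]()
                break
            except RuntimeError:
                if attempt == max_retries:
                    raise
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return order

def test_execution_order_and_retries():
    calls = {"plan": 0, "search": 0, "write": 0}

    def make_handler(name):
        def handler():
            calls[name] += 1
        return handler

    def flaky_search():
        calls["search"] += 1
        if calls["search"] == 1:                   # fail once, succeed on retry
            raise RuntimeError("transient tool error")

    handlers = {"plan": make_handler("plan"),
                "search": flaky_search,
                "write": make_handler("write")}
    order = run_dag(["plan", "search", "write"],
                    [("plan", "search"), ("search", "write")], handlers)
    assert order == ["plan", "search", "write"]    # DAG order respected
    assert calls["search"] == 2                    # retry actually happened
```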
Engineer reusable test artifacts (see the fixture sketch after this list):
Scenario configurations
Synthetic datasets
Prompt libraries
Agent graph fixtures
Simulators
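An illustrative pytest fixture along these lines, bundling a scenario configuration with a scripted tool simulator so agent-graph tests stay deterministic. ToolSimulator and the scenario fields are hypothetical names for this sketch, not an existing library:

```python
# Minimal sketch (not project code): a reusable fixture that pairs a scenario
# config with a scripted tool simulator, so tests never hit real tools.
import pytest

class ToolSimulator:
    """Replays canned responses and records calls, for deterministic agent tests."""
    def __init__(self, canned):
        self.canned = canned
        self.calls = []

    def invoke(self, tool_name, payload):
        self.calls.append((tool_name, payload))
        return self.canned.get(tool_name, {"status": "not_implemented"})

@pytest.fixture
def search_scenario():
    """Scenario config + simulator; intended for reuse across test modules."""
    config = {"graph": "research_agent", "max_steps": 5, "timeout_s": 30}
    simulator = ToolSimulator({
        "web_search": {"results": ["doc-1", "doc-2"]},
        "summarize": {"summary": "two documents found"},
    })
    return config, simulator

def test_agent_uses_search_before_summarize(search_scenario):
    config, sim = search_scenario
    # Stand-in for driving the real agent graph with the simulator injected.
    sim.invoke("web_search", {"query": "agentic QA"})
    sim.invoke("summarize", {"docs": ["doc-1", "doc-2"]})
    called_tools = [name for name, _ in sim.calls]
    assert called_tools.index("web_search") < called_tools.index("summarize")
    assert len(called_tools) <= config["max_steps"]
```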
Integrate automated testing into CI/CD pipelines (pre-merge gates, nightly runs, canary validation); a minimal gate-script sketch appears below.
Tie production monitoring and alerting directly to AI system KPIs.
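A minimal sketch of a pre-merge gate script a pipeline could run. The results-file format and the 95% pass-rate threshold are assumptions for illustration:

```python
# Minimal sketch (not project code): fail the build when the evaluation pass
# rate drops below a threshold, so regressions block the merge.
import json
import sys

PASS_RATE_THRESHOLD = 0.95   # assumed release criterion

def main(results_path: str) -> int:
    with open(results_path) as f:
        results = json.load(f)   # assumed format: list of {"case_id": ..., "passed": bool}
    passed = sum(1 for r in results if r["passed"])
    rate = passed / len(results) if results else 0.0
    print(f"eval pass rate: {rate:.2%} ({passed}/{len(results)})")
    if rate < PASS_RATE_THRESHOLD:
        print(f"FAIL: below threshold {PASS_RATE_THRESHOLD:.0%}", file=sys.stderr)
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "eval_results.json"))
```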
Required Qualifications
7+ years of experience in Software QA/Testing
2+ years working with AI/ML or LLM-based systems
Hands-on experience testing agentic or multi-agent architectures
Strong programming skills in Python or TypeScript/JavaScript
Experience building test harnesses, simulators, and reusable fixtures
Deep understanding of LLM evaluation methods (e.g., exact match, semantic similarity, factuality checks)
Expertise in distributed systems testing:
Latency profiling
Circuit breakers
Retry patterns
Chaos engineering
Message queues
Familiarity with agent orchestration frameworks
Experience with CI/CD (GitHub Actions, Azure DevOps)
Observability tooling (OpenTelemetry, Prometheus/Grafana, Datadog)
Feature flags and canary deployment strategies
Solid understanding of AI privacy, security, and compliance (PII handling, content policies, model safety)
Excellent communication skills and proven cross-functional leadership experience
Preferred Qualifications
Experience with multi-agent simulators and agent graph testing
Tooling for latency emulation and failure simulation
Knowledge of MLOps practices (model versioning, evaluation pipelines, dataset governance)
Experience with A/B experimentation for LLM systems
Cloud experience (AWS), serverless architectures, containerization, and event-driven systems
Prior ownership of cost, latency, and SLA/SLO management for production AI workloads
Why This Role Matters
Agentic AI systems introduce new layers of complexity — orchestration, coordination, statefulness, and emergent behavior. This role ensures those systems are production-ready, resilient under stress, accurate at scale, and safe by design.
You won’t just test AI — you’ll define how agentic systems are validated at scale.