Agentic QA Engineer – Generative AI & Agentic Systems
(Agent & Multi-Agent Testing)
Summary
We are seeking a hands-on Agentic QA Engineer to design and execute end-to-end testing strategies for advanced agentic AI solutions, including multi-agent systems operating in production-grade environments.
In this role, you will partner closely with our Agentic Operations Team to ensure resiliency, reliability, accuracy, latency performance, orchestration correctness, and scalability across complex AI workflows. You will define QA frameworks, build reusable test artifacts, drive macro-level validations across multi-step agent systems, and lead the QA function for Agentic AI from development through production.
This is a high-impact role at the forefront of Generative AI systems engineering.
Key Responsibilities
Quality Strategy & Leadership
Define and own the QA strategy for agentic and multi-agent AI systems across development, staging, and production.
Establish testing standards, frameworks, coding guidelines for test harnesses, and review practices.
Mentor and guide QA engineers as the function scales.
Partner with Agentic Operations, Data Science, MLOps, and Platform teams to embed QA across the SDLC and incident response lifecycle.
Define release criteria and operational readiness standards (performance, security, compliance, cost/latency budgets).
Build post-deployment validation playbooks and incident triage runbooks.
Agentic & Multi-Agent Testing
Reliability, Resiliency & Latency
Build resilience testing suites:
Chaos experiments
Failover validation
Retry/backoff logic
Circuit breaking
Degraded-mode behavior
Establish latency SLOs and measure end-to-end response times across orchestration layers (LLM calls, tool invocations, queues); see the sketch after this list.
Ensure reliability through soak testing, canary verification, and automated rollback mechanisms.
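By way of illustration, a minimal sketch of this kind of latency/retry check, using only the standard library. The run_agent entry point, the 5-second p95 budget, and the retry counts are assumptions for illustration, not project specifics:

```python
# Minimal sketch (not project code): measure end-to-end latency against an SLO
# and exercise retry/backoff behavior. `run_agent` is a hypothetical stand-in
# for whatever client invokes the orchestrated agent workflow.
import random
import statistics
import time

LATENCY_P95_BUDGET_S = 5.0   # assumed SLO; real budgets come from the team
MAX_RETRIES = 3

def run_agent(prompt: str) -> str:
    """Hypothetical agent call; replace with the real orchestrator client."""
    time.sleep(random.uniform(0.05, 0.2))          # simulate LLM + tool latency
    if random.random() < 0.1:                      # simulate a transient failure
        raise TimeoutError("upstream tool timed out")
    return f"answer for: {prompt}"

def call_with_backoff(prompt: str) -> str:
    """Retry with exponential backoff, the behavior a resilience suite should verify."""
    for attempt in range(MAX_RETRIES):
        try:
            return run_agent(prompt)
        except TimeoutError:
            if attempt == MAX_RETRIES - 1:
                raise
            time.sleep(2 ** attempt * 0.1)         # 0.1s, 0.2s, 0.4s ...
    raise RuntimeError("unreachable")

def test_p95_latency_within_slo():
    samples = []
    for i in range(50):
        start = time.perf_counter()
        call_with_backoff(f"probe {i}")
        samples.append(time.perf_counter() - start)
    p95 = statistics.quantiles(samples, n=20)[-1]  # 95th percentile
    assert p95 <= LATENCY_P95_BUDGET_S, f"p95 {p95:.2f}s exceeds SLO"
```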
Accuracy & Macro-Level Validation
Define ground-truth pipelines and evaluation frameworks (see the scoring sketch at the end of this subsection) for:
Exact match
Semantic similarity
Factuality validation
Build macro-validation frameworks for complex, multi-step workflows (e.g., content generation + verification loops, agentic data pipelines).
Instrument guardrail validations for:
Toxicity
PII detection
Hallucination detection
Policy compliance
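For illustration, a minimal scoring sketch covering exact match, a semantic-similarity stand-in, and a toy PII guardrail. The token-overlap function is only a placeholder for an embedding-based scorer, and the regex patterns are illustrative, not production guardrails:

```python
# Minimal sketch (not project code): score agent outputs against ground truth
# and flag basic PII, as one building block of an evaluation framework.
import re

def exact_match(output: str, reference: str) -> bool:
    return output.strip().lower() == reference.strip().lower()

def token_overlap(output: str, reference: str) -> float:
    """Jaccard overlap of word sets; swap in embeddings for real semantic similarity."""
    a, b = set(output.lower().split()), set(reference.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 1.0

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),            # SSN-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),      # email address
]

def violates_pii_guardrail(output: str) -> bool:
    return any(p.search(output) for p in PII_PATTERNS)

def evaluate(case: dict) -> dict:
    """`case` holds 'output' and 'reference'; returns per-metric results."""
    out, ref = case["output"], case["reference"]
    return {
        "exact_match": exact_match(out, ref),
        "similarity": token_overlap(out, ref),
        "pii_flag": violates_pii_guardrail(out),
    }

if __name__ == "__main__":
    print(evaluate({"output": "Paris is the capital of France.",
                    "reference": "The capital of France is Paris."}))
```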
Scale & Orchestration
Design load and stress tests for multi-agent graphs under scale (concurrency, throughput, queue depth, backpressure).
Validate orchestrator correctness (a toy sketch follows this list):
DAG execution
Retries
Branching logic
Timeouts
Compensation paths
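A toy sketch of the kind of correctness assertion meant here: the run_dag runner below is purely illustrative, not the production orchestrator, and the node names are made up. Real tests would drive the production orchestrator's own API:

```python
# Toy sketch (not the real orchestrator): a minimal DAG runner used to show
# assertions about execution order and retry behavior in a multi-step workflow.
from collections import deque

def run_dag(nodes, edges, handlers, max_retries=2):
    """Run handlers in topological order; retry a node up to max_retries on failure."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for src, dst in edges:
        indegree[dst] += 1
        children[src].append(dst)
    ready, order = deque(n for n in nodes if indegree[n] == 0), []
    while ready:
        node = ready.popleft()
        for attempt in range(max_retries + 1):
            try:
                handlers[node]()
                break
            except RuntimeError:
                if attempt == max_retries:
                    raise
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return order

def test_execution_order_and_retries():
    calls = {"plan": 0, "search": 0, "write": 0}

    def make_handler(name):
        def handler():
            calls[name] += 1
        return handler

    def flaky_search():
        calls["search"] += 1
        if calls["search"] == 1:                   # fail once, succeed on retry
            raise RuntimeError("transient tool error")

    handlers = {"plan": make_handler("plan"),
                "search": flaky_search,
                "write": make_handler("write")}
    order = run_dag(["plan", "search", "write"],
                    [("plan", "search"), ("search", "write")], handlers)
    assert order == ["plan", "search", "write"]    # DAG order respected
    assert calls["search"] == 2                    # retry actually happened
```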
Engineer reusable test artifacts (see the fixture sketch after this list):
Scenario configurations
Synthetic datasets
Prompt libraries
Agent graph fixtures
Simulators
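An illustrative pytest fixture along these lines, bundling a scenario configuration with a scripted tool simulator so agent-graph tests stay deterministic. ToolSimulator and the scenario fields are hypothetical names for this sketch, not an existing library:

```python
# Minimal sketch (not project code): a reusable fixture that pairs a scenario
# config with a scripted tool simulator, so tests never hit real tools.
import pytest

class ToolSimulator:
    """Replays canned responses and records calls, for deterministic agent tests."""
    def __init__(self, canned):
        self.canned = canned
        self.calls = []

    def invoke(self, tool_name, payload):
        self.calls.append((tool_name, payload))
        return self.canned.get(tool_name, {"status": "not_implemented"})

@pytest.fixture
def search_scenario():
    """Scenario config + simulator; intended for reuse across test modules."""
    config = {"graph": "research_agent", "max_steps": 5, "timeout_s": 30}
    simulator = ToolSimulator({
        "web_search": {"results": ["doc-1", "doc-2"]},
        "summarize": {"summary": "two documents found"},
    })
    return config, simulator

def test_agent_uses_search_before_summarize(search_scenario):
    config, sim = search_scenario
    # Stand-in for driving the real agent graph with the simulator injected.
    sim.invoke("web_search", {"query": "agentic QA"})
    sim.invoke("summarize", {"docs": ["doc-1", "doc-2"]})
    called_tools = [name for name, _ in sim.calls]
    assert called_tools.index("web_search") < called_tools.index("summarize")
    assert len(called_tools) <= config["max_steps"]
```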
Integrate automated testing into CI/CD pipelines (pre-merge gates, nightly runs, canary validation); a minimal gate-script sketch appears below.
Tie production monitoring and alerting directly to AI system KPIs.
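A minimal sketch of a pre-merge gate script a pipeline could run. The results-file format and the 95% pass-rate threshold are assumptions for illustration:

```python
# Minimal sketch (not project code): fail the build when the evaluation pass
# rate drops below a threshold, so regressions block the merge.
import json
import sys

PASS_RATE_THRESHOLD = 0.95   # assumed release criterion

def main(results_path: str) -> int:
    with open(results_path) as f:
        results = json.load(f)   # assumed format: list of {"case_id": ..., "passed": bool}
    passed = sum(1 for r in results if r["passed"])
    rate = passed / len(results) if results else 0.0
    print(f"eval pass rate: {rate:.2%} ({passed}/{len(results)})")
    if rate < PASS_RATE_THRESHOLD:
        print(f"FAIL: below threshold {PASS_RATE_THRESHOLD:.0%}", file=sys.stderr)
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "eval_results.json"))
```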
Required Qualifications
7+ years of experience in Software QA/Testing
2+ years working with AI/ML or LLM-based systems
Hands-on experience testing agentic or multi-agent architectures
Strong programming skills in Python or TypeScript/JavaScript
Experience building test harnesses, simulators, and reusable fixtures
Deep understanding of LLM evaluation methods (e.g., exact match, semantic similarity, factuality checks)
Expertise in distributed systems testing:
Latency profiling
Circuit breakers
Retry patterns
Chaos engineering
Message queues
Familiarity with agent orchestration frameworks
Experience with CI/CD (GitHub Actions, Azure DevOps)
Observability tooling (OpenTelemetry, Prometheus/Grafana, Datadog)
Feature flags and canary deployment strategies
Solid understanding of AI privacy, security, and compliance (PII handling, content policies, model safety)
Excellent communication skills and proven cross-functional leadership experience
Preferred Qualifications
Experience with multi-agent simulators and agent graph testing
Tooling for latency emulation and failure simulation
Knowledge of MLOps practices (model versioning, evaluation pipelines, dataset governance)
Experience with A/B experimentation for LLM systems
Cloud experience (AWS), serverless architectures, containerization, and event-driven systems
Prior ownership of cost, latency, and SLA/SLO management for production AI workloads
Why This Role Matters
Agentic AI systems introduce new layers of complexity — orchestration, coordination, statefulness, and emergent behavior. This role ensures those systems are production-ready, resilient under stress, accurate at scale, and safe by design.
You won’t just test AI — you’ll define how agentic systems are validated at scale.