AI QA Engineer (Agentic & Generative)

Remote • Posted 1 hour ago • Updated 1 hour ago

Contract W2

Contract Corp To Corp

Contract Independent

No Travel Required

Remote

Depends on Experience

Job Details

Skills

QA
Agentic
AI
LLM
Testing

Summary

Job Title: AI QA Engineer (Agentic & Generative)
Location: Remote
Duration: 12+ Months

Overview

We are seeking a highly skilled and hands-on AI QA Engineer to lead the design and execution of end-to-end testing strategies for cutting-edge agentic AI solutions, including multi-agent systems in production-grade environments.

In this role, you will partner closely with the Agentic Operations Team to ensure system resiliency, reliability, accuracy, low latency, orchestration correctness, and scalability. You will own the QA function across the full lifecycle, from development to production, while building robust frameworks and reusable testing assets for complex AI workflows.

Key Responsibilities

Quality Strategy & Leadership

Define and own the QA strategy for agentic and multi-agent AI systems across development, staging, and production.
Lead and mentor QA engineers; establish testing standards, coding guidelines, and review practices.
Collaborate cross-functionally with Agentic Operations, Data Science, MLOps, and Platform teams to embed QA into the SDLC and incident response.

Agentic & Multi-Agent Testing

Design and execute tests for:
- Agent orchestration
- Tool calling
- Planner-executor loops
- Inter-agent coordination (task decomposition, handoffs, goal convergence)
Validate:
- State management
- Context windows
- Memory/knowledge stores
- Prompt and graph correctness under varying conditions

Reliability, Resiliency & Latency

Implement scenario fuzzing (adversarial inputs, prompt perturbations, tool latency spikes, degraded APIs).
Develop resilience testing suites including:
- Chaos experiments
- Failover strategies
- Retries/backoff
- Circuit breakers
- Degraded mode behavior
Establish and measure latency SLOs across orchestration layers (LLMs, tools, queues).
Conduct soak tests and canary verifications, and validate automated rollback strategies.

Accuracy & Macro-Level Validation

Define ground truth and reference pipelines for task accuracy:
- Exact match
- Semantic similarity
- Factuality checks
Build macro-validation frameworks for multi-step agent workflows (e.g., data pipelines, content generation + verification loops).
Implement guardrail validations:
- Toxicity detection
- PII handling
- Hallucination checks
- Policy compliance

Scale & Orchestration

Design load and stress tests for multi-agent systems (concurrency, throughput, queue depth, backpressure).
Validate orchestrator correctness:
- DAG execution
- Retry logic
- Branching and timeouts
- Compensation paths

Dev → Prod Readiness

Build reusable test artifacts:
- Scenario configurations
- Synthetic datasets
- Prompt libraries
- Agent graph fixtures
- Simulators
Integrate testing into CI/CD pipelines (pre-merge gates, nightly runs, canary deployments).
Define release criteria and ensure operational readiness (performance, security, compliance, cost/latency).
Develop post-deployment validation playbooks and incident triage runbooks.

Required Qualifications

7+ years in Software QA/Testing, including 2+ years in AI/ML or LLM-based systems.
Hands-on experience testing agentic or multi-agent architectures.
Strong programming skills in Python or TypeScript/JavaScript.
Experience building test harnesses, simulators, and testing frameworks.
Expertise in LLM evaluation techniques:
- Exact/soft match
- BLEU, ROUGE, BERTScore
- Embedding-based semantic similarity
Strong background in distributed systems testing:
- Latency profiling
- Resiliency patterns (circuit breakers, retries)
- Chaos engineering
- Message queues
Familiarity with orchestration frameworks such as:
- LangChain, LangGraph, LlamaIndex, DSPy
- OpenAI Assistants/Actions or Azure OpenAI
Experience with:
- CI/CD (GitHub Actions, Azure DevOps)
- Observability tools (OpenTelemetry, Prometheus, Grafana, Datadog)
- Feature flags and canary deployments
Solid understanding of AI privacy, security, and compliance (PII, content policies, model safety).
Excellent communication and leadership skills with cross-functional collaboration experience.

Preferred Qualifications

Experience with multi-agent simulators and agent graph testing.
Knowledge of MLOps practices (model versioning, datasets, evaluation pipelines).
Experience with A/B testing and experimentation for LLM systems.
Background in cloud platforms (AWS), serverless architectures, and containerization.
Proven ownership of cost, latency, and SLA management for AI systems in production.

Why Join Us?

Work on cutting-edge agentic AI systems shaping the future of intelligent automation.
Own and define QA strategy for next-generation AI platforms.
Collaborate with top-tier engineering, data science, and operations teams.
Remote-first environment with high-impact, long-term opportunity.

Apply now to be part of building reliable, scalable, and intelligent AI systems at production scale.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 90987764
Position Id: 8917624
