AI QA Engineer (Agentic & Generative)

Remote • Posted 1 hour ago • Updated 1 hour ago
Contract W2
Contract Corp To Corp
Contract Independent
No Travel Required
Remote
Depends on Experience

Job Details

Skills

  • QA
  • Agentic
  • AI
  • LLM
  • Testing

Summary

Job Title: AI QA Engineer (Agentic & Generative)
Location: Remote
Duration: 12+ Months


Overview

We are seeking a highly skilled and hands-on AI QA Engineer to lead the design and execution of end-to-end testing strategies for cutting-edge agentic AI solutions, including multi-agent systems in production-grade environments.

In this role, you will partner closely with the Agentic Operations Team to ensure system resiliency, reliability, accuracy, low latency, orchestration correctness, and scalability. You will own the QA function across the lifecycle, from development to production, while building robust frameworks and reusable testing assets for complex AI workflows.


Key Responsibilities

Quality Strategy & Leadership
  • Define and own the QA strategy for agentic and multi-agent AI systems across development, staging, and production.

  • Lead and mentor QA engineers; establish testing standards, coding guidelines, and review practices.

  • Collaborate cross-functionally with Agentic Operations, Data Science, MLOps, and Platform teams to embed QA into the SDLC and incident response.

Agentic & Multi-Agent Testing
  • Design and execute tests for:

    • Agent orchestration

    • Tool calling

    • Planner-executor loops

    • Inter-agent coordination (task decomposition, handoffs, goal convergence)

  • Validate:

    • State management

    • Context windows

    • Memory/knowledge stores

    • Prompt and graph correctness under varying conditions

Reliability, Resiliency & Latency
  • Implement scenario fuzzing (adversarial inputs, prompt perturbations, tool latency spikes, degraded APIs).

  • Develop resilience testing suites including:

    • Chaos experiments

    • Failover strategies

    • Retries/backoff

    • Circuit breakers

    • Degraded mode behavior

  • Establish and measure latency SLOs across orchestration layers (LLMs, tools, queues).

  • Run soak tests and canary verifications, and implement automated rollback strategies.

Accuracy & Macro-Level Validation
  • Define ground truth and reference pipelines for task accuracy:

    • Exact match

    • Semantic similarity

    • Factuality checks

  • Build macro-validation frameworks for multi-step agent workflows (e.g., data pipelines, content generation + verification loops).

  • Implement guardrail validations:

    • Toxicity detection

    • PII handling

    • Hallucination checks

    • Policy compliance

Scale & Orchestration
  • Design load and stress tests for multi-agent systems (concurrency, throughput, queue depth, backpressure).

  • Validate orchestrator correctness:

    • DAG execution

    • Retry logic

    • Branching and timeouts

    • Compensation paths

Dev → Prod Readiness
  • Build reusable test artifacts:

    • Scenario configurations

    • Synthetic datasets

    • Prompt libraries

    • Agent graph fixtures

    • Simulators

  • Integrate testing into CI/CD pipelines (pre-merge gates, nightly runs, canary deployments).

  • Define release criteria and ensure operational readiness (performance, security, compliance, cost/latency).

  • Develop post-deployment validation playbooks and incident triage runbooks.


Required Qualifications

  • 7+ years in Software QA/Testing, including 2+ years in AI/ML or LLM-based systems.

  • Hands-on experience testing agentic or multi-agent architectures.

  • Strong programming skills in Python or TypeScript/JavaScript.

  • Experience building test harnesses, simulators, and testing frameworks.

  • Expertise in LLM evaluation techniques:

    • Exact/soft match

    • BLEU, ROUGE, BERTScore

    • Embedding-based semantic similarity

  • Strong background in distributed systems testing:

    • Latency profiling

    • Resiliency patterns (circuit breakers, retries)

    • Chaos engineering

    • Message queues

  • Familiarity with orchestration frameworks such as:

    • LangChain, LangGraph, LlamaIndex, DSPy

    • OpenAI Assistants/Actions or Azure OpenAI

  • Experience with:

    • CI/CD (GitHub Actions, Azure DevOps)

    • Observability tools (OpenTelemetry, Prometheus/Grafana, Datadog)

    • Feature flags and canary deployments

  • Solid understanding of AI privacy, security, and compliance (PII, content policies, model safety).

  • Excellent communication and leadership skills with cross-functional collaboration experience.


Preferred Qualifications

  • Experience with multi-agent simulators and agent graph testing.

  • Knowledge of MLOps practices (model versioning, datasets, evaluation pipelines).

  • Experience with A/B testing and experimentation for LLM systems.

  • Background in cloud platforms (AWS), serverless architectures, and containerization.

  • Proven ownership of cost, latency, and SLA management for AI systems in production.


Why Join Us?

  • Work on cutting-edge agentic AI systems shaping the future of intelligent automation.

  • Own and define QA strategy for next-generation AI platforms.

  • Collaborate with top-tier engineering, data science, and operations teams.

  • Remote-first environment with high-impact, long-term opportunity.


Apply now to be part of building reliable, scalable, and intelligent AI systems at production scale.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 90987764
  • Position Id: 8917624