Senior SDET / AI LLM || Remote || W2 Contract


Overview

Remote
$40 - $45
Contract - W2

Skills

SDET
Datadog
LLM
RAG
PyTorch
TensorFlow

Job Details

We are seeking a Senior Software Development Engineer in Test (SDET) with a strong background in test automation, backend systems testing, and AI/LLM validation.

This is a hands-on, highly influential role responsible for:

  • Testing LLM-powered applications used across the enterprise

  • Building LLM-driven testing and evaluation workflows

  • Defining organization-wide standards for GenAI quality, reliability, and release readiness


Key Responsibilities

LLM Testing & Evaluation

  • Design and implement test strategies for LLM-powered systems, including:

    • Prompt and response validation

    • Regression testing across model, prompt, and data changes

    • Evaluation of accuracy, consistency, hallucinations, bias, and safety

  • Build and maintain LLM-based evaluation frameworks using tools such as DeepEval, MLflow, LangChain, and Langflow

  • Develop synthetic and real-world test datasets in collaboration with the Data Engineer

  • Define quality thresholds, scoring mechanisms, benchmarks, and pass/fail criteria for GenAI systems
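
As a rough illustration of what pass/fail criteria for a GenAI system might look like in practice, the sketch below aggregates per-test-case scores and compares them to thresholds. The metric names, threshold values, and helper names (THRESHOLDS, evaluate_gate, release_ready) are hypothetical and chosen only for this example; they are not taken from any tool named in this posting, and real criteria would be defined on the job.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical metrics and thresholds, for illustration only.
THRESHOLDS = {
    "accuracy": 0.90,       # minimum acceptable mean score
    "consistency": 0.85,    # minimum acceptable mean score
    "hallucination": 0.05,  # maximum acceptable mean rate
}
# Metrics where a lower value is better (compared with <= instead of >=).
LOWER_IS_BETTER = {"hallucination"}


@dataclass
class GateResult:
    metric: str
    value: float
    threshold: float
    passed: bool


def evaluate_gate(scores, thresholds=THRESHOLDS):
    """Average each metric's per-case scores and compare to its threshold."""
    results = []
    for metric, threshold in thresholds.items():
        values = scores.get(metric, [])
        if not values:
            # A metric with no data fails the gate rather than passing silently.
            results.append(GateResult(metric, float("nan"), threshold, False))
            continue
        value = mean(values)
        if metric in LOWER_IS_BETTER:
            passed = value <= threshold
        else:
            passed = value >= threshold
        results.append(GateResult(metric, value, threshold, passed))
    return results


def release_ready(results):
    """The release gate passes only if every metric passes."""
    return all(r.passed for r in results)
```

A check like this could run as a step in a CI/CD pipeline, with a failing result blocking the release.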


Test Automation & Framework Development

  • Build and maintain automated test frameworks for:

    • LLM APIs and services

    • Agentic workflows and RAG pipelines

    • Data ingestion and inference pipelines

  • Integrate LLM testing and evaluation into CI/CD pipelines, enforcing quality gates prior to production release

  • Partner with engineering teams to improve testability, reliability, and observability of AI systems

  • Perform root-cause analysis for failures related to model behavior, data quality, or orchestration logic


Observability & Monitoring

  • Instrument LLM applications using Datadog LLM Observability to track:

    • Latency, token usage, errors, and cost

    • Quality regressions, drift, and performance anomalies

  • Build dashboards and alerting focused on LLM quality and reliability

  • Use production telemetry to continuously refine test coverage and evaluation strategies


Shared Services & Collaboration

  • Act as a consultative partner to product, platform, and data teams adopting LLM technologies

  • Provide guidance on:

    • Generative AI test strategies

    • Prompt engineering and workflow validation

    • Release readiness and AI risk assessment

  • Contribute to organization-wide standards and best practices for testing, explaining, and monitoring AI systems

  • Participate in architecture and design reviews from a quality-first perspective


Engineering Excellence

  • Advocate for automation-first testing, infrastructure as code, and continuous monitoring

  • Drive adoption of Agile, DevOps, and CI/CD best practices within AI quality engineering

  • Conduct code reviews and promote secure, maintainable, and scalable test frameworks

  • Continuously improve internal tooling and frameworks within the QA Center of Excellence


Required Skills & Experience

  • Strong Python development skills

  • Experience testing backend systems, APIs, microservices, or distributed platforms

  • Proven experience building and maintaining automation frameworks

  • Ability to work effectively with ambiguous, non-deterministic systems


AI / LLM Experience

  • Hands-on experience testing or validating ML- or LLM-based systems

  • Familiarity with LLM orchestration and evaluation tools, including:

    • LangChain, Langflow

    • DeepEval, MLflow

  • Strong understanding of challenges unique to testing generative AI systems


Nice to Have

  • Experience with Datadog, especially LLM Observability

  • Exposure to Hugging Face, PyTorch, or TensorFlow (usage-level)

  • Experience testing RAG pipelines, Vector Databases, or data-driven platforms

  • Background working in platform teams, shared services, or QA Centers of Excellence

  • Experience collaborating closely with Data Engineering or ML Platform teams
