AI EVAL Engineering

  • Bellevue, WA
  • Posted 1 day ago | Updated 1 day ago

Overview

Remote
On Site
50 - 55
Contract - Independent
Contract - W2
Contract - 06 Month(s)
No Travel Required
Unable to Provide Sponsorship

Skills

Azure OpenAI; EVAL; Benchmarking
Hands-on expertise in eval testing, creating structured test suites to measure accuracy, relevance, safety, and performance
Ability to define and apply evaluation metrics (precision/recall, BLEU/ROUGE, F1, hallucination rate, latency, cost per output)
Prompt engineering and prompt testing experience across zero-shot, few-shot, and system prompt scenarios
Python/other programming languages

Job Details

  • Strong understanding of LLMs and generative AI concepts, including model behavior and output evaluation
  • Experience with AI evaluation and benchmarking methodologies, including baseline creation and model comparison
  • Hands-on expertise in eval testing, creating structured test suites to measure accuracy, relevance, safety, and performance
  • Ability to define and apply evaluation metrics (precision/recall, BLEU/ROUGE, F1, hallucination rate, latency, cost per output)
  • Prompt engineering and prompt testing experience across zero-shot, few-shot, and system prompt scenarios
  • Python/other programming languages, for automation, data analysis, batch evaluation execution, and API integration
  • Experience with evaluation tools/frameworks (OpenAI Evals, HuggingFace evals, Promptfoo, Ragas, DeepEval, LM Eval Harness)
  • Ability to create datasets, test cases, benchmarks, and ground truth references for consistent scoring
  • Test design and test automation experience, including reproducible evaluation pipelines
  • Knowledge of AI safety, bias, security testing, and hallucination analysis

Nice-to-Have

  • RAG evaluation experience
  • Azure OpenAI, OpenAI, Anthropic, and Google AI platforms
  • Performance benchmarking (speed, throughput, cost)
  • Domain knowledge: Office apps, enterprise systems, networking
