Azure OpenAI Engineer

Overview

Remote
Depends on Experience
Accepts corp to corp applications
Contract - W2
Contract - Independent
Contract - 6 Month(s)

Skills

Azure OpenAI
Eval testing
LLMs

Job Details

Role: AI Eval Engineer

Location: Bellevue, WA (Remote)

Duration: 6+ months

AI Eval Engineering

Azure OpenAI; Eval; Benchmarking

Required Skills

- Strong understanding of LLMs and generative AI concepts, including model behavior and output evaluation

- Experience with AI evaluation and benchmarking methodologies, including baseline creation and model comparison

- Hands-on expertise in Eval testing, creating structured test suites to measure accuracy, relevance, safety, and performance

- Ability to define and apply evaluation metrics (precision/recall, BLEU/ROUGE, F1, hallucination rate, latency, cost per output)

- Prompt engineering and prompt testing experience across zero-shot, few-shot, and system prompt scenarios

- Python and other programming languages for automation, data analysis, batch evaluation execution, and API integration

- Experience with evaluation tools/frameworks (OpenAI Evals, HuggingFace evals, Promptfoo, Ragas, DeepEval, LM Eval Harness)

- Ability to create datasets, test cases, benchmarks, and ground truth references for consistent scoring

- Test design and test automation experience, including reproducible evaluation pipelines

- Knowledge of AI safety, bias, security testing, and hallucination analysis
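For illustration, the core metrics named above (precision, recall, F1) reduce to a few lines of Python when eval results are labeled pass/fail against ground truth. This is a minimal sketch; the function and variable names (`score_eval_set`, `predictions`) are hypothetical, not from any specific framework.

```python
# Minimal sketch: scoring binary eval outcomes against ground-truth labels.
# All names here are illustrative, not part of a real eval framework's API.

def score_eval_set(predictions, ground_truth):
    """Compute precision, recall, and F1 for binary pass/fail eval results."""
    tp = sum(1 for p, g in zip(predictions, ground_truth) if p and g)
    fp = sum(1 for p, g in zip(predictions, ground_truth) if p and not g)
    fn = sum(1 for p, g in zip(predictions, ground_truth) if not p and g)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Example: model flags 3 of 4 outputs as acceptable; ground truth accepts 3,
# and the two sets overlap on 2 cases.
scores = score_eval_set([1, 1, 1, 0], [1, 1, 0, 1])
print(scores)  # precision, recall, and f1 are each 2/3 here
```

In practice the same scoring loop runs over structured test suites, with hallucination rate and latency tracked as additional per-case fields.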

Nice-to-Have

- RAG evaluation experience

- Azure OpenAI

- OpenAI

- Anthropic

- Google AI platforms

- Performance benchmarking (speed, throughput, cost)

- Domain knowledge: Office apps, enterprise systems, networking
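As a sketch of the performance-benchmarking item above (speed, throughput, cost): latency percentiles, throughput, and cost per output can be aggregated over a batch of calls as below. The `fake_model_call` function and the token price are stand-in assumptions, not a real API or real pricing.

```python
import statistics
import time

# Hypothetical stand-in for a model API call; a real benchmark would invoke
# an actual client here and read token counts from its response.
def fake_model_call(prompt):
    time.sleep(0.001)  # simulate network/model latency
    return {"text": "ok", "tokens": 5}

def benchmark(call, prompts, cost_per_token=0.000002):
    """Time each call, then report p50/p95 latency, throughput, and cost."""
    latencies, total_tokens = [], 0
    for prompt in prompts:
        start = time.perf_counter()
        out = call(prompt)
        latencies.append(time.perf_counter() - start)
        total_tokens += out["tokens"]
    latencies.sort()
    return {
        "p50_s": statistics.median(latencies),
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
        "throughput_rps": len(prompts) / sum(latencies),
        "cost_usd": total_tokens * cost_per_token,
    }

print(benchmark(fake_model_call, ["test prompt"] * 20))
```

Reporting percentiles rather than averages matters for LLM benchmarking, since latency distributions are typically long-tailed.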

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.