Remote or Bellevue, Washington • Posted yesterday
AI Eval Engineering: Azure OpenAI; Eval; Benchmarking
- Strong understanding of LLMs and generative AI concepts, including model behavior and output evaluation
- Experience with AI evaluation and benchmarking methodologies, including baseline creation and model comparison
- Hands-on expertise in eval testing, creating structured test suites to measure accuracy, relevance, safety, and performance
- Ability to define and apply evaluation metrics (precision/recall, BLEU/ROUGE, F1, hallucination rate, latency,
Contract
50 - 55