Remote
Today
Objective: Evaluate the linguistic accuracy and stylistic performance of three large language models (LLMs) through targeted benchmarking.

Pilot Phase: Start with a set of 30 Legal questions (1 replication), then ramp up to 200 questions per domain. The current estimate is 60 hours of work (30 questions with an average handling time (AHT) of 12 hours per question). We will need 10 Legal resources to complete the pilot tasks.

Task Overview: Participants will complete a series of True/False assessments designed to grade the models.
Contract
25 - 30




