Objective
Evaluate the linguistic accuracy and stylistic performance of three Large Language Models (LLMs) through targeted benchmarking.
Pilot Phase:
• Start with a set of 30 Legal questions (1 replication), then ramp up to 200 questions per domain.
• Current estimate: up to 60 hours of work (30 questions at an average handling time (AHT) of 1–2 hours per question).
• We will need 10 Legal resources to complete the pilot tasks.
Task Overview: Participants will complete a series of True/False assessments designed to grade the models on writing style and technical precision.
Subject Matter Expertise: Evaluators must have a deep understanding of domain-specific jargon to determine which model most accurately replicates industry-standard terminology.
Collaborative Feedback: In addition to grading, raters may be invited to contribute to the evaluation framework by drafting new test questions and identifying critical performance categories for future testing.
Disclaimer: i-Link Solutions Inc. provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, disability, genetic information, marital status, amnesty, or status as a covered veteran in accordance with applicable federal, state and local laws. We especially invite women, minorities, veterans, and individuals with disabilities to apply. EEO/AA/M/F/Vet/Disability.