Measurement Scientist, AI Evaluation Platform

Washington, WA, US • Posted 30+ days ago • Updated 6 hours ago

Full Time

On-site

Job Details

Skills

Innovation
Bridging
SDK
Psychology
Design Of Experiments
Publications
Python
R
Statistics
Modeling
Communication
Artificial Intelligence
Evaluation
Machine Learning (ML)
Science
Generative Artificial Intelligence (AI)
Research

Summary

Apple is where individual imaginations gather together, committing to the values that lead to great work. Every new product we build, service we create, or App Store experience we deliver is the result of us making each other's ideas stronger. That happens because every one of us shares a belief that we can make something wonderful and share it with the world, changing lives for the better. It's the diversity of our people and their thinking that inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you'll do more than join something - you'll add something.

Our team, part of Apple Services Engineering, is building the scientific foundation for how AI systems are evaluated across Apple. We are seeking a Measurement Scientist to ensure that our evaluation methods are not just sophisticated, but scientifically valid and trustworthy. In this role, you will apply psychometric theory, validity frameworks, and statistical rigor to establish measurement standards for AI evaluation - ensuring that when we claim an evaluator measures "helpfulness" or "safety," it actually does. We are looking for individuals across a range of experience levels.

This role uniquely bridges measurement science and cutting-edge AI evaluation. You will develop methods for validating LLM-as-judge evaluators, automated benchmarks, and human evaluations. And you will create statistical tools that help engineers trust their evaluation results. You will work on an interdisciplinary team with ML researchers to solve new problems in AI evaluation. Your work will be both published at top measurement and ML venues and productionized into the evaluation SDK used across Apple.

The successful candidate will have deep expertise in psychometrics and measurement theory, with the ability to apply these principles to novel AI evaluation challenges. You will work collaboratively with ML researchers, platform engineers, and evaluation practitioners to translate measurement science into practical tools that scale across the organization.

PhD in Psychometrics, Educational Measurement, Quantitative Psychology, Statistics, or equivalent research/work experience.
Deep expertise in modeling test data (IRT or related methods) and construct validation.
Strong statistical foundation including experimental design, power analysis, sampling theory, and uncertainty quantification.
Track record of designing and validating measurement instruments as demonstrated through publications or applied work.
Proficiency in Python (preferred) or R for statistical analysis, psychometric modeling, and method implementation.
Strong working knowledge of generative AI technology.
Excellent communication skills with the ability to explain complex measurement concepts to engineers, ML researchers, and non-technical stakeholders.

Experience applying measurement science to AI/ML evaluation, automated scoring systems, or computational assessment.
Knowledge of modern ML evaluation challenges including LLM-as-judge, automated metrics, benchmark design, and agentic systems.
Publications at measurement venues or top ML conferences (NeurIPS, ICML, ICLR).
Expertise in computational social or behavioral science using generative AI.
Experience collaborating with engineers to turn research methods into production tools and scalable infrastructure.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 90733111
Position Id: f2bddae78c82d8e223321714970652ee
