AI Experience Researcher, Product Evaluation, Vision Products Group

Boulder, CO, US • Posted 30+ days ago • Updated 1 hour ago

Full Time

On-site

Job Details

Skills

Data Science
Systems Design
Innovation
Analytical Skill
Product Development
Dimensional Modeling
Psychology
Human-computer Interaction
HCI
User Experience
Analytics
FOCUS
Design Of Experiments
Data Analysis
Collaboration
Quality Assurance
Leadership
Communication
Writing
Artificial Intelligence
Machine Learning (ML)
Large Language Models (LLMs)
Automated Testing
Workflow
Evaluation
Research
Product Optimization
Science
Instructional Design
Reasoning

Summary

We are seeking a highly motivated and analytical AI Experience Researcher to join our team. This role blends cognitive and human sciences, data science, systems design, and product evaluation to ensure that AI-powered products deliver exceptional, intuitive customer experiences.

You will work alongside a small but impactful team, collaborating with ML and data scientists, software engineers, designers, project managers, and other cross-functional teams at Apple to define success criteria for AI experiences and to create rigorous evaluations that measure those criteria across iterative product development cycles. If you're passionate about applying scientific rigor to real-world problems, thrive on innovation, and want your work to impact hundreds of millions of users, this role offers an exceptional opportunity to make a lasting contribution to products people use every day.

The central challenge of this role is figuring out what "good" means for an AI experience, and then designing rigorous evaluations that measure those qualities reliably and at scale. This requires both deep theoretical grounding in human experience and a solid analytical mindset to operationalize that understanding into scalable evaluation frameworks.

Drawing on research in the human sciences, you will decompose complex AI interactions into their constituent parts, reason about how those parts interact, and build evaluation frameworks that hold up under the non-deterministic nature of AI experiences and the pressures of iterative product development. You will derive experimental designs, create golden datasets, write tests, and turn them into prompts for LLM judges or instructions for human raters. You will run automated evaluations, analyze results, and present findings to diverse stakeholders.

Candidates who bring both quantitative rigor and a qualitative sensibility - the ability to recognize patterns in model behaviors and outputs, and to develop an interpretive understanding of what the data is and isn't capturing from a human perspective - will thrive in this role. What matters most is the ability to hold both orientations at once: to think carefully about what makes an experience work, and to measure complex human dimensions with precision. We are also looking for someone who is excited to co-create what this discipline looks like going forward, bringing intellectual curiosity and a point of view about where human-centered AI evaluation should be headed.

Advanced degree in Cognitive Psychology, Human-Computer Interaction (HCI), User Experience (UX) Research, Learning Sciences, Learning Analytics, Psychometrics, Applied Behavioral Science, or a related field with a focus on human cognition, behavior, and empirical evaluation
A strong data-driven mindset, with experience designing and conducting rigorous empirical research or evaluation - including experimental design, data analysis, and interpretation of qualitative and quantitative data - particularly in the context of complex human-system interactions
Ability to reason from theoretical grounding about what makes an experience good in a given context, and to translate that reasoning into evaluation frameworks and measurement designs
Demonstrated ability to operationalize research literature, qualitative user feedback, and quantitative behavioral data into actionable evaluation criteria, observable metrics, and product insights
Proficiency in data analysis and interpretation, with a strong understanding of statistical validity in evaluation contexts
Exceptional collaboration skills, with a track record of working effectively in cross-functional teams that include engineering, ML, design, QA, leadership, and subject matter experts from diverse domains
Strong communication skills, with the ability to translate complex research findings and evaluation results into clear, actionable recommendations for both technical and non-technical audiences

Familiarity with methods for capturing experiential quality beyond task success, such as cognitive interviews, think-aloud protocols, interaction analysis, or discourse and conversation analysis
Experience designing and implementing automated evaluation pipelines, including writing prompts for LLM judges and constructing human-in-the-loop or multi-turn evaluation setups
Experience working with multimodal or agentic systems and AI/ML models, preferably Large Language Models
Familiarity with automated testing frameworks and tooling
Experience with data generation and annotation workflows, including curating datasets, scenarios, and tasks that represent realistic usage
Portfolio demonstrating previous evaluation frameworks, research findings, or measurable contributions to product improvement
A background in learning sciences or instructional design, with experience reasoning about what makes a complex human experience effective, is a plus

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 90733111
Position Id: 8839885f6a0ba9b5789d8fc49bae6e49

More jobs at Apple, Inc. in Boulder, CO