ML Research Engineer, AI Evaluation Platform

Washington, WA, US • Posted 19 hours ago • Updated 6 hours ago
Full Time
On-site

Job Details

Skills

  • Production Engineering
  • Prototyping
  • Artificial Intelligence
  • Python
  • PyTorch
  • JAX
  • TensorFlow
  • Software Engineering
  • Version Control
  • Testing
  • Debugging
  • Performance Tuning
  • Large Language Models (LLMs)
  • Collaboration
  • Cloud Computing
  • Docker
  • Kubernetes
  • Continuous Integration
  • Continuous Delivery
  • Distributed Computing
  • Apache Spark
  • Communication
  • Computer Science
  • Machine Learning (ML)
  • Modeling
  • LangSmith
  • Workflow
  • Publications
  • Open Source
  • Research
  • Generative Artificial Intelligence (AI)
  • Management
  • Economics
  • Evaluation
  • Technical Direction

Summary

AI systems are only as trustworthy as the methods used to evaluate them. At Apple, where AI powers experiences for billions of people, getting evaluation right is not a support function; it is a foundational science. Our team, part of Apple Services Engineering, is building that scientific foundation: rigorous, scalable evaluation methodology for LLMs, agentic systems, and human-AI interaction.

What makes this team unusual is its interdisciplinary core. You will work alongside measurement scientists (psychometrics, validity theory), ML researchers, and platform engineers, bringing together ML research, statistical rigor, and production engineering. We are looking for an ML Research Engineer who can move fluidly across this landscape: someone who loves implementing the latest techniques in AI, has the engineering instincts to make them robust and scalable, and thrives at the intersection of research and production.

This is a combined research and engineering role, sitting with and between research/applied scientists and platform engineers. New evaluation research can be challenging to use at scale; that's where your skills in both machine learning and engineering come into play.

On the research side, you will partner with scientists to rapidly prototype their ideas, implement methods from recent papers, run large-scale experiments, and provide critical feedback grounded in your engineering experience. On the engineering side, you will work with platform engineers to bring those research prototypes into production, moving from Python packages on local machines to robust services deployed in the cloud.

While past experience in research is not required, a desire to advance the state of the art in AI evaluation is. You should be ready to jump in across the full lifecycle of bringing new research into production at scale, speaking both the language of research and the language of engineering.
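To make the prototyping half of the role concrete, here is a minimal, hypothetical sketch of the kind of LLM-as-judge harness this work involves. The `call_model` stub and the rubric prompt are illustrative placeholders, not any real internal API:

```python
"""Toy LLM-as-judge harness: score an answer against a rubric prompt."""
from dataclasses import dataclass

# Hypothetical rubric; a real one would be designed with measurement scientists.
JUDGE_PROMPT = (
    "Rate the following answer for factual accuracy on a 1-5 scale.\n"
    "Question: {question}\nAnswer: {answer}\n"
    "Respond with a single digit."
)

def call_model(prompt: str) -> str:
    # Stub standing in for a real inference client (API call, local model, etc.).
    return "4"

@dataclass
class Judgment:
    question: str
    answer: str
    score: int  # 1-5, or 0 if the judge's reply could not be parsed

def judge(question: str, answer: str) -> Judgment:
    raw = call_model(JUDGE_PROMPT.format(question=question, answer=answer))
    digits = [c for c in raw if c.isdigit()]  # tolerate chatty judge output
    score = int(digits[0]) if digits else 0
    return Judgment(question, answer, score)

if __name__ == "__main__":
    print(judge("What is 2 + 2?", "4"))
```

Turning a script like this into a versioned, tested, horizontally scalable service is exactly the research-to-production gap this role is meant to close.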

Minimum Qualifications

  • Bachelor's degree in Computer Science, Machine Learning, Software Engineering, or a closely related field (Master's preferred)
  • 2+ years of hands-on experience in a role combining machine learning and software engineering (e.g., ML engineer, research engineer, or applied scientist with strong engineering output), or a Master's degree in Computer Science, Machine Learning, or a closely related field with relevant project experience
  • Strong proficiency in Python and the modern ML ecosystem (PyTorch, JAX, or TensorFlow), with demonstrated ability to implement complex methods from recent ML papers
  • Solid software engineering fundamentals: clean code design, version control, testing, debugging, and performance optimization
  • Experience working with large language models, whether fine-tuning, inference, prompting pipelines, or building LLM-powered applications
  • Demonstrated ability to work across the research-to-production spectrum: you have taken experimental or prototype code and made it robust, scalable, and usable by others
  • Practical experience with cloud-native development and deployment: containerization (Docker/Kubernetes), CI/CD pipelines, and distributed computing frameworks (e.g., Ray, Spark; see the sketch after this list)
  • Strong communication skills and comfort working in interdisciplinary teams, with the ability to engage productively with both researchers and platform engineers
  • Comfort with ambiguity and new problem spaces: you thrive when building something that doesn't yet have a playbook
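As one illustration of the distributed-computing item above, here is a hedged sketch of fanning per-example evaluation work out with Ray; the `score_example` body is a placeholder, not a real evaluator:

```python
"""Illustrative sketch: parallelizing per-example evaluation with Ray."""
import ray

ray.init()  # local cluster by default; production would target a shared cluster

@ray.remote
def score_example(example: dict) -> float:
    # Placeholder scorer; in practice this would invoke a model or judge.
    return float(bool(example["response"].strip()))

examples = [{"response": "Paris"}, {"response": ""}, {"response": "42"}]
scores = ray.get([score_example.remote(e) for e in examples])
print(f"mean score: {sum(scores) / len(scores):.2f}")
```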

Preferred Qualifications

  • Master's or Ph.D. in Computer Science, Machine Learning, or a related field
  • Experience with evaluation-specific methods or frameworks: LLM-as-judge approaches, reward modeling, RLHF, calibration techniques, benchmark design, or human evaluation methodology
  • Familiarity with modern evaluation tools and frameworks (e.g., DeepEval, Ragas, TruLens, LangSmith) and an understanding of how to implement and scale model-based evaluation workflows
  • Track record of contributing to research outputs (co-authored publications, open-source contributions, or internal research reports), even if research is not your primary role
  • Experience with the engineering challenges specific to generative AI and agentic systems: managing token economics, handling non-deterministic outputs, and evaluating multi-turn agent trajectories and tool usage
  • Familiarity with statistical concepts relevant to evaluation: calibration, inter-rater reliability, scoring rules, or measurement validity (see the sketch after this list)
  • Experience in fast-moving, early-stage teams where you helped define technical direction and engineering culture from the ground up
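For the inter-rater reliability item above, a self-contained sketch of Cohen's kappa, one standard reliability statistic; the annotator labels are made up for illustration:

```python
"""Toy Cohen's kappa for two raters over categorical labels (no external deps)."""
from collections import Counter

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    assert rater_a and len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a | freq_b) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both raters always use one identical label
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two annotators judging the same four model outputs.
a = ["pass", "pass", "fail", "pass"]
b = ["pass", "fail", "fail", "pass"]
print(f"kappa = {cohens_kappa(a, b):.3f}")  # 0.500 for this toy data
```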
  • Dice Id: 90733111
  • Position Id: 9bdf75827b155d270233d4fe87213885
