We are seeking a highly skilled and experienced machine learning engineer to join AIML Evaluation to build the systems that evaluate and refine Apple's foundation models and agents. As a key member of the team, you will help design and develop benchmarks, evaluators, simulation environments, and prompt and context optimization pipelines that drive quality improvements across Apple's AI experiences. You will collaborate with product teams and the foundation model team to close the loop between observation and improvement, contributing datasets, environments, and reward signals that drive model and agent quality.
Our team builds the benchmarks, environments, and tooling that power model and agent refinement, and turns observations into actionable opportunities for the next model and agent iteration. We work across the full spectrum of evaluation: offline benchmarks, device-in-the-loop simulation, and on-device observation in production. We develop LLM-as-judge evaluators, train reward models calibrated against human feedback, optimize prompts and context for agents, and contribute targeted datasets and reward signals to foundation model post-training.

In this role, you will design and develop evaluation and refinement infrastructure that supports a broad range of AI products at Apple. You will work on agent and model evaluation across offline, device-in-the-loop, and on-device settings; build automated prompt and context optimization pipelines; and partner with product and research teams to translate failure analysis into measurable model and agent improvements. You will also have the opportunity to engage with product teams across Apple and contribute to advancements in large language models and agentic systems that will reach millions of users.

To succeed in this role, you should have a strong background in machine learning systems and distributed infrastructure, and a proven track record of building and maintaining ML evaluation or training infrastructure. You should be a proactive problem solver with excellent communication skills and the ability to work effectively across multiple codebases, teams, and organizations. Experience with LLM evaluation, reward modeling, prompt optimization, or agentic systems is highly desirable.
- Strong background in machine learning and distributed systems
- Experience building and maintaining ML infrastructure for evaluation, training, or deployment
- Ability to work effectively across multiple codebases, teams, and organizations
- 8+ years of professional experience as a software engineer, preferably in machine learning or a related field
- Bachelor's or Master's degree in Computer Science or a related field

- Experience with LLM evaluation, LLM-as-judge, or reward modeling
- Experience with prompt optimization, agent harness development, or post-training (SFT, DPO, RLHF)
- Proficiency in Python and ML frameworks such as PyTorch
- Experience with agentic systems, simulation environments, or trajectory-based data generation
- Familiarity with on-device or privacy-preserving ML
- Proactive, determined approach to problem solving
- Excellent communication skills

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
- Dice Id: 90733111
- Position Id: 48ce6808096ec1de19f4f08e5a2c4971
- Posted 10 hours ago