ML Engineer, Proactive - Agentic Systems Evaluation

Cupertino, CA, US • Posted 2 days ago • Updated 3 hours ago
Full Time
On-site

Job Details

Skills

  • Data Collection
  • Analytical Skill
  • Dashboard
  • Reporting
  • Cross-functional Team
  • Generative Artificial Intelligence (AI)
  • Computer Science
  • Statistics
  • Science
  • Software Engineering
  • Python
  • Privacy
  • Testing
  • Reasoning
  • Prompt Engineering
  • Evaluation
  • Swift
  • Machine Learning (ML)
  • Publishing
  • Research

Summary

Are you passionate about working on the next generation of personalized intelligence systems? In this role, you will develop and deploy robust evaluation frameworks across the data lifecycle -- from data collection and processing to analytic dashboards for reporting. You will be part of the larger Proactive Intelligence team, which builds features that anticipate customers' needs and create personalized experiences by adapting to user behaviors with machine learning running locally on-device or in PCC. Join our cross-functional team of specialists dedicated to the evaluation of agentic systems.

We are looking for a high-impact ML Evaluation Engineer to help architect rigorous evaluation systems for autonomous agents. With the rise of generative AI, the ability to quantify the reliability and quality of these systems is more critical than ever. You will design and deploy qualitative and quantitative metrics to measure the quality, reasoning, and tool-use accuracy of agentic systems. You will be working with very sensitive data, so leveraging existing privacy-enhancing technologies -- such as differential privacy, PII redaction, and data minimization -- and developing new ones will be crucial. The team you will be joining is focused on advancing scalable, automated evaluation processes. To succeed, you will need a deep understanding of system-level software operations to deliver next-generation capabilities. Join the Proactive Intelligence team to build the evaluation platforms for the future of intelligent, personalized experiences.
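As an illustration of one privacy-enhancing technology named above, here is a minimal sketch of the Laplace mechanism for releasing a differentially private count. The function names and parameters are illustrative, not part of the team's actual stack:

```python
import random

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponential draws with mean `scale`
    # is Laplace-distributed with that scale.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count under epsilon-differential privacy.

    Noise scale is calibrated to sensitivity / epsilon: a smaller epsilon
    (stronger privacy) means proportionally more noise.
    """
    return true_count + laplace_noise(sensitivity / epsilon)
```

Averaged over many releases the noise cancels (it is zero-mean), which is why calibrated noise can protect individual records while keeping aggregate statistics useful.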

  • MS or PhD in Computer Science, Machine Learning, Statistics, or equivalent practical experience in a quantitative field.
  • 3+ years of industry experience in ML Engineering or Applied Science.
  • Strong software engineering fundamentals (Python is a must) with experience building scalable, automated data or evaluation pipelines.

  • Demonstrated experience applying Differential Privacy, Federated Learning, or advanced PII redaction techniques to large-scale datasets.
  • Hands-on experience building or testing LLM-based systems, including a deep understanding of chain-of-thought reasoning, prompt engineering, and agentic planning.
  • Proficiency in building or evaluating systems that integrate with external tools/APIs.
  • Experience with specialized agent evaluation frameworks and analyzing execution traces to identify failure modes in multi-turn interactions.
  • Experience with compiled languages (e.g., Swift) and a curiosity about how ML interacts with OS-level software operations.
  • A track record of developing custom metrics (e.g., "LLM-as-a-Judge") or publishing research on model reliability, safety, or algorithmic bias.
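To make the trace-analysis requirement concrete, here is a minimal sketch of one custom metric: tool-use accuracy computed from an agent execution trace. The `ToolCall` representation and the ordered-subsequence matching rule are illustrative assumptions, not any particular evaluation framework:

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: dict

def tool_use_accuracy(expected: list[ToolCall], actual: list[ToolCall]) -> float:
    """Fraction of expected tool calls the agent reproduced, in order.

    Extra calls in the actual trace are ignored; missing or out-of-order
    calls count against the score.
    """
    if not expected:
        return 1.0
    matched = 0
    remaining = iter(actual)
    for want in expected:
        for got in remaining:
            if got.name == want.name and got.args == want.args:
                matched += 1
                break
    return matched / len(expected)

# Hypothetical gold labels and execution trace for one multi-turn interaction
gold = [ToolCall("search", {"q": "weather"}), ToolCall("notify", {"channel": "push"})]
trace = [ToolCall("search", {"q": "weather"}), ToolCall("log", {}), ToolCall("notify", {"channel": "push"})]
print(tool_use_accuracy(gold, trace))  # -> 1.0 (extra "log" call is tolerated)
```

Deterministic metrics like this complement LLM-as-a-Judge scoring: the former pinpoints tool-call failure modes exactly, while the latter covers qualities (helpfulness, reasoning) that rules cannot capture.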
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 90733111
  • Position Id: df17e247b2304495a1e46b7746477891
