Imagine what you could do here. At Apple, great new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish!

Are you passionate about music, movies, and the world of Artificial Intelligence and Machine Learning? So are we! Join our Human-Centered AI team for Apple Products. In this role, you'll represent the user perspective on new features, review and analyze data, and evaluate AI models powering everything from search and recommendations to other innovative features. Collaborate with Data Scientists, Researchers, and Engineers to drive improvements across our platforms.
We are looking for an Evaluation & Insights Engineer for the Human-Centered AI team to help evaluate and improve AI systems by combining data science, model behavior analysis, and qualitative insights. In this role, you will analyze AI outputs, develop evaluation frameworks, design qualitative evaluations, and translate findings into actionable improvements for product and engineering teams. This role blends deep technical expertise with strong analytical judgment to assess, interpret, and improve the behavior of advanced AI models. You will work cross-functionally with Engineering and Project Managers, Product, and Research teams to ensure that the AI experience is reliable, safe, and aligned with human expectations.
- Bachelor's or Master's degree in Computer Science, Machine Learning, Artificial Intelligence, Cognitive Science, or a related technical field, with 5+ years of relevant industry experience in ML Engineering or Applied Research.
- Advanced proficiency in Python and modern deep learning ecosystems (PyTorch, JAX, Hugging Face).
- Proven experience building scalable ML inference pipelines, model-evaluation workflows, and structured rating frameworks for large-scale AI systems.
- Strong ability to interpret unstructured model outputs (text, transcripts, embedding spaces) and synthesize qualitative findings into actionable engineering guidance and training objectives.
- Hands-on experience developing, fine-tuning, or evaluating LLMs, multimodal models, and NLP systems.
- Deep familiarity with AI quality metrics, hallucination detection techniques (e.g., SelfCheckGPT), model alignment (RLHF/DPO), and LLM-as-a-judge frameworks (e.g., G-Eval, DeepEval).
- Experience building internal tools or automated pipelines for ML workflows using tools like MLflow, Weights & Biases, or similar platforms.
- Strong familiarity with advanced prompt engineering, RAG architectures (vector databases, semantic search), and fine-tuning.
Knowledge of human factors, HCI, or cognitive science methodologies as applied to AI system design.
- Dice Id: 90733111
- Position Id: 8306207870c70847eb24ff9623f362c0