Machine Learning - Data Scientist

Sunnyvale, CA, US • Posted 4 days ago • Updated 1 day ago
Full Time
On-site

Job Details

Skills

  • Computer Vision
  • Video Engineering
  • Data Analysis
  • Analytical Skill
  • Data Quality
  • Collaboration
  • Language Models
  • Reasoning
  • Scripting Language
  • Research
  • User Experience
  • Failure Analysis
  • Management
  • Deep Learning
  • Video
  • Python
  • NumPy
  • Pandas
  • scikit-learn
  • PyTorch
  • TensorFlow
  • Testing
  • Documentation
  • Open Source
  • Prompt Engineering
  • Evaluation
  • Machine Learning (ML)

Summary

Do you have a passion for computer vision and solving deep learning problems? The Video Engineering Data Analytics and Quality group is seeking an expert in evaluating machine learning and deep learning models, including foundation models and multimodal systems.

This role will play a critical part in crafting robust evaluation frameworks, using both traditional statistical methods and modern techniques like LLM-as-a-Judge! The ideal candidate combines strong analytical thinking, expertise in Python, and advanced knowledge of statistical methodologies and data quality standards.

This role involves collaboration with teams at Apple passionate about developing foundation models, including ML engineers, data scientists, and ML Infrastructure engineers to deliver amazing user experiences!

  • Develop robust methodologies to assess the performance of foundation models (e.g., LLMs, vision-language models, etc.) across diverse tasks.
  • Leverage LLMs as judges to perform subjective and open-ended model evaluations (e.g., for summarization, reasoning, or multimodal generation tasks).
  • Build, curate, and lead evaluation datasets and benchmarks.
  • Advanced proficiency in at least one scripting language, preferably Python.
  • Collaborate with research, engineering, and product teams to define evaluation goals aligned with user experience and product quality.
  • Conduct failure analysis and uncover edge cases to improve model robustness.
  • Contribute to our tools and infrastructure to automate and scale evaluation processes.

  • BS and a minimum of 3 years of relevant industry experience.
  • Strong experience in evaluating supervised, unsupervised, and deep learning models.
  • Hands-on experience evaluating LLMs and using them as scoring/judging mechanisms.
  • Familiarity with multimodal models (e.g., image + text, video + audio) and related evaluation challenges.
  • Proficiency in Python and libraries such as NumPy, pandas, scikit-learn, PyTorch, or TensorFlow.
  • Solid understanding of statistical testing, sampling, confidence intervals, and metrics (e.g., precision/recall, BLEU, ROUGE, FID, etc.).
  • Strong documentation skills, including the ability to write technical reports and present to non-technical audiences.

  • Experience working with open-source evaluation tools like OpenEval, Elo-based ranking, or LLM-as-a-Judge frameworks.
  • Familiarity with prompt engineering and few-shot or zero-shot evaluation techniques.
  • Experience evaluating generative models (e.g., text generation, image generation).
  • Prior contributions to ML benchmarks or public evaluations.
  • Strong interpersonal skills.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 90733111
  • Position Id: a56f3a1e5f63849153658973db41e643