Sr. / Staff ML Engineer, FM Training Integration - ML Compute

Santa Clara, CA, US • Posted 2 hours ago • Updated 2 hours ago

Full Time

On-site

Job Details

Skills

FM
Deep Learning
Scalability
IaaS
Software Engineering
Workflow
Evaluation
Python
JAX
Cloud Computing
Storage
Performance Tuning
Debugging
Training
Data Modeling
Computer Networking
Benchmarking
PyTorch
Machine Learning (ML)
Orchestration
Docker
Kubernetes
Computer Science

Summary

We are a group of engineers who support the training of foundation models at Apple! We build infrastructure for training foundation models with general capabilities, such as understanding and generating text, images, speech, video, and other modalities, and we apply these models to Apple products. We are looking for engineers who are passionate about building systems that push the frontier of deep learning in scaling, efficiency, and flexibility, and that delight millions of users of Apple products.

We are looking for an ML Engineer to join our ML Compute team to help improve the efficiency, scalability, and reliability of model training and inference workloads in the cloud. In this role, you will lead the integration of large-scale ML workloads with cloud infrastructure, working cross-functionally with ML engineers, infrastructure engineers, and researchers to optimize performance, improve system efficiency, and drive high utilization of accelerator resources.

5+ years of experience in software engineering, ML infrastructure, or related domains.
Hands-on experience with machine learning workflows, including training, evaluation, and inference at scale.
Proficiency in Python and experience with at least one major ML framework (e.g., PyTorch or JAX).
Experience with cloud-based infrastructure and distributed systems (e.g., containers, orchestration, storage, and networking).
Bachelor's degree in Computer Science, Engineering, or a related field.

Experience working with accelerator-based systems (e.g., GPUs, TPUs), including performance tuning and debugging of ML workloads.
Hands-on experience with distributed training or inference at scale (e.g., data, model, or pipeline parallelism).
Experience optimizing large-scale ML systems, including bottleneck analysis across compute, memory, and networking.
Familiarity with profiling, tracing, and benchmarking tools for ML workloads (e.g., PyTorch Profiler, NVIDIA Nsight).
Experience building or operating ML infrastructure using containerization and orchestration frameworks (e.g., Docker, Kubernetes).
Advanced degree in Computer Science, Engineering, or a related field.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 90733111
Position Id: ac7857d9b3ff4fab528acfd8315e9368


