We are the Foundation Model Inference team within the AI, Search & Knowledge Platform Technologies organization. Our team builds the inference stack that powers Apple Intelligence: the frameworks, services, and tools that serve Apple's largest foundation models on servers. Our infrastructure powers a wide range of services at Apple, including Apple Search, Apple Music, Apple TV, App Store, iMessage, Photos & Camera, Spotlight, Safari, Siri, and upcoming Apple products, serving millions of queries every day at very low latency while drawing every ounce of compute from our hardware. As part of this group, you will have the chance to bring intelligence to billions of users across the world and to make a difference in people's lives by empowering them with AI. You will optimize language, vision, and speech models with billions of parameters using state-of-the-art technologies and run them at Apple scale.

- Work alongside the Foundation Model Research team to optimize inference for cutting-edge model architectures.
- Work closely with product teams to build production-grade solutions and launch models that serve millions of customers in real time.
- Build tools to identify inference bottlenecks across different hardware and use cases.
- Mentor and guide engineers in the organization.

- 5+ years of experience leading and driving complex, ambiguous projects.
- Experience with the LLM inference stack.
- Familiarity with GPU programming concepts using CUDA.
- Familiarity with one of the popular ML frameworks, such as PyTorch or TensorFlow.
- Experience with high-throughput services, particularly at supercomputing scale.
- Proficiency in running applications in the cloud (AWS, Azure, or equivalent) using Kubernetes, Docker, etc.
- BS in Computer Science, Artificial Intelligence, Machine Learning, Information Retrieval, Data Science, or a related field.

- Proficiency in building and maintaining systems written in modern languages (e.g., Golang, Python).
- Familiarity with fundamental deep learning architectures such as Transformers and encoder/decoder models.
- Familiarity with NVIDIA TensorRT-LLM, vLLM, DeepSpeed, NVIDIA Triton Server, etc.
- Experience writing custom GPU kernels using CUDA or OpenAI Triton.
- MS in Computer Science, Artificial Intelligence, Machine Learning, Information Retrieval, Data Science, or a related field.
- Dice Id: 90733111
- Position Id: de8a4992a2492a7602d31f28063c7a42
- Posted 30+ days ago