Are you an open-source contributor passionate about building the next generation of cloud-native ML infrastructure? We're looking for a hands-on technical leader with deep expertise in Kubernetes, Crossplane, Golang/Python, and agentic workflows to design and scale the platforms that power Apple's Search and ML infrastructure ecosystems. If you've contributed to CNCF projects such as Kubernetes, Crossplane, or ArgoCD, and you're driven to build intelligent, automated infrastructure for ML training and inference at massive scale, this role is for you. You'll architect systems that are declarative, self-managing, and highly performant, enabling seamless ML experiences for billions of users.
The AI, Search & Knowledge Platform Cloud Infrastructure Team within Apple's Services organization designs, builds, and scales the foundational systems that power Search and next-generation machine learning workloads. We are reimagining how infrastructure is managed through agentic, event-driven workflows, Crossplane compositions, and self-healing control planes. You'll develop Model Context Protocol (MCP)-based infrastructure servers that integrate with ML and data workflows, delivering highly automated and observable infrastructure across hybrid and multi-cloud environments.

You will collaborate across ML engineering, SRE, and platform teams to deliver infrastructure that adapts intelligently to application needs, optimizes for cost and performance, and accelerates the development of ML training and inference pipelines.
- BS/MS in Computer Science or equivalent practical experience.
- 5+ years of experience leading distributed systems or cloud infrastructure engineering.
- Strong programming experience in Golang and Python, including building controllers, operators, or automation systems.
- Deep understanding of Kubernetes internals, controller-runtime, and Crossplane composition frameworks.
- Experience with ArgoCD, Helm, and IaC (Terraform or Crossplane).
- Hands-on experience with GitOps and reconciliation-driven workflows.
- Proven ability to design and operate infrastructure for ML training and inference, including performance tuning and GPU optimization.
- Experience leading technical teams and driving architectural decisions.
- Strong grounding in cost efficiency, performance profiling, and system-level debugging.
- 9+ years in cloud infrastructure, SRE, or distributed systems roles.
- Contributions to CNCF open-source projects (Kubernetes, Crossplane, ArgoCD, Envoy, Prometheus, etc.).
- Deep expertise in Kubernetes API machinery, CRDs, and control plane development.
- Experience with Model Context Protocol (MCP) or contextual infrastructure servers.
- Familiarity with AIOps or agentic/LLM-driven automation in production environments.
- Strong understanding of observability and distributed tracing (OpenTelemetry, Prometheus, Grafana).
- Experience building ML infrastructure platforms (training clusters, inference systems, model registries).
- Excellent communication, cross-functional leadership, and technical writing skills.
- B.S., M.S., or Ph.D. in Computer Science, Computer Engineering, or equivalent practical experience is preferred.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
- Dice Id: 90733111
- Position Id: 2dbc41e8711c5a089d11d3819b50371a