The Apple Cloud Networking team builds and operates large-scale, software-defined networking platforms that enable secure, resilient, and highly available multi-cloud connectivity with a global footprint. Our infrastructure powers critical Apple services, including iCloud, iTunes, Siri, and Maps.

We are seeking an experienced and visionary Reliability Engineering Manager to lead and grow a team of engineers focused on ensuring the availability, performance, scalability, and resiliency of Apple's global network services. In this role, you will work closely with software engineering, infrastructure, and operations teams across Apple to deliver reliable, fault-tolerant systems that operate at massive scale.
As a key leader within the Cloud Networking organization, you will define and drive the reliability and resiliency strategy for Apple's network platform services. You will be responsible for building, scaling, and mentoring a high-performing Production Engineering team that champions SRE and SWE best practices, release engineering, and data-driven decision-making.

You will establish strong cross-functional partnerships to ensure reliability and resiliency are embedded throughout the system lifecycle, from design and development to deployment and operations. Your leadership will help ensure Apple's network services meet demanding availability, latency, resilience, and security requirements while continuously improving operational maturity.

We are looking for a leader who is deeply passionate about operating mission-critical, globally distributed systems, preventing outages, learning from failures, and driving long-term reliability improvements.
- 10+ years of experience in software engineering, systems engineering, or infrastructure engineering.
- 6+ years of experience in a technical leadership role with people management responsibilities.
- Strong background in designing, operating, and supporting highly available, fault-tolerant distributed systems at scale.
- Hands-on experience with reliability engineering, SRE, or large-scale production operations.
- Solid understanding of network infrastructure and software-defined networking (SDN).
- Ability to lead cross-functional collaboration and influence technical decisions across teams.
- Experience in defining and operating SLO-based reliability and resiliency programs.
- Strong knowledge of observability systems (metrics, logging, tracing) and qualification engineering.
- Experience with microservices architectures, RESTful APIs, and cloud-native platforms.
- In-depth understanding of networking protocols, routing mechanisms, and traffic management.
- Broad knowledge of networking solutions across OSI layers 3 through 7.
- Excellent written and verbal communication skills, with the ability to clearly articulate risk, reliability trade-offs, and operational priorities.
- Proven ability to manage competing priorities, drive initiatives to completion, and deliver results in fast-paced environments.
- Dice Id: 90733111
- Position Id: 51a1e96acce262f1c2f0eb49c7ef3b4f