Do you love building and scaling infrastructure that delights millions of customers? At Apple, we believe reliability is a feature. We are looking for a Site Reliability Engineer to join our team in overseeing the performance and availability of the core backend services behind our News, Stocks, Weather, Books, and Creator Studio applications.
As an SRE, you won't just respond to alerts; you will shape the evolution of our observability strategy, serve as a mentor for incident management, and champion automation. You will help us refine our "Golden Signals" and ensure our Kubernetes-based ecosystem remains world-class.
- Experience: 5+ years in SRE, DevOps, or Infrastructure roles with a proven track record of managing high-traffic, internet-facing production environments.
- Kubernetes Expertise: Deep experience building and operating container orchestration systems (EKS/GKE/Vanilla K8s). You should be comfortable troubleshooting from the networking layer up to the application pod.
- Observability Champion: Expert knowledge of the 4 Golden Signals (Latency, Traffic, Errors, and Saturation). Proficiency with tools like Prometheus, Grafana, and Splunk is essential.
- Cloud Proficiency: Hands-on experience designing and maintaining resilient infrastructure on public cloud providers (AWS, Google Cloud Platform, or Azure).
- Scripting & Automation: Strong ability to code at a scripting level (Python or Go preferred) to automate toil and build self-healing systems.
- Incident Leadership: Experience leading incident response, performing Root Cause Analysis (RCA), and implementing blameless post-mortems to improve system resilience.
- Infrastructure as Code: Proficient in Terraform, CloudFormation, or Pulumi to manage immutable infrastructure.
- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent practical experience).
- Search & Data: Specialized experience operating and tuning Solr or Elasticsearch at scale.
- Networking: Strong understanding of TCP/IP, Load Balancing (ELB/ALB), and Service Mesh (Istio/Linkerd).
- Data Systems: Experience with Kafka, Cassandra, or Postgres in a distributed environment.
- Dice Id: 90733111
- Position Id: f01c0aaa2810e651c079dafd26211957
- Posted 3 hours ago