Site Reliability Engineer (Edge Services), Infrastructure Services

Austin, TX, US • Posted 4 hours ago • Updated 4 hours ago
Full Time
On-site

Job Details

Skills

  • Innovation
  • Customer Experience
  • Conflict Resolution
  • Problem Solving
  • Linux
  • Computer Networking
  • HTTP
  • HTTPS
  • TLS
  • Debugging
  • Workflow
  • Python
  • Grafana
  • Data Structures
  • Algorithms
  • Error Budgets
  • Release Management
  • Incident Management
  • Computer Science
  • Management
  • Cloud Computing
  • Amazon Web Services
  • Google Cloud Platform
  • Microsoft Azure
  • Terraform
  • Ansible
  • Kubernetes
  • Service Design

Summary

Apple is where individual imaginations gather together, committing to the values that lead to great work. Every new product we build, service we create, or Apple Store experience we deliver is the result of us making each other's ideas stronger. That happens because every one of us shares a belief that we can make something wonderful and share it with the world, changing lives for the better. It's the diversity of our people and their thinking that inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you'll do more than join something; you'll add something. At Apple, we create products and services that have changed entire industries. Imagine what you could do here! Join Apple and help us make the world a better place.

Edge Services is responsible for the foundational services that every Apple team and billions of customer devices rely on. Our services need to be highly available, scale for global reach, and just work. If you love designing, engineering, and running systems that help our customers, then this is the perfect place for you!

The Edge Services team is on the hunt for a software engineer to champion the evolution of our production ecosystems. In this role, you will help drive the vision for production visibility, moving beyond simple uptime metrics to build a sophisticated, data-driven reliability framework. You will play a pivotal role in ensuring our services are resilient, scalable, and observable, bridging the gap between complex distributed systems and seamless user experiences. We're seeking an engineer who is passionate about building system software and solving seemingly insurmountable problems, and who is deeply committed to delivering an outstanding customer experience. You'll go beyond the industry standard, demonstrating creativity in problem-solving, the ability to think dynamically, and the agility to adapt quickly to new technical areas.

  • Systems Expertise: Strong understanding of Linux internals and deep networking expertise, including HTTP/2, HTTP/3 (QUIC), and HTTPS/TLS. You should be comfortable debugging protocol-level issues and optimizing traffic flow.
  • Automation Mindset: Proven ability to automate repetitive tasks and complex workflows using Python or Go.
  • Observability Logic: Experience configuring and managing modern monitoring suites (e.g., Prometheus, Grafana, ClickHouse) with a focus on creating actionable, high-signal alerting.
  • CS Fundamentals: Solid grasp of Data Structures and Algorithms (DSA) to write efficient, performant code and troubleshoot complex system bottlenecks.
  • SRE Principles: Practical knowledge of SLIs, SLOs, Error Budgets, Release Management, and Incident Management to drive engineering priorities.
  • BS in Computer Science or a related field, or equivalent job-related experience.

  • Infrastructure as Code: Experience managing cloud environments (AWS, Google Cloud Platform, or Azure) using Terraform, Ansible, or Pulumi.
  • Orchestration: Hands-on experience scaling and securing containerized workloads via Kubernetes.
  • Incident Response: A track record of leading "blameless post-mortems" and using those insights to harden the system against future failures.
  • Architectural Influence: Ability to consult with product teams on service design to improve long-term maintainability.
  • Reliability Engineering: A proactive engineering mindset focused on shifting from "fixing things when they break" to "designing things so they don't break" (or so they fail gracefully).

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 90733111
  • Position Id: 600ca387630ae7c471a6368bc1f26c45
