Senior Site Reliability Engineer

Washington, WA, US • Posted 30+ days ago • Updated 9 hours ago
Full Time
On-site

Job Details

Skills

  • Art
  • Open Source
  • Apache Cassandra
  • Apache ZooKeeper
  • Apache Kafka
  • Redis
  • Fleet Management
  • Software Engineering
  • FOCUS
  • Computer Science
  • Computer Engineering
  • Kubernetes
  • Internet
  • Dragon NaturallySpeaking
  • DNS
  • DHCP
  • LDAP
  • Server Virtualization
  • Operating Systems
  • Budget
  • Reliability Engineering
  • Process Improvement
  • Computer Hardware
  • Bootstrap
  • PXE
  • BIOS
  • Total Productive Maintenance
  • TPM
  • Provisioning
  • OpenStack
  • xCAT
  • Storage
  • Caching
  • Configuration Management
  • Orchestration
  • Puppet
  • Progress Chef
  • Ansible
  • Cloud Computing
  • Amazon Web Services
  • Amazon S3
  • Amazon EC2
  • Amazon CloudFront

Summary

The Apple Services Engineering team is one of the most exciting examples of Apple's long-held passion for combining art and technology. Join the Apple Services Engineering Cloud Service Infrastructure team as a Site Reliability Engineer to help support and scale cloud services for millions of Apple users.

We build and support new and existing critical infrastructure systems and frameworks that provide services such as structured and unstructured storage, caching, queueing, searching, and much more at hyperscale. These form the platform upon which many iCloud and other backend systems at Apple are built. The team is responsible for the next-generation platform that will power Apple's infrastructure services. These services operate at extremely large scale and store exabytes of data. The platform will support a variety of services based on open-source software, such as Kubernetes, Cassandra, ZooKeeper, Kafka, and Redis, alongside internally developed services.

Description

The Apple Services Engineering Cloud Services SRE organization is looking for a strong, enthusiastic developer to join as a member of this group. This person will have a tremendous amount of individual responsibility and influence over the direction the core platform of many critical Apple internet services takes for years to come. You are someone with ideas and real passion for software delivered as a service to improve reuse, efficiency, and simplicity. This engineer's work will affect hundreds of millions of users and be essential to the success of some of the most visible current and future Apple features.

We are domain experts in fleet management, systems, and software engineering. We build automation, instrument reliability tools, and respond to alerts and incidents that may pose a risk to the reliability of the platform. The team focuses on infrastructure capabilities and processes, improving the reliability and efficiency of the systems at scale.

Minimum Qualifications

Bachelor's or Master's degree in Computer Science, Computer Engineering, or equivalent experience

5+ years of experience developing platform services

Experience with large-scale server provisioning and maintenance (OpenStack Ironic, Metal3, MAAS, xCAT, NetBox, Tinkerbell)

Experience with development within the Kubernetes ecosystem, including the operator framework, controllers, and CRDs

Understanding of base internet infrastructure services, including DNS, DHCP, LDAP, server virtualization, and server monitoring, with experience in critical, large-scale distributed systems that combine hardware, operating systems, and software

Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts, with a keen eye for opportunities to eliminate toil through code and process improvements

Preferred Qualifications

Hardware bootstrap and associated security (PXE, BIOS, TPM, secure boot, trusted computing)

Experience with hyperscale server provisioning and maintenance (OpenStack Ironic, Metal3, MAAS, xCAT, NetBox, Tinkerbell)

Structured or unstructured storage and caching

Automating operations processes via services and tools

Configuration management and fleet orchestration via Puppet, Chef, Ansible, or others

Cloud Services (AWS S3/EC2/CloudFront or equivalent)
  • Dice Id: 90733111
  • Position Id: cc48c6ead916979fe7ba0cca37e5ca0d