Sr. Manager, Site Reliability Engineering (Starlink)

  • SpaceX
  • Redmond, WA
  • Posted 13 days ago | Updated moments ago

Overview

On Site
USD 210,000.00 - 270,000.00 per year
Full Time

Skills

Reliability engineering
Broadband
Load balancing
Data storage
IT management
Verification and validation
Computer science
Computer engineering
Electrical engineering
Operating systems
TCP/IP
Continuous integration and development
Continuous monitoring
Data modeling
Performance improvement
Life insurance
Human resources
Satellite
Internet
Design
Value engineering
Scratch
IMPACT
Management
Network
Kubernetes
Administration
Provisioning
Leadership
Automation
Cloud computing
Collaboration
FOCUS
Mentorship
Database
Storage
Software deployment
Computer hardware
Apache Velocity
Mathematics
Linux
Terraform
Ansible
Docker
Python
Computer networking
Virtualization
Hypervisor
Testing
Servers
SAP BASIS
Insurance
ITAR

Job Details

SpaceX was founded under the belief that a future where humanity is out exploring the stars is fundamentally more exciting than one where we are not. Today SpaceX is actively developing the technologies to make this possible, with the ultimate goal of enabling human life on Mars.

SR. MANAGER, SITE RELIABILITY ENGINEERING (STARLINK)

At SpaceX we're leveraging our experience in building rockets and spacecraft to deploy Starlink, the world's most advanced broadband internet system. Starlink is the world's largest satellite constellation and is providing fast, reliable internet to nearly 3M users worldwide. We design, build, test, and operate all parts of the system - thousands of satellites, consumer receivers that allow users to connect within minutes of unboxing, and the software that glues the system together. We've only begun to scratch the surface of Starlink's potential global impact and are looking for a best-in-class engineering manager to help maximize Starlink's utility for communities and businesses around the globe.

As a Sr. Site Reliability Manager on Starlink, you'd be managing a team of 10+ engineers distributed across Redmond, WA, Bastrop, TX, Hawthorne, CA, and Sunnyvale, CA, who are designing, operating, and scaling the ground compute infrastructure we use to run the Starlink constellation and manage a network that handles millions of daily users worldwide. Your team will be responsible for a wide variety of areas including application observability, Kubernetes cluster administration, load balancing, data storage, and bare metal provisioning. You will lead the efforts to develop automation to deploy and manage compute resources both on-premises and in the cloud, and directly collaborate with engineering across the program to create highly scalable and maintainable software products.

RESPONSIBILITIES:

  • Manage a team of site reliability engineers who are responsible for execution across multiple disciplines and subsystems
  • Provide strong technical leadership with a focus on excellence and execution
  • Drive all aspects of the program, including architecture, development, and delivery
  • Hire and mentor engineers and leaders to build a strong team that can deliver results on schedule
  • Deploy and manage core infrastructure such as databases, monitoring, and storage
  • Closely collaborate with software engineers to create highly scalable, operable, and maintainable products
  • Invent tools and processes that enable fast, accurate, and easy-to-use development and deployment systems
  • Provide fast and comprehensive software validation, including virtualized, hardware-in-the-loop, and on-orbit test platforms
  • Identify areas for improvement and create innovative solutions that enable high developer velocity

BASIC QUALIFICATIONS:

  • Bachelor's degree in computer science, computer engineering, electrical engineering, math, or other STEM discipline
  • 6+ years of professional experience in software or site reliability engineering
  • 2+ years of software or site reliability engineering management experience overseeing a team of engineers
  • 2+ years of professional experience with Linux operating systems
  • Professional experience with on-premises hardware infrastructure
  • Experience with Terraform, Ansible, or other infrastructure tools
  • Experience with containerization technologies (e.g., Docker, Kubernetes)

PREFERRED SKILLS AND EXPERIENCE:

  • 5+ years of experience with Python and Python-based development frameworks
  • Strong networking knowledge of TCP/IP
  • Strong understanding of virtualization and hypervisor technologies
  • Knowledge of Linux boot process and systems configuration
  • Deep understanding of testing, continuous integration, build, deployment & continuous monitoring
  • Strong understanding of relevant technologies, such as:
    • Bazel or other build systems
    • Linux, Docker, Kubernetes, or similar technologies
  • Understanding of databases and data modeling
  • Experience with automatically managing dozens or hundreds of servers (e.g., with Terraform or Ansible)
  • Focus on performance bottlenecks and performance improvement techniques
  • Excellent communication skills, with the ability to communicate with customers, peers, management, etc., in both formal and informal situations

ADDITIONAL REQUIREMENTS:

  • Must be willing to work extended hours and weekends as needed

COMPENSATION AND BENEFITS:

Pay range:

Senior Manager, Site Reliability Engineering: $210,000.00 - $270,000.00 per year

Your actual level and base salary will be determined on a case-by-case basis and may vary based on the following considerations: job-related knowledge and skills, education, and experience.

Base salary is just one part of your total rewards package at SpaceX. You may also be eligible for long-term incentives, in the form of company stock, stock options, or long-term cash awards, as well as potential discretionary bonuses and the ability to purchase additional stock at a discount through an Employee Stock Purchase Plan. You will also receive access to comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, paid parental leave, and various other discounts and perks. You may also accrue 3 weeks of paid vacation and will be eligible for 10 or more paid holidays per year. Exempt employees are eligible for 5 days of sick leave per year.

ITAR REQUIREMENTS:

    Learn more about the ITAR here.

SpaceX is an Equal Opportunity Employer; employment with SpaceX is governed on the basis of merit, competence and qualifications and will not be influenced in any manner by race, color, religion, gender, national origin/ethnicity, veteran status, disability status, age, sexual orientation, gender identity, marital status, mental or physical disability or any other legally protected status.

Applicants wishing to view a copy of SpaceX's Affirmative Action Plan for veterans and individuals with disabilities, or applicants requiring reasonable accommodation to the application/interview process should notify the Human Resources Department at .