Staff Site Reliability Engineer

Manhattan Beach, CA, US • Posted 5 days ago • Updated 8 hours ago
Full Time
On-site
$170,000 - $230,000/yr

Job Details

Skills

  • Physical Security
  • Real-time
  • Innovation
  • Recruiting
  • Conflict Resolution
  • Problem Solving
  • High Availability
  • Cloud Computing
  • DevOps
  • Amazon Web Services
  • Docker
  • Kubernetes
  • Terraform
  • Kotlin
  • Rust
  • Python
  • TypeScript
  • Grafana
  • Relational Databases
  • SQL Tuning
  • Reliability Engineering
  • Capacity Management
  • Root Cause Analysis
  • Build Automation
  • Management
  • Continuous Integration
  • Continuous Delivery
  • Scalability
  • Incident Management
  • Disaster Recovery
  • Mentorship
  • IT Management
  • SAP BASIS

Summary

A fast-growing, venture-backed technology company is transforming how organizations approach physical security through a modern, software-driven platform. By combining real-time data, intelligent automation, and seamless system integrations, they enable security teams to shift from reactive incident response to proactive threat prevention. The team is highly collaborative, mission-driven, and focused on solving complex, real-world problems in an industry undergoing rapid innovation.

They are hiring a Staff Site Reliability Engineer to join their platform engineering group. In this role, you will own the reliability and performance of mission-critical systems that connect cloud-based services with distributed edge environments. You'll lead efforts around observability, incident response, and infrastructure scalability while mentoring engineers and helping shape SRE best practices. This position involves deep technical problem-solving across the stack, building automation to reduce operational overhead, and ensuring high availability across a complex, cloud-native architecture.

Required Skills & Experience
  • 6+ years of hands-on experience in SRE, DevOps, or operations roles
  • Expert-level knowledge of AWS and container technologies (Docker, Kubernetes)
  • Strong skills in infrastructure as code (Terraform, CloudFormation, etc.)
  • Proficiency in Kotlin, Rust, Python, or TypeScript
  • Experience with monitoring tools (Prometheus, Grafana, Datadog, etc.)
  • Hands-on experience with relational databases and SQL performance optimization

What You Will Be Doing
  • Own system reliability, including monitoring, alerting, and capacity planning
  • Troubleshoot and resolve complex production issues across infrastructure and application layers
  • Participate in an on-call rotation supporting critical systems
  • Conduct root cause analyses and implement long-term fixes
  • Build automation and internal tooling to improve system performance and reduce toil
  • Manage and optimize CI/CD pipelines and observability frameworks
  • Improve scalability, resilience, and maintainability of distributed systems
  • Help define incident response processes and disaster recovery strategies
  • Provide mentorship and technical leadership across the engineering team

You will receive the following benefits:
  • Medical, dental, and vision coverage
  • 401(k) Match
  • Generous PTO
  • Employee Discounts

Applicants must be authorized to work in the US on a full-time basis now and in the future.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions, and AI may have been used to create this description. The position description has been reviewed for accuracy, and Dice believes it correctly reflects the job opportunity.
  • Dice Id: 10105282
  • Position Id: 870042