A fast-growing, venture-backed technology company is transforming how organizations approach physical security through a modern, software-driven platform. By combining real-time data, intelligent automation, and seamless system integrations, they enable security teams to shift from reactive incident response to proactive threat prevention. The team is highly collaborative, mission-driven, and focused on solving complex, real-world problems in an industry undergoing rapid innovation.
They are hiring a Staff Site Reliability Engineer to join their platform engineering group. In this role, you will own the reliability and performance of mission-critical systems that connect cloud-based services with distributed edge environments. You'll lead efforts around observability, incident response, and infrastructure scalability while mentoring engineers and helping shape SRE best practices. This position involves deep technical problem-solving across the stack, building automation to reduce operational overhead, and ensuring high availability across a complex, cloud-native architecture.
Required Skills & Experience
- 6+ years of hands-on experience in SRE, DevOps, or operations roles
- Expert-level knowledge of AWS and container technologies (Docker, Kubernetes)
- Strong skills in infrastructure as code (Terraform, CloudFormation, etc.)
- Proficiency in Kotlin, Rust, Python, or TypeScript
- Experience with monitoring tools (Prometheus, Grafana, Datadog, etc.)
- Hands-on experience with relational databases and SQL performance optimization
What You Will Be Doing
- Own system reliability, including monitoring, alerting, and capacity planning
- Troubleshoot and resolve complex production issues across infrastructure and application layers
- Participate in an on-call rotation supporting critical systems
- Conduct root cause analyses and implement long-term fixes
- Build automation and internal tooling to improve system performance and reduce toil
- Manage and optimize CI/CD pipelines and observability frameworks
- Improve scalability, resilience, and maintainability of distributed systems
- Help define incident response processes and disaster recovery strategies
- Provide mentorship and technical leadership across the engineering team
You will receive the following benefits:
- Medical, dental, and vision coverage
- 401(k) match
- Generous PTO
- Employee discounts
Applicants must be authorized to work in the US on a full-time basis, now and in the future.