Staff DevOps Engineer

San Jose, CA, US • Posted 1 hour ago • Updated 1 hour ago

Full Time

On-site

Compensation information provided in the description

Job Details

Skills

Immigration
Expect
Operational Excellence
Audiovisual
Partnership
Service Operations
Reliability Engineering
CHAOS
Dashboard
Infrastructure Architecture
Systems Design
Scalability
Capacity Management
Modeling
Optimization
Servers
Cloud Computing
Network Engineering
Automated Testing
Bridging
English
Mandarin Chinese
Documentation
DevOps
Media
Videoconferencing
Streaming
Trading
Real-time
Communication
RTP
RTCP
Service Delivery Platform
SDP
MCU
IaaS
Amazon Web Services
Google Cloud
Google Cloud Platform
Microsoft Azure
Orchestration
Kubernetes
Terraform
Stacks Blockchain
Grafana
Computer Networking
Border Gateway Protocol
Routing
Dragon NaturallySpeaking
DNS
Load Balancing
Continuous Integration
Continuous Delivery
GitHub
Jenkins
Workflow
Python
Bash
Incident Management
Software Development
Management
Finance
Collaboration
Recruiting
UPS

Summary

Immigration sponsorship is not available for this position.

What you can expect

We are hiring a Staff DevOps/Site Reliability Engineer to ensure reliability, scalability, and operational excellence for our real-time communications platform. This platform supports audio/video conferencing, recording, and live-streaming functionalities. The position requires expertise in infrastructure engineering, global team collaboration, and cross-functional partnerships.

About the Team

This team manages essential meeting service operations at Zoom. They handle global, large-scale distributed systems and advance communication technology to connect individuals across physical distances.

Responsibilities

Ensuring reliability engineering and operations by owning the SLO/SLI framework for real-time services, defining, tracking, and improving latency, availability, jitter, and packet loss.
Leading incident response for critical outages across the real-time platform, coordinating across time zones and engineering disciplines.
Promoting a blameless postmortem culture and ensuring action items lead to measurable reliability enhancements.
Implementing chaos engineering and game day exercises to proactively identify failure modes before user impact occurs.
Building and evolving observability tools - dashboards, alerting systems, and distributed tracing - tailored for real-time media infrastructure challenges.
Serving as the architectural authority on deployment patterns, infrastructure design, and operational readiness for real-time services.
Reviewing and contributing to system design proposals, providing feedback on scalability, fault tolerance, and operational complexity.
Driving capacity planning, traffic modeling, and cost optimization strategies across globally distributed infrastructure.
Evaluating and recommending infrastructure tools, platforms, and vendors - including media servers, CDN providers, cloud-native services, and edge networking.
Ensuring consistent standards for CI/CD pipelines, deployment safety, and progressive rollout strategies across teams.
Acting as the primary SRE partner for multiple engineering teams building real-time features, attending planning sessions, and providing operational readiness guidance.
Collaborating closely with network engineering, security, product, and data teams to align on platform-wide reliability requirements.
Translating infrastructure constraints and reliability trade-offs into actionable recommendations for product leaders and engineering teams.
Establishing and advocating DevOps best practices - infrastructure-as-code, GitOps, automated testing, and deployment automation - across partner teams.
Guiding senior engineers on SRE principles, reliability patterns, and operational discipline.
Serving as a technical liaison between US-based and China/India-based engineering teams, bridging communication gaps and providing technical context.
Conducting architecture reviews, incident retrospectives, and planning sessions in English and Mandarin as appropriate.
Maintaining a flexible schedule to ensure meaningful overlap with teams in Beijing, Shanghai, Bangalore, and Hyderabad.
Building collaborative relationships across cultural and geographic boundaries, adapting communication styles to foster trust and alignment.
Ensuring engineering documentation, runbooks, and architectural decision records are accessible and understandable for global team members.

What we're looking for

10+ years in DevOps, SRE, or infrastructure engineering roles, with at least 3 years at a staff or principal level scope.
Have a proven track record of owning reliability for large-scale, distributed, latency-sensitive systems in production.
Have experience in supporting real-time or media-heavy platforms (video conferencing, live streaming, gaming, trading systems, or similar).
Demonstrate ability to lead cross-functional technical initiatives without direct authority, driving alignment across engineering, product, and operations.
Have conceptual and architectural understanding of real-time communication protocols: WebRTC, RTP/RTCP, TURN/STUN, SDP, and SFU/MCU topologies.
Have solid expertise in cloud infrastructure (AWS, Google Cloud Platform, or Azure) and container orchestration (Kubernetes, Helm, ArgoCD).
Demonstrate proficiency with infrastructure-as-code tooling: Terraform, Pulumi, or equivalent.
Have experience with observability stacks: Prometheus, Grafana, Datadog, Jaeger, OpenTelemetry, or equivalent.
Have an understanding of networking fundamentals: BGP, anycast routing, DNS, load balancing, and CDN architecture.
Have experience using CI/CD tools such as GitHub Actions, Jenkins, and Spinnaker to streamline workflows and improve deployment processes.
Have experience implementing deployment safety practices such as canary releases, feature flags, and blue/green strategies to ensure reliable software delivery.
Demonstrate proficiency in Python, Bash, or Go for automation, tooling, and incident response; advanced software development expertise is not required.
Occasional weekend work may be required.
Be able to work across the globe or across multiple time zones.

Salary Range or On Target Earnings:

Minimum:
$124,000.00

Maximum:
$271,200.00

In addition to the base salary and/or OTE listed, Zoom has a Total Direct Compensation philosophy that takes into consideration base salary, bonus, and equity value.

Note: Starting pay will be based on a number of factors and commensurate with qualifications & experience.

We also have a location-based compensation structure; there may be a different range for candidates in this and other locations.

At Zoom, we offer a window of at least 5 days for you to apply because we believe in giving you every opportunity. Below is the potential closing date, just in case you want to mark it on your calendar. We look forward to receiving your application!

Anticipated Position Close Date:

05/31/26

Ways of Working
Our structured hybrid approach is centered around our offices and remote work environments. The work style of each role (Hybrid, Remote, or In-Person) is indicated in the job description/posting.

Benefits
As part of our award-winning workplace culture and commitment to delivering happiness, our benefits program offers a variety of perks, benefits, and options to help employees maintain their physical, mental, emotional, and financial health; support work-life balance; and contribute to their community in meaningful ways. Click Learn for more information.

About Us
Zoomies help people stay connected so they can get more done together. We set out to build the best collaboration platform for the enterprise, and today help people communicate better with products like Zoom Contact Center, Zoom Phone, Zoom Events, Zoom Apps, Zoom Rooms, and Zoom Webinars.
We're problem-solvers, working at a fast pace to design solutions with our customers and users in mind. Find room to grow with opportunities to stretch your skills and advance your career in a collaborative, growth-focused environment.

Our Commitment

At Zoom, we believe great work happens when people feel supported and empowered. We're committed to fair hiring practices that ensure every candidate is evaluated based on skills, experience, and potential. If you require an accommodation during the hiring process, let us know; we're here to support you at every step.

If you need assistance navigating the interview process due to a medical disability, please submit an Accommodations Request Form and someone from our team will reach out soon. This form is solely for applicants who require an accommodation due to a qualifying medical disability. Non-accommodation-related requests, such as application follow-ups or technical issues, will not be addressed.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 10516897
Position Id: 708d59b5fd7f844e7b2326df85a6fab3