Senior Site Reliability Engineer

Posted 30+ days ago • Updated 1 hour ago

Full Time

Job Details

Skills

Scalability
FOCUS
Continuous Improvement
Service Level
Pivotal
Expect
Collaboration
Reliability Engineering
Computer Science
Cloud Computing
Budget
Capacity Management
Incident Management
Python
Scripting
Continuous Integration
Continuous Delivery
Configuration Management
Orchestration
Docker
Kubernetes
Amazon Web Services
Google Cloud Platform
Google Cloud
Microsoft Azure
DevOps
Management
Artificial Intelligence
Machine Learning (ML)

Summary

We are seeking an experienced Senior Site Reliability Engineer (SRE) to join our team. In this role, you will collaborate closely with software developers and operations teams to keep our systems reliable, scalable, and efficient, with a strong focus on meeting and exceeding customer expectations. You will deploy, maintain, and automate our infrastructure and application environments to ensure seamless user experiences, and you will take a proactive part in improving system reliability, optimizing resource utilization, and continuously improving our operational practices. Your responsibilities will include defining and tracking Service Level Objectives (SLOs), managing error budgets, and reducing toil through automation. You will play a pivotal role in driving the success of technology initiatives, maximizing their impact across the organization, and ensuring that solutions consistently meet the high standards our customers expect.

Responsibilities

• Collaborate with development, security, quality, and operations teams to implement SRE practices and ensure system reliability
• Define and support the required level of reliability, availability, and performance for services and applications
• Troubleshoot, mitigate, and help fix infrastructure and application issues in a timely manner
• Implement monitoring for infrastructure and application reliability

Requirements

• Bachelor's degree in Computer Science, Engineering, or a related field
• Proven experience with at least one cloud platform (AWS, Google Cloud Platform, or Azure)
• Experience implementing SRE practices such as SLOs/SLIs, error budgets, postmortems, toil reduction, capacity planning, and incident management
• Knowledge of Python or another scripting/programming language
• Strong background in monitoring tools
• Proficiency in CI/CD tools, infrastructure as code, and configuration management
• Solid knowledge of container and orchestration technologies (Kubernetes, Docker)

Nice to have

• Expertise in deploying and managing LLMs, including technologies like RAG
• Certification in Kubernetes, AWS, Google Cloud Platform, Azure, or similar technologies
• Proven experience in DevOps
• Knowledge of managing and optimizing AI/ML models in production environments, including basic deployment, monitoring, and maintenance


Dice Id: 10330481
Position Id: 464f4cc9f8fd38e518989701d119902d
