Lead Site Reliability Engineer Jobs in San Francisco, CA

1 - 20 of 47 Jobs

AWS EKS Lead (SRE / AWS Elastic Kubernetes Service) | Alameda, CA | Contract

SecureKloud Technologies Inc

Alameda, California, USA

Contract, Third Party

Hi, Greetings from SecureKloud. We have an opening for our client. Role: AWS EKS Lead Consultant. Location: Alameda, CA. Duration: Long-Term Contract. Job Description: We are seeking a highly experienced AWS EKS Lead Consultant to lead end-to-end cloud native platform design and DevOps automation using Kubernetes and AWS. The ideal candidate will combine technical excellence in cloud infrastructure with leadership, strategic thinking, and hands-on DevOps experience in enterprise-grade envi

Lead Site Reliability Engineer II, Production Engineering

Cisco Systems, Inc.

San Francisco, California, USA

Full-time

Who We Are: Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network - even the ones they don't own. Powered by AI and an unmatched set of cloud, internet and enterprise network telemetry data, ThousandEyes enables IT teams to proactively detect, diagnose, and remediate issues - before they impact end-user experiences. ThousandEyes is deeply integrated across the entire Cisco technology portfolio and

Lead Site Reliability Engineer

Centene Corporation

Missouri, USA

Full-time

You could be the one who changes everything for our 28 million members by using technology to improve health outcomes around the world. As a diversified, national organization, Centene's technology professionals have access to competitive benefits including a fresh perspective on workplace flexibility. Position Purpose: We are seeking a highly skilled and experienced M365 Lead Site Reliability Engineer to join our team. The ideal candidate will be responsible for developing and creating monitor

Lead Site Reliability Engineer - Remote

UnitedHealth Group

Remote or Minnetonka, Minnesota, USA

Full-time

Optum is a global organization that delivers care, aided by technology to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health op

Lead Site Reliability Engineer, Observability - Remote

Cisco Systems, Inc.

Remote

Full-time

Application window is open until further notice. The Meraki cloud supports millions of customer devices from 10 data centers around the world. Meraki's customer base has grown by a factor of 2-3 every year, serving billions of HTTP requests per day globally. Our customers depend on our products to run their critical infrastructure of network switches, security appliances, wireless APs and security cameras. As SREs at Meraki, we are responsible for building and growing the cloud that supports t

Senior Lead Site Reliability Engineer - Remote

Akamai Technologies

Cambridge, England, United Kingdom

Full-time

Would you enjoy improving stability and safety of one of the largest global networks? Would you enjoy hands-on network operations work on a global scale to improve our operational efficiency? Join our Platform Security Engineering Team. The Platform Security Engineering team is a group of engineers that support and secure Akamai's global network and Linode cloud systems. Our systems provide data security, server integrity, network access, and secure communications infrastructure. This is

Site Reliability Engineer. Senior Lead

Akamai Technologies

Cambridge, England, United Kingdom

Full-time

Job Title: Site Reliability Engineer. Senior Lead. Work Location: 145 Broadway, Cambridge, MA 02142. Job Description: Akamai Technologies, Inc. is hiring for the following role in Cambridge, MA (multiple openings): Site Reliability Engineer. Senior Lead. Working on analytical projects related to the metadata systems which support fast and reliable configuration of the company's global network. Leading the effort in working closely with the development teams in designing/implementing performa

Azure SRE Architect

Stanley David and Associates

Remote

Full-time

Role: SRE Architect. Location: Marlborough, MA / Remote. Type: Full-time. Job Description: Technical Expertise: Deep understanding of SRE principles, SRE model, and DevOps methodologies. Experience designing highly available, scalable, and resilient distributed systems. Proficient in architectural design (Microservices, Cloud-native, Event-driven architecture). Skilled in cloud platforms: Azure, Google Cloud Platform. Strong knowledge of observability tools: UIM, Prometheus, Grafana, Datadog, New Re

SRE Architect

Stanley David and Associates

Remote

Full-time

1. Technical Expertise Deep understanding of SRE principles, SRE model, and DevOps methodologies. Experience designing highly available, scalable, and resilient distributed systems. Proficient in architectural design (Microservices, Cloud-native, Event-driven architecture). Skilled in cloud platforms: Azure, Google Cloud Platform. Strong knowledge of observability tools: UIM, Prometheus, Grafana, Datadog, New Relic, Splunk, AppDynamics. 2. Framework Design & Governance Define and validate SLOs,

Senior Dev Operations Engineer SRE

Buxton Consulting

Remote

Contract

Senior Dev Operations Engineer SRE. Remote (Pleasanton, CA). 12+ Months. Top 3 Must Haves: Experience setting up alerts / alarms / notifications in AWS cloud. CloudWatch / Dynatrace. Experience with AWS solutions using AWS services including Kafka, ECS, EKS. Experience with IaC (Infrastructure as Code): CDK or Terraform. Thanks and Regards, Ajeet Singh, Buxton Consulting, 2010 Crow Canyon Place STE 100, San Ramon, CA 94583

Senior Site Reliability Engineer

Salesforce

San Francisco, California, USA

Full-time

To get the best candidate experience, please consider applying for a maximum of 3 roles within 12 months to ensure you are not duplicating efforts. Job Category: Software Engineering. Job Details: About Salesforce: We're Salesforce, the Customer Company, inspiring the future of business with AI + Data + CRM. Leading with our core values, we help companies across every industry blaze new trails and connect with customers in a whole new way. And, we empower you to be a Trailblazer, too - driving you

Senior Site Reliability Engineer, Infrastructure

Cisco Systems, Inc.

San Francisco, California, USA

Full-time

Who We Are: Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network - even the ones they don't own. Powered by AI and an unmatched set of cloud, internet and enterprise network telemetry data, ThousandEyes enables IT teams to proactively detect, diagnose, and remediate issues - before they impact end-user experiences. ThousandEyes is deeply integrated across the entire Cisco technology portfolio and

Senior SRE

LiveRamp

San Francisco, California, USA

Full-time

LiveRamp is the data collaboration platform of choice for the world's most innovative companies. A groundbreaking leader in consumer privacy, data ethics, and foundational identity, LiveRamp is setting the new standard for building a connected customer view with unmatched clarity and context while protecting precious brand and consumer trust. LiveRamp offers complete flexibility to collaborate wherever data lives to support the widest range of data collaboration use cases - within organizations, b

Principal Site Reliability Engineer, Datastores

Cisco Systems, Inc.

San Francisco, California, USA

Full-time

Who We Are: Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network - even the ones they don't own. Powered by AI and an unmatched set of cloud, internet and enterprise network telemetry data, ThousandEyes enables IT teams to proactively detect, diagnose, and remediate issues - before they impact end-user experiences. ThousandEyes is deeply integrated across the entire Cisco technology portfolio and

Senior Site Reliability Engineer, Test Platform - REMOTE

Cisco Systems, Inc.

Remote or San Francisco, California, USA

Full-time

At Cisco Meraki, we create magic through the energy and passion of our employees, who shape our dynamic community and empower us to solve problems for our customers. This magic unfolds when technology becomes intuitive, functions as intended, and when every individual is valued. By providing our employees with the autonomy to make an impact, we strive to fulfill our mission of simplifying technology so our customers can focus on what matters most to them - whether it's their students, patients, cu

Staff Site Reliability Engineer, Cell Software

Tesla Motors

Remote or Fremont, California, USA

Full-time

Tesla is re-thinking how batteries are made from the ground up. We're designing new factories, new equipment, new processes and new software to rapidly scale battery manufacturing, globally. The primary bottleneck to Tesla's future expansion (and the transition to sustainable transport and energy storage) is our ability to produce and procure batteries - that's why we're innovating in-house, with our collection of world-class engineers, to redefine the industry. Software, data and automation all

Sr. Site Reliability Engineer, Compute SRE

Roblox

San Mateo, California, USA

Full-time

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences - all created by our global community of developers and creators. At Roblox, we're building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device. We're on a mission to connect a billion people with op

Senior Staff Site Reliability Engineer - CDN

NVIDIA Corporation

Remote or Santa Clara, California, USA

Full-time

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. Our legacy of innovation is driven by great technology - and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As an NVIDIAN, y

Staff Site Reliability Engineer, Fleetnet

Tesla Motors

Remote or Palo Alto, California, USA

Full-time

We are a product-focused global team creating the next generation of server-side infrastructure and code to support the growing suite of Tesla products and services. We are looking for seasoned SREs with domain expertise in areas related to developing infrastructure as a service, Kubernetes, GitOps, K8s Operator development, and platform security. The Fleetnet SRE team is part of the Vehicle Software division and is embedded with our backend application, data platform and navigation development

Staff Site Reliability Engineer, AI Platform

Tesla Motors

Palo Alto, California, USA

Full-time

As a Site Reliability Engineer (SRE) for the AI Platform team, you will manage bleeding-edge bare-metal servers for Tesla's advanced generative AI platform. You will be responsible for the imaging, configuration management, observability, security, and scalability of these systems. You'll also manage the model benchmarks and their outputs. You should have a focus on automating anything required of this AI platform team and use various platforms to make it as easy as possible for the software eng