Site Reliability Engineer Jobs in Remote or San Francisco, CA

21 - 40 of 175 Jobs

Principal Software Engineer - Site Reliability Engineering

Roblox

San Mateo, California, USA

Full-time

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators. At Roblox, we're building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device. We're on a mission to connect a billion people with op

Internship, Site Reliability Engineer, Applications Engineering (Fall 2025)

Tesla Motors

Fremont, California, USA

Full-time

Consider before submitting an application: This position is expected to start around September 2025 and continue through the Fall term (approximately December 2025) or into Spring 2026 if available and there is an opportunity to do so. We ask for a minimum of 12 weeks, full-time and on-site, for most internships. Our internship program is for students who are actively enrolled in an academic program. Entry-level candidates seeking employment after graduation and not returning to school should a

L3 Support SRE Engineer

Litmus7 Systems Consulting Inc.

San Ramon, California, USA

Full-time

Role: Sr. L3 Support Engineer. Location: San Ramon, CA (working from office). Should have good end-to-end knowledge of various commerce subsystems, including at least storefront, core commerce back end, post-purchase processing, OMS, store/warehouse management processes, and supply chain and logistics processes. Extensive backend development knowledge with core Java/J2EE and microservice-based event-driven architecture. Should be cognizant of key integrations undertaken in eCommerce and associated d

Datadog Consultant

S3 Staffing USA

San Ramon, California, USA

Contract

Datadog Consultant. Location: San Ramon, CA (on-site); locals only; face-to-face interview is a must. Duration: long-term contract. Design, implement, and manage observability solutions using Datadog (Logs, Metrics, APM, RUM, Synthetics, etc.). Develop real-time dashboards and alerts to monitor critical infrastructure and application health. Collaborate with development, SRE, and DevOps teams to identify key metrics and create actionable observability strategies. Optimize existing monitoring setups and ident

Sr. Site Reliability Engineer, Compute SRE

Roblox

San Mateo, California, USA

Full-time

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators. At Roblox, we're building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device. We're on a mission to connect a billion people with op

Staff Site Reliability Engineer, Cell Software

Tesla Motors

Remote or Fremont, California, USA

Full-time

Tesla is re-thinking how batteries are made from the ground up. We're designing new factories, new equipment, new processes and new software to rapidly scale battery manufacturing, globally. The primary bottleneck to Tesla's future expansion (and the transition to sustainable transport and energy storage) is our ability to produce and procure batteries - that's why we're innovating in-house, with our collection of world-class engineers, to redefine the industry. Software, data and automation all

Sr. Linux Site Reliability Engineer, IT Manufacturing Site Reliability Engineering

Tesla Motors

Fremont, California, USA

Full-time

We are seeking an enthusiastic SRE to join our dynamic IT Manufacturing Site Reliability Engineering (ITMFG-SRE) team at Tesla. Our team is responsible for building and managing an ecosystem of applications and platforms essential to manufacturing. As a Linux SRE, this role requires experience with hardware, software, networking, and automation to implement scalable solutions for manufacturing sites globally. You'll play a key role in maintaining, optimizing and scaling our infrastructure to sup

Senior Software Engineer - Site Reliability Engineering

Roblox

San Mateo, California, USA

Full-time

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators. At Roblox, we're building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device. We're on a mission to connect a billion people with op

Remote Telecom Cloud Infrastructure SRE

Infinite Computer Solutions (ICS)

Remote

Full-time

Note: GC EAD and L2 EAD consultants who can work on W2 can apply for this position. Job Description: The team you'll be part of: Red Hat is looking for Site Reliability Engineers (SREs) to be part of the Infrastructure Customer Engineering team (R&D production development) as part of the RH Telco Cloud Organization. As a Cloud Infrastructure SRE, you will join a special Engineering task force dedicated to preventing and solving the most critical and strategic customer issues encountered in the field.

Senior SRE

LiveRamp

San Francisco, California, USA

Full-time

LiveRamp is the data collaboration platform of choice for the world's most innovative companies. A groundbreaking leader in consumer privacy, data ethics, and foundational identity, LiveRamp is setting the new standard for building a connected customer view with unmatched clarity and context while protecting precious brand and consumer trust. LiveRamp offers complete flexibility to collaborate wherever data lives to support the widest range of data collaboration use cases-within organizations, b

Staff Site Reliability Engineer, AI Platform

Tesla Motors

Palo Alto, California, USA

Full-time

As a Site Reliability Engineer (SRE) for the AI Platform team, you will manage bleeding-edge bare-metal servers for Tesla's advanced generative AI platform. You will be responsible for the imaging, configuration management, observability, security, and scalability of these systems. You'll also manage the model benchmarks and their outputs. You should have a focus on automating anything required of this AI platform team and use various platforms to make it as easy as possible for the software eng

Senior Staff Site Reliability Engineer - CDN

NVIDIA Corporation

Remote or Santa Clara, California, USA

Full-time

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. Our legacy of innovation is driven by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As an NVIDIAN, y

Sr. Site Reliability Engineer, Integration Tools

Tesla Motors

Palo Alto, California, USA

Full-time

The Integration Platforms team develops and operates critical technology to support our ever-expanding customer fleet from prototype to production. As an SRE on this team, you will ensure the reliability, scalability, and performance of our on-vehicle, desktop-based, and web-based systems, collaborating closely with software engineers to design, build, and operate these systems across multiple regions. Join us and you will work alongside world-class software and data engineers on some of the new

Staff Site Reliability Engineer, Fleetnet

Tesla Motors

Remote or Palo Alto, California, USA

Full-time

We are a product-focused global team creating the next generation of server-side infrastructure and code to support the growing suite of Tesla products and services. We are looking for seasoned SREs with domain expertise in areas related to developing infrastructure as a service, Kubernetes, GitOps, K8s Operator development, and platform security. The Fleetnet SRE team is part of the Vehicle Software division and is embedded with our backend application, data platform and navigation development

CDN Site Reliability Engineer L4/L5 - Live Streaming, Open Connect CDN

Netflix, Inc.

Remote or Los Gatos, California, USA

Full-time

Netflix is one of the world's leading entertainment services, with 283 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time. In this role, you will support the CDN delivery and day-to-day live-streaming operations for Netflix. As a Live CDN SRE, you will be participating in the preparation, valida

Sr. Site Reliability Engineer, Dojo

Tesla Motors

Palo Alto, California, USA

Full-time

We are seeking an experienced Site Reliability Engineer (SRE) to join our team responsible for ensuring the reliability and performance of our Dojo cluster infrastructure. The successful candidate will be responsible for providing exceptional customer response and support, managing third-party systems, and collaborating with various teams to ensure seamless operations. If you have a passion for troubleshooting, automation, and collaboration, we encourage you to apply. Responsibilities: Respond to c

Site Reliability Engineer L4/L5 - Live Cloud Platform SRE

Netflix, Inc.

Remote or Los Gatos, California, USA

Full-time

Netflix is one of the world's leading entertainment services, with 283 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time. Netflix has been changing how people watch shows and movies, enabling on-demand access to thousands of movies and TV shows. Recently, Netflix has expanded its entertainment

Sr. Cloud Engineer - SME

ARK Solutions Inc

Remote

Contract

Role: Sr. Level Cloud Engineer SME (Technical + Operations) - (DevOps/SRE). Location: 100% Remote. Duration: 12 month(s)+. Experience and Skills Required: The candidate will be immersed in the elements of cloud development, design, integration, operation, and support of infrastructure services. Works with operational, application and engineering experts to develop, implement and maintain services, monitoring, reporting, P-V migrations and new automation functionalities for comprehensive solutions t

Sr. Site Reliability Engineer, Energy Software

Tesla Motors

Palo Alto, California, USA

Full-time

Tesla is looking for a Site Reliability Engineer to build, enhance, and scale the infrastructure that underpins our Energy IoT applications. These applications provide real-time monitoring, optimization, and control for Tesla's industry-leading energy products, including Powerwall, Megapack, Solar Roof, Supercharger, Wall Connector, Autobidder, and Virtual Power Plants. We are a high-impact team that values curiosity, learning, mentorship, open discourse, and making disciplined decisions by weig

Site Reliability Engineer (SRE)

Ignitec Inc.

Remote

Contract

Site Reliability Engineer. This is a full-time remote position supporting a large healthcare organization. Full benefits, PTO, and paid holidays are included with this position. Number of Available Positions: 3. Requirements: ship Work Status Authorization; 4+ years of professional experience with SRE/DevOps; Expert knowledge of cloud services (Azure); Experience with Docker & Kubernetes; Experience with automation configuration management (Chef, Puppet, or Ansible); Knowledgeable about Encryption-