Reliability Engineer Jobs in San Francisco, CA

Technical Lead, Site Reliability Engineer, Fleetnet

Tesla Motors

Remote or Palo Alto, California, USA

Full-time

We are a small team of experts focused on creating the next-generation server-side infrastructure for Tesla. We're the invisible link connecting every Tesla product, whether it's vehicles, robots, robotaxis, chargers or even mobile apps, to bring customers the best user experience possible. We're looking for a strong, hands-on technical leader with domain expertise in one or more of: containers, public clouds, or private clouds. Today, over 10 million Tesla users rely on our services to safely and

Principal Site Reliability Engineer, Datastores

Cisco Systems, Inc.

San Francisco, California, USA

Full-time

Who We Are: Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network, even the ones they don't own. Powered by AI and an unmatched set of cloud, internet, and enterprise network telemetry data, ThousandEyes enables IT teams to proactively detect, diagnose, and remediate issues before they impact end-user experiences. ThousandEyes is deeply integrated across the entire Cisco technology portfolio and

Site Reliability Engineer

LiveRamp

San Francisco, California, USA

Full-time

LiveRamp is the data collaboration platform of choice for the world's most innovative companies. A groundbreaking leader in consumer privacy, data ethics, and foundational identity, LiveRamp is setting the new standard for building a connected customer view with unmatched clarity and context while protecting precious brand and consumer trust. LiveRamp offers complete flexibility to collaborate wherever data lives to support the widest range of data collaboration use cases: within organizations, b

Senior Site Reliability Engineer, ML Platforms

NVIDIA Corporation

Remote or Santa Clara, California, USA

Full-time

Are you passionate about building and maintaining large-scale production systems that support advanced data science and machine learning applications? Do you want to join a team at the heart of NVIDIA's data-driven decision-making culture? If so, we have a great opportunity for you! NVIDIA is seeking a Senior Site Reliability Engineer (SRE) for the Data Science & ML Platform(s) team. The role involves designing, building, and maintaining services that enable real-time data analytics, streaming,

Staff Site Reliability Engineer, Cell Software

Tesla Motors

Remote or Fremont, California, USA

Full-time

Tesla is re-thinking how batteries are made from the ground up. We're designing new factories, new equipment, new processes and new software to rapidly scale battery manufacturing, globally. The primary bottleneck to Tesla's future expansion (and the transition to sustainable transport and energy storage) is our ability to produce and procure batteries - that's why we're innovating in-house, with our collection of world-class engineers, to redefine the industry. Software, data and automation all

Sr. Site Reliability Engineer, Compute SRE

Roblox

San Mateo, California, USA

Full-time

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators. At Roblox, we're building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device. We're on a mission to connect a billion people with op

Staff Site Reliability Engineer, Incident and Disaster

Dropbox Inc

Remote

Full-time

Dropbox is a Virtual First company. For this role, we are hiring in Zones 2 and 3. Please refer to our Compensation section below to see which neighborhoods fall under each Zone. Role Description: The Incident and Disaster Team aims to reduce customer pain by speeding up incident response through standardized incident management processes and tooling, as well as through incident prevention strategies such as disaster readiness, chaos testing, safer tooling, stronger controls, automated conformanc

Senior Platform Engineer - 100% remote from anywhere in the US

Calance

Remote or New York, New York, USA

Full-time

Position Summary: Our client is seeking a highly skilled and experienced Senior Platform Developer II to join their team. This pivotal role will be instrumental in building, scaling, and maintaining the robust and secure infrastructure that powers our mission-critical platform. You will be a force multiplier, enabling our development teams to deliver features faster and more reliably. You will champion infrastructure-as-code principles, contribute code to platform scalability, drive automation acr

Internship, Site Reliability Engineer, Applications Engineering (Fall 2025)

Tesla Motors

Fremont, California, USA

Full-time

Consider before submitting an application: This position is expected to start around September 2025 and continue through the Fall term (approximately December 2025) or into Spring 2026 if available and there is an opportunity to do so. We ask for a minimum of 12 weeks, full-time and on-site, for most internships. Our internship program is for students who are actively enrolled in an academic program. Entry-level candidates seeking employment after graduation and not returning to school should a

Site Reliability Engineer

Madison-Davis, LLC

Remote

Contract

Role: Drive the technical implementation of monitoring and alerting strategies across enterprise-scale applications and infrastructure. Collaborate directly with development teams to ensure each new initiative includes the correct telemetry, log tagging, and alert payloads. Act as a liaison to Level 2 and Level 3 support teams to maintain and enhance monitoring dashboards used by the enterprise command center (EMC). Standardize alert formats to ensure consistency with SRE policies and support downs

Site Reliability Engineer

Zachary Piper Solutions, LLC

Remote

Full-time

Piper Companies is seeking a Remote Site Reliability Engineer to join a leading cybersecurity and cloud consulting firm. The Site Reliability Engineer will play a key role in building and maintaining secure, scalable infrastructure while supporting automation, compliance, and operational excellence across client environments. Responsibilities of the Site Reliability Engineer include: Develop and deploy automation scripts, tooling, and infrastructure to meet client needs. Manage patching processes

Site Reliability Engineer

General Dynamics

Remote or Aurora, Colorado, USA

Full-time

Basic Qualifications: Bachelor's degree in Computer Science, a related field, or equivalent experience is required, plus a minimum of 5 years of relevant experience; or a Master's degree plus 3 years of relevant experience. CLEARANCE REQUIREMENTS: Department of Defense TS/SCI security clearance is required at time of hire. Applicants selected will be subject to a U.S. Government security investigation and must meet eligibility requirements for access to classified information. Due to the nature of

Site Reliability Engineer

Akamai Technologies

Cambridge, England, United Kingdom

Full-time

Do you enjoy collaborating with teams to solve complex challenges? Do you have a passion for cutting-edge technologies and tackling system problems? Join our highly skilled Site Reliability team! Our team monitors and measures the reliability of our suite of Compute products and platform. In collaboration with Engineering and Product teams, we improve the performance and reliability of the products we support. Partner with the best: You will apply statistical data analysis and an understandin

Sr. Site Reliability Engineer, Dojo

Tesla Motors

Palo Alto, California, USA

Full-time

We are seeking an experienced Site Reliability Engineer (SRE) to join our team responsible for ensuring the reliability and performance of our Dojo cluster infrastructure. The successful candidate will be responsible for providing exceptional customer response and support, managing third-party systems, and collaborating with various teams to ensure seamless operations. If you have a passion for troubleshooting, automation, and collaboration, we encourage you to apply. Responsibilities: Respond to c

Principal Site Reliability Engineer

Akamai Technologies

Cambridge, England, United Kingdom

Full-time

Do you enjoy collaborating with teams to solve complex challenges? Do you have a passion for cutting-edge technologies? Join our Compute Team! Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We do this while keeping Akamai's mission at the forefront of what we do: make life better for billions of people, billions of times a day. Partner with the best: As a Principal Site Reliability Engineer in the Virtualizatio

Senior Site Reliability Engineer

Akamai Technologies

Cambridge, England, United Kingdom

Full-time

Are you excited by the prospect of working with innovative security products? Do you enjoy creating innovative and strategic solutions to solve complex problems? Join Guardicore (now Akamai Enterprise Security Group)! Guardicore (now Akamai Enterprise Security Group) is changing the way organizations protect their data centers and clouds. Our team boasts some of the most talented and experienced cyber security and data center experts. We're always looking for new people to inspire us and make us bett

Senior Site Reliability Engineer

McKesson Corporation

Remote or Columbus, Ohio, USA

Full-time

McKesson is an impact-driven, Fortune 10 company that touches virtually every aspect of healthcare. We are known for delivering insights, products, and services that make quality care more accessible and affordable. Here, we focus on the health, happiness, and well-being of you and those we serve - we care. What you do at McKesson matters. We foster a culture where you can grow, make an impact, and are empowered to bring new ideas. Together, we thrive as we shape the future of health for patien

Site Reliability Engineer

McKesson Corporation

Remote or Columbus, Ohio, USA

Full-time

McKesson is an impact-driven, Fortune 10 company that touches virtually every aspect of healthcare. We are known for delivering insights, products, and services that make quality care more accessible and affordable. Here, we focus on the health, happiness, and well-being of you and those we serve - we care. What you do at McKesson matters. We foster a culture where you can grow, make an impact, and are empowered to bring new ideas. Together, we thrive as we shape the future of health for patien

Mainframe Site Reliability Engineer

Fynbosys Inc

Remote

Full-time

A Mainframe Site Reliability Engineer (SRE) applies software engineering principles to mainframe operations to enhance system reliability, scalability, and efficiency. Acting as a bridge between development and operations, the mainframe SRE focuses on automation, proactive monitoring, incident response, and performance optimization of mission-critical mainframe systems. Key responsibilities typically include: Automating repetitive operational tasks to reduce manual intervention and human error. Enh

Site Reliability Engineer: Splunk Cloud Services

Splunk Inc.

Colorado, USA

Full-time

Description: Splunk, a Cisco company, is building a safer and more resilient digital world with an end-to-end full stack platform made for a hybrid, multi-cloud world. Leading enterprises use our unified security and observability platform to keep their digital systems secure and reliable. Our customers love our technology, but it's our caring employees that make Splunk stand out as an amazing career destination. No matter where in the world or what level of the organization, we approach our wor