Reliability engineering Jobs in San Jose, CA

Senior DGX Cloud Software Engineer - Infrastructure Automation and Distributed Systems

NVIDIA Corporation

Remote or Santa Clara, California, USA

Full-time

We are seeking Software Engineers with previous experience building and running private and public clouds at production scale. As part of the DGX Cloud team, you'll have the opportunity to support our customers' journeys in AI training and inference development by building the platforms, tools, and services that defend the operational capacity of our bare-metal, accelerated compute infrastructure and codify reliability best-practices in the broader DGX Cloud platform ecosystem. What you'll be d

Senior Site Reliability Engineer, HPC and LSF

NVIDIA Corporation

Santa Clara, California, USA

Full-time

NVIDIA is the leader in AI, machine learning and datacenter acceleration. NVIDIA is expanding that leadership into datacenter networking with Ethernet switches, NICs and DPUs. NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI - the next era of computing. NVIDIA is a "learning machine" th

Sr. DevOps/Automation Engineer

ACL Digital

San Jose, California, USA

Full-time

Job Title: DevOps/Automation Engineer Sr. Job ID: EBAYJP00022204 Location: Remote (Open to all time zones - slight preference for EST but not necessary) Bill Rate: $99.84/hr Pay Rate: $70.00/hr on W2 or $80.00/hr on C2C Duration: 10 Months with possible ext. Job Description: As a Software Engineer focused on Developer Experience (DevEx), you will be at the forefront of creating an efficient and enjoyable development workflow. Your role will involve improving the overall development lifecycle, fr

Systems Administrator / Technical Lead - Process Control Systems

Katalyst Healthcares and Lifesciences

Foster City, California, USA

Full-time

Job Summary: We are looking for an experienced professional who will be responsible for administration of Rockwell FactoryTalk-based process control systems and Inductive Automation's Ignition-based process control and environmental monitoring systems. The candidate should have hands-on experience deploying, configuring, and troubleshooting these systems in a Good Manufacturing Practice (GMP) environment, with a good understanding of FDA regulations. This role focuses on deploying system updates

Senior Site Reliability Engineer - DGX Cloud

NVIDIA Corporation

Remote or Santa Clara, California, USA

Full-time

Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high efficiency and availability using the combination of software and systems engineering practices. This is a highly specialized discipline which demands knowledge across different systems, networking, coding, database, capacity management, continuous delivery and deployment and open source cloud enabling technologies like Kubernetes and OpenStack. SRE at N

Sr. Site Reliability Engineer, Compute SRE

Roblox

San Mateo, California, USA

Full-time

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences - all created by our global community of developers and creators. At Roblox, we're building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device. We're on a mission to connect a billion people with op

Information Systems Security Manager / Officer (ISSM / ISSO)

General Dynamics

Hayward, California, USA

Full-time

Type of Requisition: Regular Clearance Level Must Currently Possess: Top Secret Clearance Level Must Be Able to Obtain: Top Secret Public Trust/Other Required: None Job Family: Information Security Job Qualifications: Skills: CISM, CISSP, Information Assurance, Information Security Assessments, Risk Management Certifications: None Experience: 5 + years of related experience ship Required: Yes Job Description: Information Systems Security Manager / Officer (ISSM / ISSO) Transform technolo

Senior, Data Engineer - Data Ventures

Walmart Inc.

Sunnyvale, California, USA

Full-time

Position Summary What you'll do Job Description Do you have boundless energy and passion for engineering data used to solve dynamic problems that will shape the future of retail? With the sheer scale of Walmart's environment comes the biggest of big data sets. As a Walmart Data Engineer, you will dig into our mammoth scale of data to help unleash the power of retail data science by imagining, developing, and maintaining data pipelines that our Data Scientists and Analysts can rely on. You will

Software Engineering Technical Leader (Hybrid Remote- San Jose, CA)

Splunk Inc.

Remote or San Jose, California, USA

Full-time

Description Meet the Team Join us as we pursue our mission to make machine data accessible, usable, and valuable to everyone. At Cisco, we're passionate about empowering our customers through reliable, scalable, and secure infrastructure. Our people thrive on innovation, collaboration, and a shared drive to deliver outstanding experiences. We're committed to our work, to our customers, and most importantly-to each other's success. Are you passionate about building robust, scalable software sys

Senior Engineer - Data Warehouse Site Reliability Engineering (SRE) (ship required)

Oracle Corporation

Pleasanton, California, USA

Full-time

Job Description The candidate for this position must qualify the US-Gov requirements - should be a and resident in the US. We are looking for senior engineers with experience in supporting data warehousing products. As a member of the Product development organization, focus will be on working with development teams, providing timely support to customers and identify/implementing process automation, for cloud BI product. BS or higher degree in Computer Science / Engineering or equivalent 3+ y

Data Ingestion SRE, Data Platform - USDS

TikTok

San Jose, California, USA

Full-time

Location: San Jose Employment Type: Regular Job Code: A218312 Responsibilities Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed services and infrastructures. As a site reliability engineer in the data platform area, you will have the opportunity to manage the services and infrastructures in one of the largest data platforms in the world that directly supports the TikTok a

Senior System Reliability Engineer

NVIDIA Corporation

Santa Clara, California, USA

Full-time

NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing - with the GPU acting as the brains of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as "the AI computing company." We're looking to grow our company and build our teams with the most thoughtful people in the world. J

Crypto Trading Systems Engineer - Asia Timezone

Westbury Partners

Remote or Chicago, Illinois, USA

Full-time

What You'll Do: Design and implement cutting-edge trading and market-making strategies Work closely with a trader based in Asia to identify and capitalize on market opportunities Build and maintain C++ and Python systems for high-frequency trading across global crypto exchanges Optimize performance, reduce latency, and support real-time 24/7 trading operations Take full ownership of systems-from design and coding to monitoring and troubleshooting Collaborate with developers, traders, and quants

Metrology Software Engineer

Tesla Motors

Fremont, California, USA

Full-time

Tesla is seeking a highly motivated Metrology Software Engineer to develop and implement cutting-edge inline metrology systems for our advanced battery cell manufacturing lines (powder, film, and electrode processing lines). This role will be instrumental in developing, optimizing, and scaling up metrology systems, as well as working within a cross-functional team to take new battery designs from concept to high-volume production. The battery cell is a critical component of Tesla vehicles and e

Senior Site Reliability Engineer

General Motors

Remote

Full-time

Job Description Develop and design software applications for a driverless technology company. Duties may include: Build out and improve observability systems, tools and the related codebase. Contribute code, perform code reviews, and create technical designs that improve performance and reliability of observability systems using software and systems engineering skills. Partner with other Software Engineering teams to better understand use-cases and guide the engineers to use the existing tools eff

Metrology Systems Engineer, Cell Electrode

Tesla Motors

Fremont, California, USA

Full-time

Tesla is seeking a highly motivated Metrology Engineer to develop and implement cutting-edge inline metrology systems for our advanced battery cell manufacturing lines (powder, film, and electrode processing lines). This role will be instrumental in developing, optimizing, and scaling up metrology tools and methods, working within a cross-functional team to take new battery designs from concept to high-volume production. The battery cell is a critical component of Tesla vehicles and energy stor

Backend/Platform Software Engineer

PsiQuantum

Palo Alto, California, USA

Full-time

Quantum computing holds the promise of humanity's mastery over the natural world, but only if we can build a real quantum computer. PsiQuantum is on a mission to build the first real, useful quantum computers, capable of delivering the world-changing applications that the technology has long promised. We know that means we will need to build a system with roughly 1 million qubits that supports fault tolerant error correction within a scalable architecture, and a data center footprint. By harnes

Transition Support Engineer

JLL

Remote or Denver, Colorado, USA

Full-time

JLL empowers you to shape a brighter way. Our people at JLL and JLL Technologies are shaping the future of real estate for a better world by combining world class services, advisory and technology for our clients. We are committed to hiring the best, most talented people and empowering them to thrive, grow meaningful careers and to find a place where they belong. Whether you've got deep experience in commercial real estate, skilled trades or technology, or you're looking to apply your relevant e

Director Site Reliability Engineering

Akamai Technologies

Cambridge, England, United Kingdom

Full-time

Do you enjoy collaborating with teams to solve complex challenges? Do you have a passion for cutting edge technologies and tackling system problems? Join our highly skilled Network SRE team. We build and operate the Network infrastructure powering Akamai's global cloud platform. Our mission is to deliver reliable, scalable, and performant systems that enable customers to run critical workloads with confidence. As part of this team, you'll help ensure reliability at scale, maintaining the avail

Sr Implementation Lead, SRE (CoP)

Northern Trust

Remote or Chicago, Illinois, USA

Full-time

About Northern Trust: Northern Trust, a Fortune 500 company, is a globally recognized, award-winning financial institution that has been in continuous operation since 1889. Northern Trust is proud to provide innovative financial services and guidance to the world's most successful individuals, families, and institutions by remaining true to our enduring principles of service, expertise, and integrity. With more than 130 years of financial experience and over 22,000 partners, we serve the world'