Site Reliability Engineer Jobs in San Jose, CA

41 - 60 of 250 Jobs

Senior Machine Learning Ops Engineer, Global SRE

TikTok

San Jose, California, USA

Full-time

Location: San Jose. Employment Type: Regular. Job Code: A04380. Responsibilities: The MLOps - Global SRE team is responsible for the stability of machine learning systems under the Global Monetization Products and Technology organization, ensuring the stable and efficient operation of machine learning models across data preparation, development, training, deployment, serving and so on. Responsibilities: 1) Responsible for setting SLOs of online machine lear

Machine Learning Ops Engineer, Global SRE

TikTok

San Jose, California, USA

Full-time

Location: San Jose. Employment Type: Regular. Job Code: A108064. Responsibilities: The MLOps - Global SRE team is responsible for the stability of machine learning systems under the Global Monetization Products and Technology organization, ensuring the stable and efficient operation of machine learning models across data preparation, development, training, deployment, serving and so on. Responsibilities: 1) Responsible for setting SLOs of online machine lea

Site Reliability Engineer - Global SRE, Monetization Technology

TikTok

San Jose, California, USA

Full-time

Location: San Jose. Employment Type: Regular. Job Code: R2861. Responsibilities: TikTok is one of the fastest growing apps in the world, and we're seeking Site Reliability Engineers (SREs) to join our monetization technology team. The monetization technology team works on building and running large-scale, globally distributed, fault-tolerant ads systems. SREs keep the systems up and running with the highest level of availability, ensuring our users hav

Data Site Reliability Engineer, Video Platform - USDS

TikTok

San Jose, California, USA

Full-time

Location: San Jose. Employment Type: Regular. Job Code: A30565. Responsibilities: Team Intro: This is a Site Reliability Engineer role focusing on data pipeline reliability for the Video Platform team in USDS. Data SREs monitor data and keep production batch and realtime processing jobs up and running with the highest level of availability, ensuring our users have the freshest, most complete, and most correct data possible. In order to enhance collaboration a

Senior Site Reliability Engineer - Global SRE, Monetization Technology

TikTok

San Jose, California, USA

Full-time

Location: San Jose. Employment Type: Regular. Job Code: A180769. Responsibilities: TikTok is one of the fastest growing apps in the world, and we're seeking Site Reliability Engineers (SREs) to join our monetization technology team. The monetization technology team works on building and running large-scale, globally distributed, fault-tolerant ads systems. SREs keep the systems up and running with the highest level of availability, ensuring our users h

Site Reliability Engineer Graduate (TikTok Product - USDS) - 2025 Start (BS/MS)

TikTok

San Jose, California, USA

Full-time

Location: San Jose. Employment Type: Regular. Job Code: A224009. Responsibilities: About the Team: Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed and fault-tolerant systems. Product SREs help ensure the reliability and uptime for the services underpinning the TikTok product. Our team pays great attention to optimizing existing systems, working closely with cross functional

Senior Site Reliability Engineer - USDS (Multiple Positions)

TikTok

San Jose, California, USA

Full-time

Location: San Jose. Employment Type: Regular. Job Code: A33375A. Responsibilities: About TikTok U.S. Data Security: TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. U.S. Data Security ("USDS") is a subsidiary of TikTok in the U.S. This new, security-first division was created to bring heightened focus and governance to our data protection policies and content assurance protocols to keep U

Tech Lead, SRE - Recommendation Infrastructure

TikTok

San Jose, California, USA

Full-time

Location: San Jose. Employment Type: Regular. Job Code: A206446. Responsibilities: Our Recommendation Infrastructure Team is responsible for building up and optimizing the architecture for our recommendation system to provide the most stable and best experience for our TikTok users. SREs in our team keep the systems up and running with the highest level of availability, and create highly automated systems and pipelines. What You'll Do: Engage in and imp

Site Reliability Engineer, Infrastructure Security

TikTok

San Jose, California, USA

Full-time

Location: San Jose. Employment Type: Regular. Job Code: 3CNV. Responsibilities: Our Infrastructure Engineering team supports the company's fast growth by building and operating hyper-scale datacenters, managing the life cycle of the server fleet, providing cloud solutions, and developing various infrastructure services that are scalable and reliable. Responsibilities: - Conduct security reviews of core corporate and production infrastruc

Site Reliability Engineer, Systems - Infrastructure Engineering

TikTok

San Jose, California, USA

Full-time

Location: San Jose. Employment Type: Regular. Job Code: A1965. Responsibilities: Our Infrastructure Engineering team supports the company's fast growth by building and operating hyper-scale datacenters, managing the life cycle of the server fleet, providing cloud solutions, and developing various infrastructure services that are scalable and reliable. Roles and Responsibilities: - Operate basic system infrastructures like DNS, NTP, authe

Senior Site Reliability Engineer, Product - USDS

TikTok

San Jose, California, USA

Full-time

Location: San Jose. Employment Type: Regular. Job Code: A215600. Responsibilities: Team Intro: The Product Engineering team monitors and maintains the availability of TikTok, including services such as video playback, content discovery/recommendations, live streaming, and customer service feedback. In order to enhance collaboration and cross-functional partnerships, among other things, at this time, our organization follows a hybrid work schedule that

Data Ingestion SRE, Data Platform - USDS

TikTok

San Jose, California, USA

Full-time

Location: San Jose. Employment Type: Regular. Job Code: A218312. Responsibilities: Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed services and infrastructures. As a site reliability engineer in the data platform area, you will have the opportunity to manage the services and infrastructures in one of the largest data platforms in the world that directly supports the TikTok a

Site Reliability Engineer, Trust & Safety - USDS

TikTok

San Jose, California, USA

Full-time

Location: San Jose. Employment Type: Regular. Job Code: VGGP. Responsibilities: Team Intro: The Trust and Safety (TnS) engineering team of the US Tech Service department at TikTok is fast growing and responsible for building machine learning models and systems to identify and defend against internet abuse and fraud on our platform. Our mission is to protect billions of users and publishers across the globe every day. We embrace the state-of-the-art machine learnin

Network Site Reliability Engineer

NVIDIA Corporation

Santa Clara, California, USA

Full-time

The Enterprise Network Support and SRE team is looking to add a seasoned Technical SRE lead to help actualize the SRE vision for our network infrastructure. We are looking for an engineer who is passionate about the network and making its operation seamless with a focus on user experience. This role will offer several opportunities to solve problems by being hands-on with troubleshooting, focused on network automation, observability, documentation, and excellence in operations. This Network SRE

Sr. Site Reliability Engineer - U.S. Citizen - This role sits within Optum Serves Technology Product organization

Widescope Consulting and Contracting Services

Remote

Full-time

Job Title: Sr. Site Reliability Engineer. Location: Headquarters / Telecommute. Classification (HR only): Exempt Non-Exempt. Reports To (Title): COO, Widescope Consulting and Contracting. JOB SUMMARY: The statements below are not intended to be all-inclusive of the duties and responsibilities of the position. Based on leadership decisions and business needs, all other duties as assigned will be expected for each position. Widescope Consulting and Contracting is proud to serve our nation's mi

Principal AI Infrastructure SRE Engineer

NVIDIA Corporation

Santa Clara, California, USA

Full-time

NVIDIA has been reinventing computer graphics, PC gaming, and accelerated computing for 30 years. It is a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, generative AI, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best tal

Sr. Site Reliability Engineer

Adobe Systems

San Jose, California, USA

Full-time

Our Company: Changing the world through digital experiences is what Adobe's all about. We give everyone, from emerging artists to global brands, everything they need to design and deliver exceptional digital experiences! We're passionate about empowering people to create beautiful and powerful images, videos, and apps, and transform how companies interact with customers across every screen. We're on a mission to hire the very best and are committed to creating exceptional employee experiences wher

Sr Site Reliability Engineer (App Service Team)

Palo Alto Networks

Santa Clara, California, USA

Full-time

Company Description. Our Mission: At Palo Alto Networks everything starts and ends with our mission: Being the cybersecurity partner of choice, protecting our digital way of life. Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting the way things are done, and we're looking for innovators who are as committed to shaping the future of cybersecurity as we are. Who We Are: We take our mission of

SRE/DevOps/Kubernetes/Python

Infonex Technologies, Inc.

Pleasanton, California, USA

Contract

Position: DevOps/Kubernetes (open position, CA). Type: Contract. Duration: 12+ months. Location: Pleasanton, CA. Job Description: Required Skills: Spark, Hadoop/CDH, H2O/Steam, MapR, Kubernetes, Docker, TensorFlow, Apache Airflow, JupyterHub, RStudio, PyTorch, ELK, OpenVINO, MySQL, GitLab, Traefik, Prometheus, Grafana, Node Manager, Alert Manager, Vault. Notes: The client currently has an on-prem environment. The client wants experience in containerization with Kubernetes, Vault, Slurm with Rstudio hook all the components

Internship, Site Reliability Engineer, Applications Engineering (Fall 2025)

Tesla Motors

Fremont, California, USA

Full-time

Consider before submitting an application: This position is expected to start around September 2025 and continue through the Fall term (approximately December 2025) or into Spring 2026 if available and there is an opportunity to do so. We ask for a minimum of 12 weeks, full-time and on-site, for most internships. Our internship program is for students who are actively enrolled in an academic program. Entry-level candidates seeking employment after graduation and not returning to school should a