Lead Site Reliability Engineer Jobs in San Jose, CA

Refine Results
1 - 20 of 60 Jobs

Tech lead Site Reliability Engineer, Edge - USDS

TikTok

San Jose, California, USA

Full-time

Location : San Jose Employment Type : Regular Job Code : A104498 Apply to this job Share this listing: Responsibilities Site Reliability Engineering combines software and system engineering with system operations to build and run large-scale, massively distributed infrastructure. Our Edge SREs ensure infrastructure services are reliable, fault-tolerant, efficiently scalable and cost-effective. We dive deep into the stack, including network, hardware, OS, and applications, to quickly resol

Tech Lead, SRE - Recommendation Infrastructure

TikTok

San Jose, California, USA

Full-time

Location : San Jose Employment Type : Regular Job Code : A206446 Apply to this job Share this listing: Responsibilities Our Recommendation Infrastructure Team is responsible for building up and optimizing the architecture for our recommendation system to provide the most stable and best experience for our TikTok users. SREs in our team keep the systems up and running with the highest level of availability, and create highly automated systems and pipelines. What You'll Do Engage in and imp

Lead Site Reliability Engineer - Remote

UnitedHealth Group

Remote or Minnetonka, Minnesota, USA

Full-time

Optum is a global organization that delivers care, aided by technology to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health op

Lead Site Reliability Engineer

Centene Corporation

Missouri, USA

Full-time

You could be the one who changes everything for our 28 million members by using technology to improve health outcomes around the world. As a diversified, national organization, Centene's technology professionals have access to competitive benefits including a fresh perspective on workplace flexibility. Position Purpose: We are seeking a highly skilled and experienced M365 Lead Site Reliability Engineer to join our team. The ideal candidate will be responsible for developing and creating monitor

Lead Site Reliability Engineer, Observability - Remote

Cisco Systems, Inc.

Remote

Full-time

Application window is open until further notice. The Meraki cloud supports millions of customer devices from 10 data centers around the world. Meraki's customer base has grown by a factor of 2-3 every year, serving billions of HTTP requests per day globally. Our customers depend on our products to run their critical infrastructure of network switches, security appliances, wireless APs and security cameras. As SREs at Meraki, we are responsible for building and growing the cloud that supports t

Senior Lead Site Reliability Engineer - Remote

Akamai Technologies

Cambridge, England, United Kingdom

Full-time

Would you enjoy improving stability and safety of one of the largest global networks? \n Would you enjoy hands-on network operations work on a global scale to improve our operational efficiency? \n Join our Platform Security Engineering Team \n The Platform Security Engineering team is a group of engineers that support and secure Akamai's global network and Linode cloud systems. Our systems provide data security, server integrity, network access, and secure communications infrastructure. This is

Tech Lead Machine Learning Ops Engineer, Global SRE

TikTok

San Jose, California, USA

Full-time

Location : San Jose Employment Type : Regular Job Code : A181865 Apply to this job Share this listing: Responsibilities MLOps - Global SRE team is responsible for the stability of machine learning systems under the Global Monetization Products and Technology organization, to ensure the stable and efficient operations of machine learning models from data preparation, development, training, deployment, serving and so on. Responsibilities 1) Responsible for setting SLOs of online machine lea

Sr. DevOps/Site Reliability Engineer (SRE)

JKV International

Mountain View, California, USA

Contract

Job Title: Sr. DevOps/Site Reliability Engineer (SRE)Location: Mountain View, CA (Onsite)Position Type: Fulltime | Independent | H1B TransferInterview Process: Final In-Person (F2F) Interview Required About the Role:We are looking for a passionate and experienced Sr. DevOps/Site Reliability Engineer (SRE) to join our dynamic Platform Engineering team. You will work on cutting-edge cloud platforms like Azure, AWS, or Google Cloud Platform, leveraging state-of-the-art CI/CD tools to support modern

Site Reliability Engineer. Senior Lead

Akamai Technologies

Cambridge, England, United Kingdom

Full-time

Job Title: Site Reliability Engineer. Senior Lead Work Location: 145 Broadway, Cambridge, MA 02142 \n Job Description: \n Akamai Technologies, Inc. is hiring for the following role in Cambridge, MA (multiple openings): Site Reliability Engineer. Senior Lead. Working on analytical projects related to the metadata systems which support fast and reliable configuration of the company's global network. Leading the effort in working closely with the development teams in designing/implementing performa

Azure SRE Architect

Stanley David and Associates

Remote

Full-time

Role :: SRE Architect Location :: Marlborough, MA /Remote Type :: Fulltime Job Description Technical ExpertiseDeep understanding of SRE principles, SRE model, and DevOps methodologies.Experience designing highly available, scalable, and resilient distributed systems.Proficient in architectural design (Microservices, Cloud-native, Event-driven architecture).Skilled in cloud platforms: Azure, Google Cloud Platform.Strong knowledge of observability tools: UIM, Prometheus, Grafana, Datadog, New Re

SRE Architect

Stanley David and Associates

Remote

Full-time

1. Technical Expertise Deep understanding of SRE principles, SRE model, and DevOps methodologies. Experience designing highly available, scalable, and resilient distributed systems. Proficient in architectural design (Microservices, Cloud-native, Event-driven architecture). Skilled in cloud platforms: Azure, Google Cloud Platform. Strong knowledge of observability tools: UIM, Prometheus, Grafana, Datadog, New Relic, Splunk, AppDynamics. 2. Framework Design & Governance Define and validate SLOs,

Senior Dev Operations Engineer SRE

Buxton Consulting

Remote

Contract

Senior Dev Operations Engineer SRE Remote (Pleasanton, CA) 12+ Months Top 3 Must Haves Experience setting up alerts / alarms / notifications in AWS cloud. CloudWatch / Dynatrace Experience with AWS solutions using AWS services including Kafka, ECS, EKS. Experience with IaC (Infrastructure as code) CDK or Terraform. Thanks and Regards, Ajeet Singh Buxton Consulting 2010 Crow Canyon Place STE 100 San Ramon, CA 94583 Direct: Email:

Senior Site Reliability Engineer

Motion Recruitment Partners, LLC

San Jose, California, USA

Full-time

One of our clients in the entertainment platform space is looking for a Level 5 Reliability Engineer with a deep background in nix systems, networking, data analysis, and operating large-scale platforms to help build, scale, automate, and maintain our globally distributed infrastructure. Key Responsibilities Lead efforts to enhance system resiliency, observability, monitoring, and automation-ensuring global operations remain scalable and reliable.Collect, evaluate, and interpret significant volu

Sr. Site Reliability Engineer

Adobe Systems

San Jose, California, USA

Full-time

Our Company Changing the world through digital experiences is what Adobe's all about. We give everyone-from emerging artists to global brands-everything they need to design and deliver exceptional digital experiences! We're passionate about empowering people to create beautiful and powerful images, videos, and apps, and transform how companies interact with customers across every screen. We're on a mission to hire the very best and are committed to creating exceptional employee experiences wher

Senior Site Reliability Engineer - Data Infrastructure

TikTok

San Jose, California, USA

Full-time

Location : San Jose Employment Type : Regular Job Code : A33665 Apply to this job Share this listing: Responsibilities Our data infrastructure Site Reliability Engineering (SRE) team is a pioneer in innovation. We seamlessly merge software development and infrastructure operations to design, build, and manage large-scale, highly distributed systems. We take pride in overseeing one of the industry's most extensive cloud infrastructures. As software development evolves, building systems fro

Senior Site Reliability Engineer

NVIDIA Corporation

Santa Clara, California, USA

Full-time

Join our team in Santa Clara, CA, USA as a Senior Site Reliability Engineer. At NVIDIA, you'll be part of the team shaping the future of computing and guaranteeing the smooth operation of our brand-new technologies. Our mission is to leverage AI's power to build outstanding and pioneering solutions that have a significant impact on the world. What you'll be doing: Own the solutions you build, collaborating with cross-functional teams to successfully implement them.Collaborate with various teams

Senior Staff Site Reliability Engineer - CDN

NVIDIA Corporation

Remote or Santa Clara, California, USA

Full-time

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. Our legacy of innovation is driven by great technology-and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As an NVIDIAN, y

Senior Site Reliability Engineer, Product - USDS

TikTok

San Jose, California, USA

Full-time

Location : San Jose Employment Type : Regular Job Code : A215600 Apply to this job Share this listing: Responsibilities Team Intro: The Product Engineering team monitors and maintains the availability of TikTok, including services such as video playback, content discovery/recommendations, live streaming, and customer service feedback. In order to enhance collaboration and cross-functional partnerships, among other things, at this time, our organization follows a hybrid work schedule that

Senior Site Reliability Engineer - USDS (Multiple Positions)

TikTok

San Jose, California, USA

Full-time

Location : San Jose Employment Type : Regular Job Code : A33375A Apply to this job Share this listing: Responsibilities About TikTok U.S. Data Security TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. U.S. Data Security ("USDS") is a subsidiary of TikTok in the U.S. This new, security-first division was created to bring heightened focus and governance to our data protection policies and content assurance protocols to keep U

Senior Site Reliability Engineer - Global SRE, Monetization Technology

TikTok

San Jose, California, USA

Full-time

Location : San Jose Employment Type : Regular Job Code : A180769 Apply to this job Share this listing: Responsibilities TikTok is one of the fastest growing apps in the world, and we're seeking Site Reliability Engineers (SREs) to join our monetization technology team. The monetization technology team works on building and running large-scale, globally distributed, fault-tolerant ads systems. SREs keep the systems up and running with the highest level of availability, ensuring our users h