reliability engineer Jobs

441 - 460 of 728 Jobs

Sr BizOps Engineer

Mastercard

O'Fallon, Missouri, USA

Full-time

Our Purpose Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we're helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest po

Site Reliability Engineer (with strong developer skills) **REMOTE**

Artech, LLC

Remote

Contract

Do something big and innovative! Stretch your creative muscles and work on big issues. Since 1989, we have developed technology environments, applications, and tools by providing experienced teams to implement, enhance, and maintain our clients' essential systems and applications. Come join the Scalence team! Job Title: Site Reliability Engineer Location: 100% REMOTE (PST work hours) Duration: 6-12+ months Pay Rate: $60 - $65/hr Job Description You will play a pivotal role in ensuring the

AI Site Reliability Engineer (Remote) - & GCEAD

Bridge Flair LLC

San Jose, California, USA

Contract

Position Summary: We are looking for an AI Site Reliability Engineer to manage, optimize, and scale high-performance compute (HPC) and AI platforms including NVIDIA DGX and Cisco UCS. This role blends SRE principles, AI/ML operationalization, and infrastructure automation for mission-critical environments. Responsibilities: Manage & scale HPC platforms (NVIDIA DGX / Cisco UCS) for AI workloads. Ensure availability, latency, scalability, and efficiency across systems. Drive capacity planning, perform

Site Reliability Engineer - (TUE-SAT 1st Shift)

Cboe Global Markets

Kansas, USA

Full-time

Job Description: Building trusted markets - powered by our people. At Cboe Global Markets, we inspire our people to solve complex challenges together because what we do matters. We provide the financial infrastructure that powers the global economy. As a leading provider of market infrastructure and tradable products, Cboe delivers cutting-edge trading, clearing and investment solutions to market participants around the world. We're building inclusive ways to support professional and personal de

Cloud Site Reliability Engineer - Azure AWS (34084)

Myticas LLC

Remote or CA

Contract

Cloud Site Reliability Engineer - AWS & Azure Overall Responsibilities Oversee the design and improvement of infrastructure using SRE best practices, including IaC, recovery automation, and systems that detect and resolve issues independently. Manage and fine-tune critical services across both cloud and on-prem environments: Kubernetes clusters, CI/CD pipelines, artifact registries, and custom workloads. Enhance observability through intelligent logging, metrics, tracing, and alerting. Ensuring

Senior Site Reliability Engineer / Google Cloud Platform / Atlanta / Hybrid

Motion Recruitment Partners, LLC

Atlanta, Georgia, USA

Full-time

A growing construction software company here in Atlanta is looking for someone to join their team! This is a senior-level site reliability engineering position that is full-time and hybrid here in Atlanta. You will be responsible for the upkeep of Google Cloud Platform environments, helping to support their DevOps team, and driving innovation and experimentation. You will be working on a small team with lots of opportunities for growth! You'll play a key role in the growth of this company as a

Site Reliability Engineer (SRE, 8) - (Healthcare Technology)

Kyndryl

California, USA

Full-time

Who We Are At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward - always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities. The Role Join us as a Site Reliability Engineer (SRE) and embark on an exciting journey of ensuring reliability, resiliency, and innovation in our infor

Infrastructure Site Reliability Engineer (Entry Level)- USDS

TikTok

San Jose, California, USA

Full-time

Location: San Jose Employment Type: Regular Job Code: J14H2 Responsibilities Team Introduction Site Reliability Engineering (SRE) at TikTok combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems. In our team, you'll have the opportunity to manage the complex challenges of scale, while using expertise in coding, algorithms, complexity analysis, and large-scale system design. We embra

Cyber Recovery Site Reliability Engineer (Network Security)

Allstate Insurance Company

Remote

Full-time

At Allstate, great things happen when our people work together to protect families and their belongings from life's uncertainties. And for more than 90 years our innovative drive has kept us a step ahead of our customers' evolving needs. From advocating for seat belts, air bags and graduated driving laws, to being an industry leader in pricing sophistication, telematics, and, more recently, device and identity protection. Job Description **We have 1 position on this team focused in Network Secur

Site Reliability Engineer, Infrastructure and Assurance Services - USDS

TikTok

Seattle, Washington, USA

Full-time

Location: Seattle Employment Type: Regular Job Code: A197531 Responsibilities The Systems and Networking team is committed to ensuring the seamless operation of TikTok's US physical infrastructure. We handle the provisioning of physical servers and maintain the TikTok US physical network. Additionally, we engage in collaborative efforts with vendors such as OCI and Akamai to manage physical hardware, networks, and uphold assurance and compliance ob

Principal Site Reliability Engineer - Enterprise AI Platform

NVIDIA Corporation

Santa Clara, California, USA

Full-time

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology-and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As

Senior Site Reliability Engineer - Observability and Telemetry Platform

NVIDIA Corporation

Remote or Santa Clara, California, USA

Full-time

Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high efficiency and availability using the combination of software and systems engineering practices. This is a highly specialized discipline which demands knowledge across different systems, networking, coding, database, capacity management, continuous delivery and deployment and open source cloud enabling technologies like Kubernetes and OpenStack. SRE at

Cyber Recovery Site Reliability Engineer (Data Protection)

Allstate Insurance Company

Remote

Full-time

At Allstate, great things happen when our people work together to protect families and their belongings from life's uncertainties. And for more than 90 years our innovative drive has kept us a step ahead of our customers' evolving needs. From advocating for seat belts, air bags and graduated driving laws, to being an industry leader in pricing sophistication, telematics, and, more recently, device and identity protection. Job Description **We have 1 position on this team focused in Data Protecti

Lead Site Reliability Engineer, Infrastructure Production Team

JPMorgan Chase & Co.

Plano, Texas, USA

Full-time

Job Description Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Infrastructure & Production Management sector of Consumer & Community Banking, you hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues fac

Site Reliability Engineer (SRE, 9) - (Healthcare Technology)

Kyndryl

California, USA

Full-time

Who We Are At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward - always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities. The Role Join us as a Site Reliability Engineer (SRE) and embark on an exciting journey of ensuring reliability, resiliency, and innovation in our infor

Senior II Site Reliability Engineer

Akamai Technologies

Cambridge, England, United Kingdom

Full-time

Do you like collaborating across teams to solve complex problems? Do you have a passion for cutting edge technologies and tackling system problems? Join our highly-skilled Site Reliability team! Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We create solutions that manage our Compute platform, focusing on cloud interfaces - Compute Portals and APIs. We do this while maintaining Akamai's mission to make life better

FLEX Senior Systems Engineer - SRE

Marriott International

Bethesda, Maryland, USA

Full-time

Job Description The Senior Systems Engineer - Site Reliability Engineering (SRE) is responsible for the reliability, scalability, and performance of mission-critical cloud and on-prem services that support millions of Marriott customers globally. This role involves overseeing incident management, driving automation efforts, and working closely with cross-functional teams to ensure alignment between SRE strategy and business objectives. Partners closely with Product Teams, Applications teams, Inf

FLEX Senior System Engineer - SRE

Marriott International

Bethesda, Maryland, USA

Full-time

Job Description This is a temporary position. Position Summary: The Senior Systems Engineer - Site Reliability Engineering (SRE) is responsible for the reliability, scalability, and performance of mission-critical cloud and on-prem services that support millions of Marriott customers globally. This role involves overseeing incident management, driving automation efforts, and working closely with cross-functional teams to ensure alignment between SRE strategy and business objectives. Partners c

Site Reliability Engineer (Middleware Platforms Kafka/Tibco EMS/API)

Euclid Innovations

Fort Mill, South Carolina, USA

Third Party, Contract

Job Description: We are seeking an experienced Site Reliability Engineer (SRE) with strong middleware platform expertise to enhance monitoring, reliability, and operational excellence across enterprise systems. The ideal candidate will have hands-on experience with Kafka, Tibco EMS, and API platforms, and a proven track record in monitoring automation and dashboard optimization. Key Responsibilities: Build specifications for a single pane of glass solution for middleware monitoring across compone

Sr. AWS DevOps Engineer

SYSTEM SOFT TECHNOLOGIES LLC

Denver, Colorado, USA

Contract

Onsite position - Denver, Colorado. No Sponsorship. No Corp to Corp. DEVOPS ENGINEER REQUIREMENTS: Must possess prior and recent experience engineering/architecting AWS cloud platforms as a Cloud Engineer, Site Reliability Engineer, or DevOps Engineer within a professional and ideally enterprise environment. Proven expertise with AWS, particularly with AWS Core Services such as (not all are required): EC2 (Elastic Compute Cloud), ECS (Elastic Container Service), EKS (Elastic Kubernetes Service), RDS (R