Site Reliability Engineer / Remote

Remote in Remote, KY, US • Posted 30+ days ago • Updated 4 hours ago

Full Time

On-site

$160,000 - $180,000/yr

Job Details

Skills

Software Engineering
Service Level
SAFE
DevOps
Kubernetes
Docker
Google Cloud
Google Cloud Platform
Computer Networking
Grafana
Microservices
PostgreSQL
Budget
IaaS
High Availability
Continuous Integration
Continuous Delivery
Pipeline Management
Root Cause Analysis
Scalability
Optimization
Cloud Computing
Incident Management
Management
Partnership
Collaboration

Summary

Join a fast-moving gaming technology company as a Site Reliability Engineer, ensuring a real-money gaming platform operates with exceptional reliability, performance, and scalability for lotteries and partners worldwide. This full-time role sits at the intersection of software engineering and infrastructure and focuses on building resilient systems, automating operations, and maintaining production health across a distributed architecture. You'll partner closely with backend engineers to design fault-tolerant, observable, and scalable systems from day one, owning platform stability, production performance, deployment reliability, and incident response end to end.

This is a high-ownership SRE role: you won't just maintain infrastructure, you'll shape it. You'll define and maintain Service Level Indicators (SLIs) and Service Level Objectives (SLOs), align error budgets with contractual SLAs, and lead incident response, root cause analysis, and postmortems on a platform where reliability directly affects real-money gaming experiences. The observability stack is modern and comprehensive, built on Grafana, Prometheus, Tempo, and Loki, and gives you full visibility into system health across all environments. The role carries substantial CI/CD and deployment automation scope, and you'll have real influence over cloud infrastructure optimization and cost efficiency. What makes the role particularly compelling is the mission-critical nature of the platform: when infrastructure just works, engineers ship faster, deployments are safe and repeatable, and systems scale automatically under load. If you're an SRE who takes pride in building systems that rarely break and recover quickly when they do, this role is built for you.

Required Skills & Experience

5+ years of experience in SRE, DevOps, or infrastructure engineering
Strong experience with Kubernetes, Docker, and cloud platforms (Google Cloud Platform preferred)
Deep knowledge of distributed systems and networking
Experience building CI/CD pipelines and deployment automation
Proficiency with observability tools including Grafana, Prometheus, Tempo, and Loki
Experience managing production incidents and reliability processes including postmortems
Strong troubleshooting and systems thinking skills
Strong knowledge of microservices architecture
Familiarity with Go
Familiarity with service meshes such as Istio
Familiarity with managing PostgreSQL at scale

Desired Skills & Experience

Experience defining and maintaining SLIs, SLOs, and error budgets aligned to contractual SLAs
Background optimizing cloud infrastructure usage and cost efficiency
Experience managing secrets, environment configuration, and deployment safety in regulated or high-availability environments
Prior experience in gaming, fintech, or other mission-critical real-money platforms

What You Will Be Doing

Tech Breakdown

35% Platform Reliability and Infrastructure - uptime ownership, architecture design, and production health
25% CI/CD and Deployment Automation - pipeline management, release automation, and deployment safety
25% Observability and Incident Response - monitoring, logging, alerting, root cause analysis, and postmortems
15% Scalability and Cost Optimization - performance improvements, automation, and cloud efficiency

Daily Responsibilities

50% Infrastructure and Platform Ownership - reliability, deployment, configuration, and production readiness
30% Observability and Incident Management - monitoring systems, incident response, and SLO management
20% Engineering Partnership and Automation - collaborating with backend teams, reducing manual intervention, and optimizing operations

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 10105282
Position Id: 870492

Similar Jobs

Remote • Today

This is a Site Reliability Engineer opportunity supporting a high-scale platform in the real-money gaming and lottery space. This is a fully remote role (EST hours preferred) focused heavily on Kubernetes, Google Cloud Platform, CI/CD automation, and observability tooling (Grafana/Prometheus stack) while supporting a distributed, production-critical environment. This role is centered around owning reliability end-to-end. You will be responsible for ensuring platform stability, scalability, and p

Full-time

$170,000 - $180,000

Site Reliability Engineer III- Eng

Remote or Alpharetta, Georgia • Today

Why UKG: At UKG, the work you do matters. The code you ship, the decisions you make, and the care you show a customer all add up to real impact. Today, tens of millions of workers start and end their days with our workforce operating platform, helping people get paid, grow in their careers, and shape the future of their industries. That's what we do. We never stop learning. We never stop challenging the norm. We push for better, and we celebrate the wins along the way. Here, you'll get flexibil

Full-time

USD 102,300.00 - 147,050.00 per year

Google Cloud Platform Site Reliability Engineer

Remote • 3d ago

Job Description We are looking for a highly experienced Google Cloud Platform Site Reliability Engineer (SRE) with 10+ years of overall IT experience and strong expertise in designing, automating, monitoring, and supporting cloud-native infrastructure on Google Cloud Platform. The ideal candidate should have deep hands-on experience with Kubernetes, Terraform, CI/CD pipelines, monitoring tools, and production support in highly scalable enterprise environments. The candida

Full-time

Depends on Experience

Principal AI Site Reliability Engineer (US REMOTE)

Remote • Today

Job Description Join Oracle's Health Data Intelligence (HDI) team as a Software Engineer 4, focused on Site Reliability Engineering for large-scale healthcare analytics platforms. In this role, you will design, build, and operate highly reliable, scalable infrastructure and data pipelines that power mission-critical analytics globally. You will also contribute to the next evolution of cloud operations by advancing automation, observability, and AI-assisted reliability practices. This includes

Full-time

USD 86,400.00 - 199,500.00 per year
