Google Cloud Platform Site Reliability Engineer

Remote • Posted 6 hours ago • Updated 6 hours ago

Full Time

No Travel Required

Remote

Depends on Experience

Job Details

Skills

High Availability
Disaster Recovery
Dragon NaturallySpeaking
GitLab
Good Clinical Practice
Continuous Delivery
Continuous Integration
Failover
GitHub
Communication
Computer Networking
Configuration Management
DevOps
Amazon Web Services
Ansible
Google Cloud Platform
Grafana
IaaS
Agile
Backup
Bash
Shell
Terraform
Virtual Private Network
Reliability Engineering
Root Cause Analysis
Scalability
Scripting
Operational Excellence
Production Support
Provisioning
Python
ROOT
Kubernetes
Machine Learning Operations (ML Ops)
Management
Microsoft Azure
Cloud Computing
DNS
Google Cloud
Incident Management
Jenkins
Scrum

Summary

Job Description

We are looking for a highly experienced Google Cloud Platform Site Reliability Engineer (SRE) with 10+ years of overall IT experience and strong expertise in designing, automating, monitoring, and supporting cloud-native infrastructure on Google Cloud Platform (GCP). The ideal candidate has deep hands-on experience with Kubernetes, Terraform, CI/CD pipelines, monitoring tools, and production support in highly scalable enterprise environments.

The candidate will work closely with DevOps, Development, Security, and Infrastructure teams to improve system reliability, automation, scalability, and operational excellence.

Required Skills & Experience
10+ years of IT experience with strong expertise in Cloud Infrastructure and Site Reliability Engineering
5+ years of hands-on experience with Google Cloud Platform (GCP)
Strong experience with:
  Google Kubernetes Engine (GKE)
  Compute Engine
  Cloud Load Balancing
  Cloud Storage
  BigQuery
  Pub/Sub
  Cloud Functions
  IAM
  Cloud Monitoring & Logging
Strong experience in Kubernetes and container orchestration
Expertise in Infrastructure as Code (IaC) using Terraform
Strong scripting/programming skills in Python, Bash, or Shell scripting
Experience building and managing CI/CD pipelines using Jenkins, GitHub Actions, GitLab CI/CD, or Azure DevOps
Experience with monitoring and observability tools such as:
  Prometheus
  Grafana
  ELK Stack
  Cloud Operations Suite
Experience implementing SRE principles:
  SLI/SLO/SLA
  Incident Management
  Root Cause Analysis (RCA)
  High Availability
  Disaster Recovery
  Capacity Planning
Experience with configuration management tools like Ansible
Strong understanding of networking concepts, DNS, Load Balancers, VPN, and security best practices
Experience with GitOps, ArgoCD, or MLOps is a plus
Familiarity with Agile/Scrum methodologies
Excellent communication and troubleshooting skills

Responsibilities
Design, deploy, and maintain scalable and reliable cloud infrastructure on Google Cloud Platform
Manage and support Kubernetes/GKE clusters in production environments
Automate infrastructure provisioning and deployments using Terraform and CI/CD pipelines
Implement monitoring, logging, alerting, and observability solutions
Ensure high availability, scalability, reliability, and security of cloud platforms
Troubleshoot production incidents and perform root cause analysis
Optimize cloud infrastructure cost, performance, and resource utilization
Collaborate with development and DevOps teams to improve deployment reliability
Implement backup, disaster recovery, and failover strategies
Define and maintain SLOs, SLIs, and operational best practices
Participate in on-call support and incident response activities

Preferred Qualifications
Google Cloud Platform Professional Cloud DevOps Engineer Certification
Google Cloud Platform Professional Cloud Architect Certification
Kubernetes certifications (CKA/CKAD)
Experience with multi-cloud environments (AWS/Azure) is a plus

Keywords

Google Cloud Platform, SRE, Site Reliability Engineer, GKE, Kubernetes, Terraform, CI/CD, DevOps, Python, Cloud Monitoring, Prometheus, Grafana, Jenkins, GitHub Actions, Cloud Operations, IAM, Pub/Sub, BigQuery, Cloud Functions, Infrastructure Automation, Production Support, Observability, Incident Management, GitOps, ArgoCD, Ansible

Dice Id: 91094776
Position Id: 8964304