Google Cloud Platform Site Reliability Engineer

Remote • Posted 6 hours ago • Updated 6 hours ago
Full Time
No Travel Required
Remote
Depends on Experience

Job Details

Skills

  • High Availability
  • Disaster Recovery
  • GitLab
  • Continuous Delivery
  • Continuous Integration
  • Failover
  • GitHub
  • Communication
  • Computer Networking
  • Configuration Management
  • DevOps
  • Amazon Web Services
  • Ansible
  • Google Cloud Platform
  • Grafana
  • IaaS
  • Agile
  • Backup
  • Bash
  • Shell
  • Terraform
  • Virtual Private Network
  • Reliability Engineering
  • Root Cause Analysis
  • Scalability
  • Scripting
  • Operational Excellence
  • Production Support
  • Provisioning
  • Python
  • Kubernetes
  • Machine Learning Operations (ML Ops)
  • Management
  • Microsoft Azure
  • Cloud Computing
  • DNS
  • Google Cloud
  • Incident Management
  • Jenkins
  • Scrum

Summary

Job Description

We are looking for a highly experienced Google Cloud Platform (GCP) Site Reliability Engineer (SRE) with 10+ years of overall IT experience and strong expertise in designing, automating, monitoring, and supporting cloud-native infrastructure on GCP. The ideal candidate will have deep hands-on experience with Kubernetes, Terraform, CI/CD pipelines, monitoring tools, and production support in highly scalable enterprise environments.

The candidate will work closely with DevOps, Development, Security, and Infrastructure teams to improve system reliability, automation, scalability, and operational excellence.

Required Skills & Experience

  • 10+ years of IT experience with strong expertise in Cloud Infrastructure and Site Reliability Engineering
  • 5+ years of hands-on experience with Google Cloud Platform (GCP)
  • Strong experience with:
      • Google Kubernetes Engine (GKE)
      • Compute Engine
      • Cloud Load Balancing
      • Cloud Storage
      • BigQuery
      • Pub/Sub
      • Cloud Functions
      • IAM
      • Cloud Monitoring & Logging
  • Strong experience in Kubernetes and container orchestration
  • Expertise in Infrastructure as Code (IaC) using Terraform
  • Strong scripting/programming skills in Python, Bash, or Shell scripting
  • Experience building and managing CI/CD pipelines using Jenkins, GitHub Actions, GitLab CI/CD, or Azure DevOps
  • Experience with monitoring and observability tools such as:
      • Prometheus
      • Grafana
      • ELK Stack
      • Cloud Operations Suite
  • Experience implementing SRE principles:
      • SLI/SLO/SLA
      • Incident Management
      • Root Cause Analysis (RCA)
      • High Availability
      • Disaster Recovery
      • Capacity Planning
  • Experience with configuration management tools such as Ansible
  • Strong understanding of networking concepts, DNS, Load Balancers, VPN, and security best practices
  • Experience with GitOps, ArgoCD, or MLOps is a plus
  • Familiarity with Agile/Scrum methodologies
  • Excellent communication and troubleshooting skills

Responsibilities

  • Design, deploy, and maintain scalable and reliable cloud infrastructure on Google Cloud Platform
  • Manage and support Kubernetes/GKE clusters in production environments
  • Automate infrastructure provisioning and deployments using Terraform and CI/CD pipelines
  • Implement monitoring, logging, alerting, and observability solutions
  • Ensure high availability, scalability, reliability, and security of cloud platforms
  • Troubleshoot production incidents and perform root cause analysis
  • Optimize cloud infrastructure cost, performance, and resource utilization
  • Collaborate with development and DevOps teams to improve deployment reliability
  • Implement backup, disaster recovery, and failover strategies
  • Define and maintain SLOs, SLIs, and operational best practices
  • Participate in on-call support and incident response activities

Preferred Qualifications

  • Google Cloud Professional Cloud DevOps Engineer certification
  • Google Cloud Professional Cloud Architect certification
  • Kubernetes certifications (CKA/CKAD)
  • Experience with multi-cloud environments (AWS/Azure) is a plus

Keywords

Google Cloud Platform, SRE, Site Reliability Engineer, GKE, Kubernetes, Terraform, CI/CD, DevOps, Python, Cloud Monitoring, Prometheus, Grafana, Jenkins, GitHub Actions, Cloud Operations, IAM, Pub/Sub, BigQuery, Cloud Functions, Infrastructure Automation, Production Support, Observability, Incident Management, GitOps, ArgoCD, Ansible

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 91094776
  • Position Id: 8964304
