Google Cloud Platform Site Reliability Engineer
Location: Atlanta, GA

Overview

On Site
$40+
Accepts corp to corp applications
Contract - Independent
Contract - W2
Contract - 24 Month(s)
Able to Provide Sponsorship

Skills

  • SLIs (Service Level Indicators)
  • SLOs (Service Level Objectives)
  • Error Budgets
  • Toil Reduction
  • Automation
  • Incident Management
  • Postmortems
  • Architectural Design: Proven experience in designing reliable, scalable, and high-performing solutions is crucial.
  • Cloud Infrastructure & Technologies

Job Details

Key Responsibilities & Focus Areas:

  • Google Cloud Platform Expertise: This role emphasizes deep knowledge of and hands-on experience with Google Cloud Platform services, specifically:

    • BigQuery

    • Cloud Logging

    • IAM (Identity and Access Management)

    • Service Accounts

    • Provisioning and monitoring cloud services (staging/production)

    • Deploying, maintaining, and troubleshooting cloud services

  • SRE Principles & Practice: A core component of this role is the practical application of Site Reliability Engineering principles:

    • SLIs (Service Level Indicators)

    • SLOs (Service Level Objectives)

    • Error Budgets

    • Toil Reduction

    • Automation

    • Incident Management

    • Postmortems

  • Architectural Design: Proven experience in designing reliable, scalable, and high-performing solutions is crucial.

  • Cloud Infrastructure & Technologies:

    • Comprehensive understanding of cloud computing platforms (Google Cloud Platform specifically), including infrastructure, networking, and security services.

    • Strong experience with containerization, orchestration, and serverless computing (Docker, Kubernetes).

  • Observability: Designing and implementing robust observability solutions is a key skill, with experience in tools like:

    • Dynatrace

    • Prometheus

    • Grafana

    • ELK/EFK Stack (Elasticsearch, Logstash or Fluentd, Kibana)

  • Programming & Scripting: Strong skills in languages like Python, Go, and Bash are required for automation and tool development.

  • Problem-Solving & Leadership: The role demands excellent analytical, problem-solving, and strategic thinking skills, along with strong communication, collaboration, and leadership abilities to influence technical direction.

  • On-Call: Expect to be part of an on-call rotation.

Experience Required:

  • 6+ years in systems engineering, platform support, DevOps, or site reliability roles.

Overall:

This is a senior-level SRE position requiring a strong blend of hands-on technical expertise in Google Cloud Platform, a deep understanding of SRE methodologies, and architectural design capabilities. The ideal candidate will be proficient in automation, observability, and incident response, with a commitment to building highly reliable and scalable systems.
