SRE DevOps with Google Cloud Platform

Remote • Posted 15 hours ago • Updated 15 hours ago

Contract W2

Contract Independent

No Travel Required

Remote

$70 - $75/hr

Job Details

Skills

SRE
GCP
DevOps

Summary

Senior site reliability engineer - Google Cloud Platform

Remote

Top Skills Details

• Cloud: Google Cloud Platform expertise; comfort with cloud-native environments; open to experience on other cloud platforms.
• Resiliency & Chaos Engineering: Strong grasp of resiliency concepts; ability to design safe, hypothesis-driven chaos experiments and interpret outcomes to harden systems.
• Automation & CI/CD: Proven ability to improve CI/CD (policy gates, test automation, canary/blue-green); experience transitioning platforms (e.g., Jenkins → Harness).
• Load/Performance: Hands-on expertise with k6 and/or SmartBear load tools; capacity modeling; performance bottleneck analysis; test-in-pipeline practices.
• IaC & Platform: Terraform module design/standards; Helm chart authoring/ops for Kubernetes; config-as-code for Akamai where feasible.
• Observability: Deep experience setting up APM/logs/metrics (AppDynamics, Splunk), building actionable alerts, and designing dashboards around SLOs/SLIs.
• Programming: Proficiency in Python and JavaScript; familiarity with Kotlin and Groovy (especially in CI/CD pipelines).

Description

SRE Modernization & Reliability Engineering
• Lead SRE modernization aligned with DevOps principles: reliability by design, automation-first operations, and service ownership across build-run lifecycles (tool-agnostic mindset, strong principles).
• Define service-level objectives/indicators (SLOs/SLIs) and error budgets; partner with product and engineering to balance feature velocity with reliability.
• Establish fault-tolerance baselines before production: codify and validate redundancy, graceful degradation, and recovery characteristics in pre-prod environments.
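
For candidates unfamiliar with the terms above, the relationship between an SLO, its SLI, and the error budget can be sketched in a few lines of Python. This is an illustration only, assuming a request-based availability SLO; the function name `error_budget` and the sample numbers are hypothetical and not taken from the team's tooling.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error-budget consumption for a request-based availability SLO.

    slo_target is the fraction of requests that must succeed, e.g. 0.999.
    All names here are illustrative assumptions, not an existing API.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("request counts are inconsistent")

    # The budget is the number of failures the SLO tolerates in the window.
    allowed_failures = (1.0 - slo_target) * total_requests
    # The SLI is the measured success ratio over the same window.
    sli = (total_requests - failed_requests) / total_requests
    consumed = failed_requests / allowed_failures
    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,
        "budget_remaining": 1.0 - consumed,
    }


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 250 failures, 99.9% SLO.
    print(error_budget(0.999, 1_000_000, 250))
```

A remaining budget near zero is the usual signal to slow feature releases in favor of reliability work, which is the trade-off the second bullet describes.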

Chaos & Resiliency Engineering
• Build and run a structured chaos engineering program to continuously test resiliency in lower environments first, then in production with guardrails.
• Use Gremlin for experiment orchestration; define hypotheses, blast radius controls, and success criteria; expand with vetted open-source tooling as appropriate.
• Translate findings into reliability backlogs and architectural improvements; drive blameless postmortems and preventive design patterns.

Observability & Alerting
• Mature end-to-end observability (app, infra, network, CDN) with proper, actionable alerting: reduce noise, tighten signal, and ensure runbook-backed alerts.
• Implement and optimize AppDynamics (APM) and Splunk (logs, analytics) to deliver high-fidelity telemetry, business-level health indicators, and golden signals.
• Extend observability to our CDN (Akamai) for edge performance, cache health, and origin protection; integrate with runbooks and incident workflows.

Performance, Load, and Capacity
• Own load and performance testing strategy: why we test (resiliency goals), what we test (user journeys, critical paths), and how we test (shift-left, pipeline-driven).
• Operate and evolve tooling: k6, SmartBear (e.g., LoadNinja/ReadyAPI), and vetted third-party services; embed tests in CI/CD; feed results to capacity planning.

Deployment Automation & CI/CD
• Automate deployments end-to-end; enforce progressive delivery, canaries, and blue/green patterns with automated rollback.
• Drive CI/CD process improvements and help migrate from Jenkins to Harness (under evaluation); standardize quality gates, policy as code, and reliability checks in pipelines.
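
As a rough illustration of what a canary gate with automated rollback involves, the sketch below compares a canary's error rate against the baseline and returns a decision. It is tool-agnostic Python, not Jenkins or Harness configuration; the `Window` type, the `canary_decision` function, and every threshold are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Request and error counts observed over one evaluation window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(
    baseline: Window,
    canary: Window,
    max_ratio: float = 1.5,    # assumed: canary may be up to 1.5x the baseline rate
    min_requests: int = 500,   # assumed: minimum traffic before judging the canary
    floor: float = 0.001,      # assumed: tolerated error rate when baseline is ~0
) -> str:
    """Return 'wait', 'promote', or 'rollback' for a canary deployment."""
    if canary.requests < min_requests:
        return "wait"  # too little traffic to judge either way
    allowed = max(baseline.error_rate * max_ratio, floor)
    return "promote" if canary.error_rate <= allowed else "rollback"


if __name__ == "__main__":
    baseline = Window(requests=10_000, errors=20)
    print(canary_decision(baseline, Window(requests=1_000, errors=2)))
    print(canary_decision(baseline, Window(requests=1_000, errors=10)))
```

In a real pipeline the counts would come from the observability stack and the returned decision would drive the deployment tool's promote or rollback step.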

Platform Engineering, IaC & Kubernetes
• Standardize Infrastructure as Code across clouds and platforms: Terraform modules, policy controls, and repeatable environments.
• Operationalize Helm charts for Kubernetes services, ensuring versioning, security baselines, and rollout strategies (canary/blue-green).
• Partner on Akamai configuration as code: codify edge policies, cache/CDN rules, and security controls; version and promote through environments.

Tooling Evaluation & Gap Closing
• Continuously evaluate tools and identify gaps across reliability, observability, chaos, and performance; build the roadmap to mature our environment and close those gaps.

Incident Response & Operations Excellence
• Participate in and help optimize the on-call rotation; reduce MTTA/MTTR through better detection, automation, and runbooks.
• Run blameless postmortems; convert systemic issues into durable engineering fixes and platform improvements.
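
For reference, MTTA and MTTR can be computed from incident timestamps as in the sketch below. It assumes both are measured from the moment of detection (definitions of MTTR vary between teams), and the function name `mtta_mttr` and the tuple layout are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def mtta_mttr(incidents):
    """Return (MTTA, MTTR) as timedeltas.

    incidents: iterable of (detected, acknowledged, resolved) datetimes.
    Assumption: both metrics are measured from the detection time.
    """
    incidents = list(incidents)
    if not incidents:
        raise ValueError("at least one incident is required")
    ack_seconds = mean((ack - det).total_seconds() for det, ack, _ in incidents)
    res_seconds = mean((res - det).total_seconds() for det, _, res in incidents)
    return timedelta(seconds=ack_seconds), timedelta(seconds=res_seconds)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 0, 0)  # hypothetical sample data
    sample = [
        (t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=60)),
        (t0, t0 + timedelta(minutes=15), t0 + timedelta(minutes=120)),
    ]
    mtta, mttr = mtta_mttr(sample)
    print(f"MTTA={mtta}, MTTR={mttr}")
```

Tracking both numbers per quarter gives a simple way to check whether detection and runbook improvements are paying off.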

Our Environment (What You’ll Work With)
• Cloud: Google Cloud (Google Cloud Platform) in a more cloud-native posture, including private connectivity patterns/VPC-scoped services.
• CDN/Edge: Akamai (multi-layer observability + config-as-code).
• Observability: AppDynamics (APM), Splunk (logs/analytics), with alert standards and runbooks.
• Chaos/Resiliency: Gremlin, plus curated open-source tools where they add value.
• Performance/Load: k6, SmartBear, and select third-party load services.
• CI/CD: Jenkins → Harness migration (in evaluation); progressive delivery patterns and automated rollbacks.
• IaC/Containers: Terraform, Helm (Kubernetes).
• Languages: Python, JavaScript; some Kotlin and Groovy in CI/CD contexts.

External Communities Job Description

Here's your chance to work for a leading global pizza company and contribute to several key initiatives for a Resiliency team.

Additional Skills Tags

cicd,jenkins,harness,splunk,Python,groovy,kotlin,javascript,Terraform,Kubernetes

Additional Skills & Qualifications

DevOps experience (the role is SRE-first, not DevOps-first)
Proactive mindset
Pre-prod and post-prod environment support

Dice Id: 91134724
Position Id: 8899681