Staff Platform Engineer – DevOps / SRE (GKE & Cloud Infrastructure)

Remote • Posted 2 hours ago • Updated 2 hours ago
Contract W2
Contract Corp To Corp
Contract Independent
No Travel Required
$60 - $65/hr

Job Details

Skills

  • Chaos Engineering
  • Continuous Delivery
  • Database
  • DevOps
  • DevSecOps
  • Disaster Recovery
  • Good Clinical Practice
  • Google Cloud
  • Elasticsearch
  • Google Cloud Platform
  • Grafana
  • HIPAA
  • Health Care
  • Kubernetes
  • Linux
  • Medicare
  • Migration
  • Incident Management
  • Performance Tuning
  • Terraform
  • Robotic Process Automation
  • Testing
  • RPO

Summary

Role Overview

Senior individual contributor responsible for the cloud infrastructure, deployment automation, observability, and operational reliability of the Temporal-based claims processing platform on Google Kubernetes Engine. Owns the cluster topology, Terraform-managed infrastructure, hybrid networking to on-prem systems, secrets management, and the metrics/logs/traces observability stack. Serves as a technical authority for cloud platform reliability, capacity planning, and incident response.

This role supports a strategic platform initiative within Medicare Claims Engineering to migrate the existing Automation Anywhere RPA portfolio onto a modern, code-and-config-driven workflow platform built on Temporal.io, Python/Playwright, and Google Kubernetes Engine (GKE). Workflows are visually authored on a custom React Flow canvas that emits versioned configs executed by Temporal workers. The platform operates under HIPAA governance.

Key Responsibilities

  • Own the GKE cluster architecture: regional private cluster, autoscaling node pools, network policies, Pod Disruption Budgets, and ingress configuration.
  • Design hybrid networking from Google Cloud Platform to on-prem systems, including Cloud VPN/Interconnect strategy, VPC peering for Cloud SQL, and DNS resolution patterns.
  • Lead architectural decisions for resiliency, cost efficiency, and capacity, including node sizing, autoscaling on custom metrics, and committed-use discount strategy.
  • Champion Infrastructure as Code (Terraform) and CI/CD pipelines for containerized workloads, including image scanning, signing, and progressive rollout.
  • Own secrets management and runtime resolution of auth profiles for downstream systems, integrating with the CVS-approved secrets backend.
  • Operate the observability stack end-to-end: Managed Prometheus for metrics, Grafana dashboards, OpenTelemetry tracing to Cloud Trace, and structured logging to Cloud Logging.
  • Define and operate the SRE practice: SLIs/SLOs, error budgets, on-call rotations, incident response runbooks, post-mortems, and resilience testing.
  • Partner with Security on HIPAA-aligned controls: private cluster configuration, internal-only load balancers, IAP for internal applications, and audit logging.
  • Mentor senior and mid-level engineers on cloud-native operations and SRE discipline; lead design and code reviews for infrastructure changes; influence engineering direction across teams.

Required Qualifications

  • Multiple years of experience in DevOps, Site Reliability Engineering, or cloud platform engineering for production systems.
  • Deep production expertise with Kubernetes (GKE strongly preferred): node pool design, autoscaling, network policies, Helm, and workload identity.
  • Strong production experience on Google Cloud Platform, including GKE, Cloud SQL, VPC and hybrid connectivity, Cloud Logging, Cloud Trace, Managed Prometheus, and IAM.
  • Hands-on expertise with Infrastructure as Code (Terraform required; Helm required) and CI/CD pipelines for containerized workloads.
  • Strong understanding of high-availability architectures, multi-zone failover, disaster recovery, and RTO/RPO planning.
  • Proven experience operating large-scale, mission-critical production environments under regulatory or compliance constraints.
  • Advanced troubleshooting and performance optimization across Kubernetes, Linux, networking, and database layers.
  • Experience leveraging code generation tools like Copilot to write robust test cases and rapidly prototype features.
  • Experience collaborating across architecture, security, networking, and application teams.

Preferred Qualifications

  • Experience operating Temporal.io, Cadence, or comparable distributed workflow systems in production.
  • Hands-on experience operating PostgreSQL at scale (Cloud SQL HA, tuning, backup/PITR, schema migrations).
  • Experience with hybrid cloud connectivity to on-prem enterprise systems via Cloud VPN or Dedicated/Partner Interconnect.
  • Familiarity with Elasticsearch operations (self-managed or Elastic Cloud) for visibility/search workloads.
  • Familiarity with DevSecOps, SRE, and AIOps practices, including chaos engineering and resilience testing.
  • Healthcare, regulated industry, or large enterprise experience; familiarity with HIPAA/PHI controls and audit retention requirements.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions, and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

  • Dice Id: 10528602
  • Position Id: 8966417

Company Info

About Preferred Staffing & Recruiting, LLC

PS&R is a women-owned, owner-operated agency providing permanent and temporary staffing solutions in administration, engineering, accounting and finance, human resources, sales and marketing, customer service, and information technology. We serve Boston and the surrounding metro area. Our clients are universities, hospitals, banks, engineering and IT firms, and small to midsize businesses that range from start up to well-established.
