Lead Site Reliability Engineering (SRE) for Google Cloud-based restaurant workloads, defining SLIs, SLOs, and error budgets and ensuring reliability across 24,000+ distributed locations.
Architect and manage a resilient Google Distributed Cloud Edge (GDCE) platform designed for environments with low bandwidth and intermittent connectivity.
Implement self-healing Kubernetes infrastructure with GitOps-driven deployments, canary releases, and blue-green upgrade strategies to achieve zero-downtime rollouts.
Establish full-stack observability using the Google Cloud Operations Suite (Cloud Monitoring, Cloud Logging, Cloud Trace) and Prometheus, with centralized dashboards and multi-level alerting.
Drive automation for cluster provisioning, edge node onboarding, configuration sync, and fleet-wide consistency using Terraform and Config Sync.
Design and operationalize L1.5/L2/L3 support models, incident management workflows, and KPIs including MTTA, MTTR, and uptime targets.
Oversee release certification pipelines with automated validation, conformance checks, and rollback mechanisms to ensure platform stability.
Collaborate with global operations and engineering teams to deliver a scalable, compliant, and highly available distributed edge platform.
Chicago, Illinois • 12d ago
Splunk SOAR Developer
Location: Chicago, IL 60661 or Denver, CO
Work Model: 100% Onsite (No Remote)
Duration: 12+ Month Contract
Interview Process: WebEx Interview + Onsite Interview
Required Industry: Financial Services
Job Overview
We are seeking an experienced Splunk SOAR Developer to design, develop, and scale security automations in a high-availability enterprise SOC environment. The ideal candidate will have strong hands-on experience with Splunk SOAR (Phantom), advanced Python development
Contract
Depends on Experience