Senior Java SRE
Location: Santa Monica, CA - Onsite (Hybrid/Initial Remote options depending on end-client)
Duration: 12+ Months
Engagement Type: Contract
Experience Required: 14+ years
Key Responsibilities:
Architect globally distributed, multi-region platforms on Google Cloud Platform with 99.99%+ availability targets.
Define and operationalize SLIs, SLOs, error budgets, and reliability governance models.
Lead incident command, RCA, and long-term reliability remediation for large-scale systems.
Engineer and tune Java-based microservices (JVM internals, GC strategies, memory profiling).
Design and operate GKE (Google Kubernetes Engine) at scale, including multi-cluster and fleet management.
Implement Google Cloud Platform-native architectures using:
o GKE, Compute Engine, Cloud Load Balancing
o Cloud Spanner, Bigtable, Cloud SQL
o Pub/Sub, Cloud Storage
o IAM, VPC Service Controls
Build secure and repeatable infrastructure using Terraform and policy-as-code.
Design advanced service mesh and traffic management using Istio / Anthos Service Mesh.
Implement advanced Kubernetes storage for stateful workloads using Portworx.
Support event-driven architectures using Kafka, Kafka Streams, ksqlDB, and Spark Streaming.
Integrate Google Cloud Platform-native streaming solutions such as Pub/Sub.
Optimize systems for low-latency, high-throughput workloads.
Implement advanced observability using Prometheus, Datadog, Splunk, Kiali.
Leverage eBPF for kernel-level tracing, networking diagnostics, and performance tuning.
Manage advanced ingress, load balancing, and traffic shaping using Nginx Controller and Seesaw.
Architect high-scale CI/CD pipelines using GitLab CI/CD, Jenkins, and Google Cloud Platform-native tooling.
Build internal developer platforms (PaaS) to standardize deployments and reduce toil.
Automate operations using Python, Go, Bash, and custom reliability tooling.
Provide 24/7 production support across U.S. time zones.
Participate in on-call rotations, weekend releases, and incident war rooms.
Continuously improve monitoring, alerting, and incident response maturity.
Required Technical Expertise:
Java (Advanced JVM internals, performance tuning)
Google Cloud Platform (Professional-level depth)
GKE/Kubernetes (CKA/CKS depth)
Docker, Terraform
CI/CD: GitLab CI/CD, Jenkins
Streaming: Kafka, Kafka Streams, ksqlDB, Spark
Service Mesh: Istio, Anthos Service Mesh
Monitoring & Logging: Prometheus, Datadog, Splunk, Kiali
OS & Scripting: Linux/Unix, Bash
Programming: Python or Go
Virtualization: VMware
Networking & Performance: eBPF, Nginx Controller, Seesaw
Multi-cluster Kubernetes governance (GKE fleet & Anthos multi-cluster architectures)
Platform engineering / internal PaaS design
High-traffic SaaS or consumer-scale platforms
Real-time streaming & event-driven architectures
eBPF-based deep observability & kernel-level tracing
JVM performance engineering at hyperscale
Service mesh traffic shaping & zero-downtime releases
Certifications Required:
Google Professional Cloud Architect or Professional Cloud DevOps Engineer
Certified Kubernetes Administrator (CKA) or Certified Kubernetes