Senior Java SRE
Work Location: Santa Monica, CA / Hybrid
Duration: 1+ Year
Engagement Type: Contract
Experience Required: 14+ years
Define and operationalize SLIs, SLOs, error budgets, and reliability
governance models.
Lead incident command, RCA, and long-term reliability remediation for
large-scale systems.
Engineer and tune Java-based microservices (JVM internals, GC strategies,
memory profiling).
Design and operate GKE (Google Kubernetes Engine) at scale, including
multi-cluster and fleet management.
Implement Google Cloud Platform-native architectures using:
o GKE, Compute Engine, Cloud Load Balancing
o Cloud Spanner, Bigtable, Cloud SQL
o Pub/Sub, Cloud Storage
o IAM, VPC Service Controls
Implement advanced Kubernetes storage for stateful workloads using
Portworx.
Support event-driven architectures using Kafka, Kafka Streams, ksqlDB,
and Spark Streaming.
Integrate Google Cloud Platform-native streaming solutions such as Pub/Sub.
Optimize systems for low-latency, high-throughput workloads.
Implement advanced observability using Prometheus, Datadog, Splunk, and
Kiali.
Leverage eBPF for kernel-level tracing, networking diagnostics, and
performance tuning.
Manage advanced ingress, load balancing, and traffic shaping using Nginx
Controller and Seesaw.
Architect high-scale CI/CD pipelines using GitLab CI/CD, Jenkins, and
Google Cloud Platform-native tooling.
Build internal developer platforms (PaaS) to standardize deployments and
reduce toil.
Automate operations using Python, Go, Bash, and custom reliability
tooling.
Provide 24/7 production support across U.S. time zones.
Participate in on-call rotations, weekend releases, and incident war rooms.
Required Technical Expertise:
Java (Advanced JVM internals, performance tuning)
Google Cloud Platform (Professional-level depth)
GKE/Kubernetes (CKA/CKS depth)
Docker, Terraform
CI/CD: GitLab CI/CD, Jenkins
Streaming: Kafka, Kafka Streams, ksqlDB, Spark
Service Mesh: Istio, Anthos Service Mesh
Monitoring & Logging: Prometheus, Datadog, Splunk, Kiali
OS & Scripting: Linux/Unix, Bash
Programming: Python or Go
Virtualization: VMware
Networking & Performance: eBPF, Nginx Controller, Seesaw
Multi-cluster Kubernetes governance
Internal platform engineering (PaaS)
High-traffic SaaS or consumer-scale platforms
Real-time streaming & event-driven architectures
Deep observability and kernel-level tracing
GKE fleet & Anthos multi-cluster architectures
JVM performance engineering at hyperscale
Service mesh traffic shaping & zero-downtime releases
eBPF-based observability & kernel tracing
Google Professional Cloud Architect or Professional Cloud DevOps Engineer
Certified Kubernetes Administrator (CKA) or Certified Kubernetes