Senior Java SRE - Santa Monica, CA - Hybrid
Location: Santa Monica, CA, onsite (hybrid or initial-remote options depending on end client)
Experience Required: 14+ years
Assignment Duration: 12+ Months
Engagement Type: Contract - W2
• 14+ years of experience with related tools and technologies.
• Experience with the following skill sets:
o Java (Advanced JVM internals, performance tuning)
o Google Cloud Platform (Professional-level depth)
o GKE/Kubernetes (CKA/CKS depth)
o Docker, Terraform
o CI/CD: GitLab CI/CD, Jenkins
o Streaming: Kafka, Kafka Streams, ksqlDB, Spark
o Service Mesh: Istio, Anthos Service Mesh
o Monitoring & Logging: Prometheus, Datadog, Splunk, Kiali
o OS & Scripting: Linux/Unix, Bash
o Programming: Python or Go
o Virtualization: VMware
o Networking & Performance: eBPF, Nginx Controller, Seesaw
o Multi-cluster Kubernetes governance
o GKE fleet & Anthos multi-cluster architectures
o Platform engineering / internal PaaS design
o High-traffic SaaS or consumer-scale platforms
o Real-time streaming & event-driven architectures
o eBPF-based deep observability & kernel-level tracing
o JVM performance engineering at hyperscale
o Service mesh traffic shaping & zero-downtime releases
• The following certifications are mandatory:
o Google Professional Cloud Architect (Google Cloud Platform)
o Certified Kubernetes Administrator (CKA) or Certified Kubernetes Security Specialist (CKS)
Responsibilities:
• Architect globally distributed, multi-region platforms on Google Cloud Platform with 99.99%+ availability targets.
• Define and operationalize SLIs, SLOs, error budgets, and reliability governance models.
• Lead incident command, RCA, and long-term reliability remediation for large-scale systems.
• Engineer and tune Java-based microservices (JVM internals, garbage collection strategies, memory profiling).
• Design and operate GKE (Google Kubernetes Engine) at scale, including multi-cluster and fleet management.
• Implement Google Cloud Platform-native architectures using:
o GKE, Compute Engine, Cloud Load Balancing
o Cloud Spanner, Bigtable, Cloud SQL
o Pub/Sub, Cloud Storage
o IAM, VPC Service Controls
• Build secure and repeatable infrastructure using Terraform and policy-as-code.
• Design advanced service mesh and traffic management using Istio / Anthos Service Mesh.
• Implement advanced Kubernetes storage for stateful workloads using Portworx.
• Support event-driven architectures using Kafka, Kafka Streams, ksqlDB, and Spark Streaming.
• Integrate Google Cloud Platform-native streaming solutions such as Pub/Sub.
• Optimize systems for low-latency, high-throughput workloads.
• Implement advanced observability using Prometheus, Datadog, Splunk, and Kiali.
• Leverage eBPF for kernel-level tracing, networking diagnostics, and performance tuning.
• Manage advanced ingress, load balancing, and traffic shaping using Nginx Controller and Seesaw.
• Architect high-scale CI/CD pipelines using GitLab CI/CD, Jenkins, and Google Cloud Platform-native tooling.
• Build internal developer platforms (PaaS) to standardize deployments and reduce toil.
• Automate operations using Python, Go, Bash, and custom reliability tooling.
• Provide 24×7 production support across U.S. time zones.
• Participate in on-call rotations, weekend releases, and incident war rooms.
• Continuously improve monitoring, alerting, and incident response maturity.