Job Title: Senior SRE (Site Reliability Engineer) with Java
Experience Required: 15+ years
Assignment Duration: 12+ Months
Engagement Type: Contract (C2C or 1099) or Full-time (W2)
Work Location: McLean, VA - Onsite
Key Responsibilities:
Design, build, and operate highly available, fault-tolerant systems supporting core banking, payments, and trading platforms.
Lead SRE practices including SLIs, SLOs, error budgets, and reliability-driven engineering decisions.
Provide L3/L4 incident response, root cause analysis (RCA), and post-incident remediation for production systems.
Support and optimize Java-based microservices running on Kubernetes (EKS).
Implement and manage AWS-native services (EC2, EKS, RDS, DynamoDB, S3, IAM, CloudWatch).
Develop automation using Terraform for infrastructure provisioning and policy enforcement.
Manage Kubernetes networking, storage, and service mesh integrations including Istio / Anthos Service Mesh.
Implement advanced Kubernetes storage solutions using Portworx.
Architect and maintain enterprise-grade CI/CD pipelines using GitLab CI/CD, Jenkins, and cloud-native tooling.
Automate manual operational tasks using Python, Go, Bash, and infrastructure-as-code patterns.
Implement monitoring, logging, and alerting using Prometheus, Datadog, Splunk, Kiali, and custom dashboards.
Utilize eBPF for deep kernel-level observability and performance tuning.
Support real-time data platforms using Kafka, ksqlDB, Kafka Streams, and Spark Streaming.
Manage multi-cluster Kubernetes environments, including cluster federation.
Optimize system performance, scalability, and latency under high transaction volumes.
Enforce banking-grade security controls, IAM policies, secrets management, and least-privilege access.
Support environments aligned with SOC 2, PCI DSS, SOX, and internal banking security standards.
Provide 24/7 operational support, including rotational shifts, weekends, and on-call coverage across all U.S. time zones.
Required Technical Expertise:
Java (JVM internals, tuning, microservices)
AWS Cloud (EKS, EC2, IAM, VPC, RDS, CloudWatch)
Kubernetes (CKA/CKS-level depth)
Docker, Terraform
CI/CD: GitLab CI/CD, Jenkins
Streaming: Kafka, ksqlDB, Spark Streaming
Service Mesh: Istio, Anthos Service Mesh
Monitoring: Prometheus, Datadog, Splunk, Kiali
OS & Scripting: Linux/Unix, Bash
Programming: Python or Go
Virtualization: VMware
Networking & Performance: Nginx Controller, Seesaw, eBPF
Experience supporting core banking, payment gateways, or trading platforms
Exposure to high-frequency transaction systems
Knowledge of regulatory audits and compliance controls
Experience with zero-downtime deployments and disaster recovery strategies
Certifications Required:
AWS Certified Solutions Architect - Professional or AWS Certified DevOps Engineer - Professional
Certified Kubernetes Administrator (CKA) or Certified Kubernetes Security Specialist (CKS)