Job Details
Experience: 5+ years in SRE/DevOps, with proven implementation experience in JVM, Apigee, Google Cloud Platform observability, the Grafana stack, GKE, OpenTelemetry, and UI instrumentation
Required Skills:
Technical: Python, Linux, Prometheus, Grafana, Kubernetes, Docker, Loki, Tempo
JVM Metrics: Java application monitoring, JVM performance tuning, heap analysis, garbage collection optimization for portal applications
Logging & Tracing: Splunk, distributed tracing, log aggregation standards, correlation IDs across portal systems
API Management: Apigee experience, including API monitoring, rate limiting, security, and performance tracking for portal APIs
Infrastructure: CI/CD pipelines; AI tools such as GitHub Copilot and Cursor
Observability Tools & Query Languages: PromQL and InfluxQL for querying metrics in Grafana
Strong experience with Kubernetes (GKE), including namespace management, RBAC, and deploying/maintaining SRE tools via code (Python, Bash, YAML, Helm).