Job Details
Key Responsibilities:
Design, build, and maintain resilient, scalable, and secure Java-based services.
Implement SRE best practices including SLIs/SLOs/SLAs, incident response, and blameless postmortems.
Automate infrastructure and CI/CD pipelines using tools such as Jenkins, Terraform, and Ansible.
Build observability into services using monitoring/logging tools (Prometheus, Grafana, ELK, Splunk).
Perform root cause analysis of production incidents and lead incident resolution.
Collaborate with developers to optimize performance and reduce system latency.
Implement and improve alerting and automated remediation mechanisms.
Work with cloud environments like AWS/Google Cloud Platform/Azure for provisioning, scaling, and performance tuning.
Ensure compliance, security, and backup/recovery strategies for all supported systems.
Required Qualifications:
5+ years of experience as a Java Developer or Java-focused SRE.
Strong knowledge of Java, Spring Boot, REST APIs, and microservices architecture.
Experience working with Linux/Unix environments and scripting (Bash, Python).
Deep understanding of container orchestration (Docker, Kubernetes).
Solid experience with CI/CD tools: Jenkins, Git, Maven, Gradle.
Familiarity with infrastructure-as-code (Terraform, CloudFormation).
Experience with monitoring/logging/alerting tools: Prometheus, Grafana, Splunk, ELK, Datadog.
Cloud experience with AWS, Google Cloud Platform, or Azure.
Familiarity with incident management practices and 24/7 support rotations.