Site Reliability Engineer (SRE) with Java Experience

Overview

On Site

Hybrid

Depends on Experience

Accepts corp to corp applications

Contract - W2

Contract - 12 Month(s)

Skills

Amazon Web Services

Cloud Computing

Collaboration

Continuous Delivery

Google Cloud Platform

Microsoft Azure

Splunk

Scripting

Shell

Performance Tuning

Kubernetes

Jenkins

Java

Docker

Python

Terraform

Spring Framework

Job Details

Candidates local to California are highly preferred.

Key Responsibilities:

Design, build, and maintain scalable, resilient, and highly available production systems.
Develop automation solutions using Java, Python, or scripting languages to reduce manual operations.
Implement and manage monitoring, logging, and alerting frameworks for proactive issue detection.
Collaborate with development teams to ensure reliability, performance, and scalability in application design.
Troubleshoot production issues, perform root cause analysis, and implement long-term fixes.
Define and measure SLIs, SLOs, and SLAs to ensure system reliability goals are met.
Optimize system performance and carry out capacity planning across distributed environments.
Support CI/CD pipelines and ensure smooth deployments with minimal downtime.
Champion best practices for incident management, post-mortem analysis, and continuous improvement.
Work with cloud-native platforms (AWS, Azure, or Google Cloud Platform) to build resilient infrastructure.

Required Skills and Qualifications:

Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Strong hands-on coding experience in Java (Spring Boot, Microservices).
Proficiency in scripting languages (Python, Shell, or Go preferred).
Solid understanding of cloud platforms (AWS, Google Cloud Platform, or Azure).
Expertise in Kubernetes, Docker, and container orchestration.
Experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, or similar).
Strong knowledge of monitoring and observability tools (Prometheus, Grafana, Splunk, ELK, Datadog, New Relic).
Experience with infrastructure as code (IaC) tools such as Terraform, Ansible, or CloudFormation.
Solid understanding of networking, distributed systems, and performance optimization.
Strong problem-solving, analytical, and troubleshooting skills.
Excellent communication and collaboration skills to work with cross-functional teams.

