Site Reliability Engineer

Overview

On Site

$50 - $60

Contract - W2

Contract - Independent

Contract - 6 Month(s)

Skills

Dynatrace

Site Reliability Engineer (SRE)

scalable

reliable

AWS

Prometheus

Grafana

ELK

Jaeger

Job Details

Hiring Now: Site Reliability Engineer (SRE) | Core SRE Only | Atlanta, GA | Onsite

Location: Atlanta, GA

Note: We are seeking Core SRE professionals only; DevOps-only profiles will not be considered.

Mandatory: Hands-on experience with Dynatrace

About the Role:

We're looking for an experienced Site Reliability Engineer (SRE) to join our client's team in Atlanta, GA. This role requires a strong background in Core SRE rather than traditional DevOps, with expertise in Dynatrace and a deep understanding of building scalable, reliable, and observable systems on AWS.

Key Responsibilities:

Reliability Strategy & Observability:

Design scalable, secure, and cost-effective infrastructure on AWS
Define and implement SRE best practices, SLIs/SLOs, and error budgets
Identify and close observability gaps using Dynatrace, OpenTelemetry, etc.
Lead improvements in monitoring maturity and system-health visibility

Platform Architecture & Automation:

Architect solutions that reduce operational toil through automation
Enhance CI/CD pipelines, IaC modules, and chaos engineering platforms
Research and recommend tools that improve reliability and efficiency

Technical Leadership:

Act as a technical advisor to development and platform teams
Ensure reliability principles are applied early in design ("shift-left")
Mentor engineers and lead production readiness assessments

Resilience & Incident Management:

Lead blameless postmortems and implement systemic improvements
Architect and enforce resilience patterns such as circuit breakers and graceful degradation

Must-Have Qualifications:

Proven experience as an SRE Architect or similar leadership role
Strong hands-on experience with Dynatrace, AWS, and observability tooling (e.g., Prometheus, Grafana, ELK, Jaeger)
Deep knowledge of SLIs, SLOs, automation, incident response, and postmortems
Expertise in Kubernetes, Docker, and scripting (Python, Go, Bash)
Excellent communication and stakeholder management skills

Nice-to-Haves:

Experience implementing chaos engineering tools and practices
Exposure to serverless platforms and modern reliability frameworks

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.