Senior Site Reliability Engineer (SRE)

Overview

On Site

Full Time

Part Time

Accepts corp to corp applications

Contract - W2

Contract - Independent

Skills

Production Support

Conflict Resolution

Problem Solving

Reliability Engineering

Continuous Integration

Continuous Delivery

Management

Budget

DevOps

FOCUS

Amazon Web Services

Terraform

Continuous Integration and Development

Dynatrace

Grafana

Incident Management

Disaster Recovery

Scripting

Python

Bash

Chaos Engineering

Testing

Collaboration

Leadership

Mentorship

Kubernetes

Docker

Orchestration

Cloud Computing

Optimization

Scalability

Job Details

Location: Chicago, IL (Onsite)

Type: Contract

Role Overview:

We are seeking a Senior Site Reliability Engineer (SRE) with strong expertise in AWS infrastructure, automation, observability, and production support. The ideal candidate will bring a blend of DevOps and SRE practices, ensuring our systems remain resilient, scalable, and cost-efficient. This role requires hands-on technical depth, proactive problem-solving, and the ability to embed reliability engineering across development teams.

Key Responsibilities:

Design, implement, and maintain secure, scalable, and highly available AWS infrastructure.
Build and enhance CI/CD pipelines and Infrastructure as Code (IaC) solutions using Terraform and Harness.
Implement and manage monitoring, logging, alerting, and distributed tracing with tools like Dynatrace and Datadog.
Troubleshoot production incidents, conduct blameless postmortems, and strengthen incident response processes.
Optimize systems for performance, cost efficiency, and reliability.
Drive chaos engineering and resilience testing initiatives.
Collaborate with developers to define SLAs and SLOs and to track error budgets.
Mentor junior SREs and promote DevOps/SRE best practices across the organization.

Required Skills & Experience:

8+ years of experience in DevOps/SRE roles with a strong focus on AWS.
Proven expertise in AWS services and infrastructure automation.
Strong hands-on experience with Terraform, Harness, or similar IaC and CI/CD tools.
Advanced knowledge of monitoring & observability platforms (Dynatrace, Datadog, Prometheus, Grafana, etc.).
Deep understanding of incident response, disaster recovery, and reliability frameworks.
Solid coding/scripting skills in Python, Bash, or similar languages.
Experience with chaos engineering, resilience testing, and fault tolerance design.
Strong collaboration, leadership, and mentoring capabilities.

Preferred Qualifications:

Familiarity with Kubernetes, Docker, and container orchestration.
Experience with FinOps practices (cloud cost optimization).
Background in distributed systems, scalability, and fault-tolerant architectures.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
