Senior Site Reliability Engineer (SRE)

Overview

On Site
Full Time
Part Time
Accepts corp-to-corp applications
Contract - W2
Contract - Independent

Skills

Production Support
Conflict Resolution
Problem Solving
Reliability Engineering
Continuous Integration
Continuous Delivery
Management
Budget
DevOps
FOCUS
Amazon Web Services
Terraform
Continuous Integration and Development
Dynatrace
Grafana
Incident Management
Disaster Recovery
Scripting
Python
Bash
Chaos Engineering
Testing
Collaboration
Leadership
Mentorship
Kubernetes
Docker
Orchestration
Cloud Computing
Optimization
Scalability

Job Details

Location: Chicago, IL (Onsite)

Type: Contract

Role Overview:

We are seeking a Senior Site Reliability Engineer (SRE) with strong expertise in AWS infrastructure, automation, observability, and production support. The ideal candidate will bring a blend of DevOps and SRE practices, ensuring our systems remain resilient, scalable, and cost-efficient. This role requires hands-on technical depth, proactive problem-solving, and the ability to embed reliability engineering across development teams.

Key Responsibilities:

  • Design, implement, and maintain secure, scalable, and highly available AWS infrastructure.

  • Build and enhance CI/CD pipelines (Harness) and Infrastructure as Code (IaC) solutions (Terraform).

  • Implement and manage monitoring, logging, alerting, and distributed tracing with tools like Dynatrace and Datadog.

  • Troubleshoot production incidents, conduct blameless postmortems, and strengthen incident response processes.

  • Optimize systems for performance, cost efficiency, and reliability.

  • Drive chaos engineering and resilience testing initiatives.

  • Collaborate with developers to define and track SLOs, SLAs, and error budgets.

  • Mentor junior SREs and promote DevOps/SRE best practices across the organization.


Required Skills & Experience:

  • 8+ years of experience in DevOps/SRE roles with a strong focus on AWS.

  • Proven expertise in AWS services and infrastructure automation.

  • Strong hands-on experience with Terraform, Harness, or similar IaC and CI/CD tools.

  • Advanced knowledge of monitoring and observability platforms (Dynatrace, Datadog, Prometheus, Grafana, etc.).

  • Deep understanding of incident response, disaster recovery, and reliability frameworks.

  • Solid coding/scripting skills in Python, Bash, or similar languages.

  • Experience with chaos engineering, resilience testing, and fault tolerance design.

  • Strong collaboration, leadership, and mentoring capabilities.


Preferred Qualifications:

  • Familiarity with Kubernetes, Docker, and container orchestration.

  • Experience with FinOps practices (cloud cost optimization).

  • Background in distributed systems, scalability, and fault-tolerant architectures.


Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

About Purple Drive Technologies LLC