Senior Site Reliability Engineer (SRE)
Location: Chicago, IL (Onsite)
Type: Contract
Role Overview:
We are seeking a Senior Site Reliability Engineer (SRE) with strong expertise in AWS infrastructure, automation, observability, and production support. The ideal candidate will bring a blend of DevOps and SRE practices, ensuring our systems remain resilient, scalable, and cost-efficient. This role requires hands-on technical depth, proactive problem-solving, and the ability to embed reliability engineering across development teams.
Key Responsibilities:
- Design, implement, and maintain secure, scalable, and highly available AWS infrastructure.
- Build and enhance CI/CD pipelines and Infrastructure as Code (IaC) solutions using Terraform and Harness.
- Implement and manage monitoring, logging, alerting, and distributed tracing with tools like Dynatrace and Datadog.
- Troubleshoot production incidents, conduct blameless postmortems, and strengthen incident response processes.
- Optimize systems for performance, cost efficiency, and reliability.
- Drive chaos engineering and resilience testing initiatives.
- Collaborate with developers to implement SLAs, SLOs, and error budgets.
- Mentor junior SREs and promote DevOps/SRE best practices across the organization.
Required Skills & Experience:
- 8+ years of experience in DevOps/SRE roles with a strong focus on AWS.
- Proven expertise in AWS services and infrastructure automation.
- Strong hands-on experience with Terraform, Harness, or similar IaC and CI/CD tools.
- Advanced knowledge of monitoring and observability platforms (Dynatrace, Datadog, Prometheus, Grafana, etc.).
- Deep understanding of incident response, disaster recovery, and reliability frameworks.
- Solid coding/scripting skills in Python, Bash, or similar languages.
- Experience with chaos engineering, resilience testing, and fault tolerance design.
- Strong collaboration, leadership, and mentoring capabilities.
Preferred Qualifications:
- Familiarity with Kubernetes, Docker, and container orchestration.
- Experience with FinOps practices (cloud cost optimization).
- Background in distributed systems, scalability, and fault-tolerant architectures.