SRE Engineer

Chicago, IL, US • Posted 10 hours ago • Updated 10 hours ago

Contract Independent

Contract W2

On-site

$40 - $50/hr

Job Details

Skills

Amazon Web Services
Collaboration
Documentation
Site Reliability Engineer
Cloud Computing

Summary

Site Reliability Engineer (SRE), Chicago, IL

Required Education:

Degree: Not required but highly preferred (top candidates will have a degree).

Experience: 2-4 years of experience is a hard requirement.

Technical Skills:

Experience supporting production-grade, customer-facing platforms in complex, multi-team environments.

A demonstrated ownership mindset, including accountability for service stability, incident outcomes, and follow-through beyond the initial investigation.

Strong understanding of AWS Kinesis streaming and messaging services, containerized and serverless compute using Fargate and Lambda, and CI/CD pipeline implementation using Azure DevOps.

Experience using ServiceNow for incident management and Azure DevOps for tracking features, user stories, and related work items.

Proven ability to partner effectively with engineering, product, and platform teams to resolve issues and improve operational efficiency.

Experience driving root cause analysis and continuous improvement, turning incidents into long-term reliability gains.

Strong understanding of operational readiness standards, including monitoring, alerting and runbooks.

Comfort operating in on-call or escalation roles, maintaining composure and clear communication during high-impact incidents.

Ability to identify gaps in processes or tooling and proactively improve support models, documentation, or workflows.

Experience working within enterprise ITSM frameworks.

Soft Skills:

Strong communication skills, with the ability to translate technical issues into clear status and impact updates for stakeholders.

The resource will provide platform stability and production support for AWS/cloud-based services, including on-call support, incident management, and operational readiness. This role is responsible for developing and maintaining runbooks, coordinating technical resolution across teams, and driving continuous improvement, while serving as the primary technical liaison between Caterpillar platform teams, product partners, and support organizations.

Key Responsibilities:

Own incident tickets through the full lifecycle, from initial triage to resolution and closure.

Collaborate with engineering, platform, product, and operations teams to diagnose issues and coordinate fixes.

Communicate incident status, impact, and resolution progress to stakeholders.

Lead or contribute to root cause analysis and ensure follow-up actions are identified and tracked.

Ensure platform reliability through monitoring, alerting, security, and operational best practices.

Respond to and manage production incidents impacting AWS services and APIs.

Drive reliability, stability, and operational readiness improvements across cloud platforms.

Understand end-to-end technical and business flows to support production services effectively.

Develop, maintain, and improve clear, actionable runbooks for operational support.

Lead knowledge transfer sessions to ensure support teams are ready for production support.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 10312902
Position Id: 8920593