Site Reliability Engineer

Hybrid in Chicago, IL, US • Posted 60+ days ago • Updated 22 days ago

Full Time

Hybrid

$150,000 - $155,000/yr

Job Details

Skills

Site Reliability
Linux
Kubernetes
Terraform
Docker
Jenkins
Ansible
Kafka
AWS

Summary

***Hybrid, 3 days onsite, 2 days remote***

***We are unable to sponsor as this is a permanent full-time role***

A prestigious company is looking for a Site Reliability Engineer. This role focuses on observability, logging, and capacity planning. The engineer will need experience with, or exposure to, Linux systems, Kubernetes/Docker, Terraform, Jenkins, Ansible, Harness, and Kafka.

Responsibilities:

Collaborate with development, operations and infrastructure teams to ensure availability of services, and to work through implementation issues
Develop automation for incident response and to prevent problem recurrence
Create and enhance runbooks to respond to service outages or degradations
Assess the production readiness of services
Define and track operational metrics for production performance, reliability, scalability and availability
Architect, develop and maintain shared services and tools to improve reliability and reduce toil across the organization

Qualifications:

Bachelor's or Master's degree in Computer Science, Information Systems or other related field, or equivalent work experience
Minimum of 4 years of experience in Site Reliability Engineering / DevOps
Experience with maintaining and troubleshooting large-scale distributed systems
Experience managing infrastructure in public cloud environments like AWS (preferred), Azure or Google Cloud Platform
Experience with AIOps and predictive analysis for anomaly detection and system capacity forecasting, using monitoring and alerting tools like Splunk, AppDynamics, Datadog, StackDriver, Sysdig, Prometheus or Grafana
Programming/scripting experience in languages like Java, Bash, Python or Go
Experience with distributed messaging systems like Kafka, RabbitMQ, or ActiveMQ
Experience with container orchestration systems like Kubernetes, Mesos, Docker Swarm or Rancher
Experience using Continuous Integration and Continuous Delivery (CI/CD) tools like Jenkins, Travis, Harness, Appveyor, CodeBuild or CodePipeline
Familiarity with leveraging large language models (LLMs) to automate and optimize SRE workflows. This may include using AI-powered tools for tasks such as writing scripts, summarizing incident reports, or creating and maintaining AI workloads.

Dice Id: napil006
Position Id: 8808216
