Senior Site Reliability Engineer

Overview

Remote
Depends on Experience
Full Time

Skills

DevOps
Cloud Computing
Continuous Delivery
Problem Solving
Systems Engineering
Software Engineering

Job Details


Design, develop, and troubleshoot large-scale, distributed, event-driven cloud systems to ensure high availability and performance.
Coordinate and implement infrastructure and software improvements to meet resiliency and scalability goals.
Maintain and enhance infrastructure-as-code and monitoring-as-code to ensure repeatability, traceability, and transparency in automation.
Support on-call rotations, resolve operational issues, and drive long-term fixes to reduce alert fatigue.
Collaborate with development teams to design enterprise-grade solutions and uphold healthy DevSecOps practices, including agile methodologies and CI/CD.
Participate in chaos testing and ongoing learning about the AWS ecosystem to proactively strengthen system reliability.
Experience in SRE, DevOps, or Software Engineering roles supporting enterprise applications.
Strong problem-solving, triage, and root cause analysis skills with a systems engineering mindset.
Deep expertise in the AWS ecosystem, with hands-on experience across core services, primarily ECS, RDS, EKS, IAM, CloudWatch, and networking configuration.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.