Site Reliability Engineer (SRE)

Overview

Remote

Depends on Experience

Contract - W2

Skills

Amazon Web Services

Bash

Computer Science

Chaos Engineering

Cloud Computing

Continuous Integration

Docker

Google Cloud Platform

Linux

High Availability

Incident Management

Git

Computer Networking

GitLab

Microservices

Reliability Engineering

Collaboration

Dragon NaturallySpeaking

Job Details

Position: Site Reliability Engineer (SRE)
Experience: 9+ years

About the Role

We are looking for a highly skilled Site Reliability Engineer (SRE) to join our team. The ideal candidate will bridge the gap between development and operations, ensuring our systems are scalable, reliable, and secure. You will be responsible for designing, automating, and monitoring critical infrastructure, and for improving the performance of our services.

Key Responsibilities

Build, maintain, and scale cloud infrastructure (AWS/Azure/Google Cloud Platform) with high availability and resilience.
Implement automation and Infrastructure-as-Code (IaC) using tools like Terraform, Ansible, or CloudFormation.
Monitor system performance, availability, and reliability using Prometheus, Grafana, ELK, Splunk, or Datadog.
Develop CI/CD pipelines (Jenkins, GitHub Actions, Azure DevOps, GitLab CI).
Manage incident response, on-call rotations, root cause analysis (RCA), and postmortems.
Optimize system reliability, latency, and scalability across distributed systems.
Ensure security, compliance, and disaster recovery strategies are in place.
Collaborate with DevOps, development, and QA teams to ensure efficient release cycles.
Drive the definition and implementation of SLOs, SLIs, and SLAs to measure and improve service health.
Troubleshoot production issues across services and infrastructure.

Required Skills & Qualifications

Bachelor's degree in Computer Science, Engineering, or equivalent experience.
9+ years of experience in SRE, DevOps, or Cloud Infrastructure roles.
Strong expertise in Linux/Unix administration and scripting (Python, Bash, Go, or Shell).
Hands-on experience with Kubernetes, Docker, and microservices architectures.
Proficiency in cloud platforms (AWS, Azure, or Google Cloud Platform).
Experience with observability and monitoring tools (Prometheus, Grafana, ELK, New Relic, Datadog).
Familiarity with networking, DNS, load balancers, and CDN technologies.
Strong understanding of CI/CD pipelines and Git-based workflows.
Experience with incident management, chaos engineering, and resilience testing.

