Site Reliability Engineer (SRE)

Remote • Posted 1 hour ago • Updated 1 hour ago

Contract W2

Contract Independent

No Travel Required

Remote

$70 - $80/hr

Job Details

Skills

Continuous Delivery
Capacity Management
Cloud Computing
Amazon Web Services
Error Budgets
Chaos Engineering

Summary

Job Overview

We are seeking a highly skilled Site Reliability Engineer (SRE) to join our engineering team and help ensure the reliability, scalability, and performance of our production systems. In this role, you will work closely with software engineers, cloud architects, and DevOps teams to build automated infrastructure solutions, improve system observability, and maintain highly available distributed systems.

The ideal candidate has a strong background in cloud infrastructure, distributed systems, automation, and monitoring tools, along with experience managing large-scale production environments.

This position is fully remote within the United States and offers the opportunity to work on modern cloud-native platforms and highly scalable applications.

Key Responsibilities

System Reliability & Performance

Ensure the availability, reliability, and performance of mission-critical production systems.
Monitor infrastructure, applications, and services using advanced observability and monitoring tools.
Analyze system performance metrics and implement improvements to reduce latency and downtime.
Perform capacity planning to support growing system demands.

Automation & Infrastructure

Develop and maintain automation tools to reduce manual operational tasks.
Build and manage Infrastructure as Code (IaC) using tools such as Terraform or CloudFormation.
Automate deployment processes and operational workflows.

Incident Management

Participate in an on-call rotation to respond to system incidents and outages.
Conduct root cause analysis (RCA) and implement long-term solutions to prevent recurring issues.
Develop incident response playbooks and improve operational processes.

Monitoring & Observability

Implement and maintain monitoring systems such as Prometheus, Grafana, ELK Stack, or Datadog.
Create dashboards and alerts to ensure proactive monitoring of production environments.
Improve system visibility through logging, metrics, and tracing.

Collaboration & Engineering Support

Work closely with development teams to improve application reliability and deployment practices.
Assist engineering teams with production deployments and troubleshooting.
Advocate for SRE best practices, including error budgets, SLIs, and SLOs.

Required Qualifications

Bachelor’s degree in Computer Science, Software Engineering, or a related technical field (or equivalent practical experience).
5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
Strong experience with Linux-based systems administration.
Hands-on experience with cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP).
Strong scripting or programming experience in Python, Go, or Bash.
Experience managing containerized environments using Docker and Kubernetes.

Core Technical Skills

Kubernetes & container orchestration
Cloud platforms (AWS / Azure / Google Cloud Platform)
Infrastructure as Code (Terraform, CloudFormation)
Monitoring tools (Prometheus, Grafana, Datadog, New Relic)
Logging tools (ELK Stack, Splunk)
CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI)
Distributed systems architecture

Preferred / Nice-to-Have Skills

Experience with microservices architecture
Knowledge of service mesh technologies (Istio, Linkerd)
Experience implementing chaos engineering practices
Familiarity with security best practices in cloud infrastructure
Experience with high-availability and disaster recovery architectures

Work Environment

Fully remote work environment across the United States
Collaborative engineering culture with cross-functional teams
Opportunity to work on large-scale distributed cloud systems

Dice Id: 91172806
Position Id: 8914465