Site Reliability Engineer

Hybrid in Laurel, MD, US • Posted 30+ days ago • Updated 9 days ago
Contract W2
No Travel Required
On-site
Depends on Experience

Job Details

Skills

  • Amazon Web Services
  • Bash
  • Google Cloud Platform
  • Docker
  • DevOps
  • IaaS
  • Python
  • SAFe
  • Terraform
  • Performance Tuning
  • Management
  • Kubernetes
  • Load Testing
  • Linux
  • Microsoft Azure
  • Reliability Engineering
  • Regulatory Compliance
  • Scalability
  • Unix
  • System Security

Job Description

Position: Site Reliability Engineer (SRE)

Role Summary

We are looking for a skilled Site Reliability Engineer (SRE) to ensure the reliability, availability, performance, and scalability of critical systems. The SRE will work closely with development and operations teams to build resilient infrastructure, automate operations, and improve system observability while meeting defined SLAs and SLOs.


Key Responsibilities

• Design, build, and maintain highly available, scalable, and reliable systems.

• Define and manage SLIs, SLOs, and SLAs to ensure system reliability and performance.

• Automate infrastructure provisioning and configuration using Infrastructure as Code (Terraform, CloudFormation).

• Implement and manage CI/CD pipelines to enable safe and frequent deployments.

• Monitor system health using tools such as Prometheus, Grafana, Datadog, Splunk, and the ELK stack.

• Participate in on-call rotations and handle incident response, root cause analysis (RCA), and post-mortems.

• Improve system resilience through capacity planning, load testing, and chaos engineering.

• Collaborate with engineering teams to improve application reliability and reduce operational toil.

• Manage cloud infrastructure on AWS, Azure, or Google Cloud Platform.

• Ensure that system security requirements, regulatory compliance, and best practices are followed.

• Support production deployments, upgrades, and performance tuning.


Required Skills & Experience

• 3+ years of experience as an SRE, DevOps Engineer, or Production Engineer.

• Strong knowledge of Linux/Unix systems and networking fundamentals.

• Proficiency in scripting or programming (Python, Go, Bash).

• Experience with containers and orchestration (Docker, Kubernetes).

• Hands-on experience with monitoring, logging, and alerting tools.

• Strong understanding of cloud platforms (AWS, Azure, or Google Cloud Platform).

• Experience implementing high availability, fault tolerance, and disaster recovery strategies.

• Excellent problem-solving and troubleshooting skills.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 10477291
  • Position Id: 8870172

Similar Jobs

  • Bethesda, Maryland: Full-time, USD 130,900.00 - 171,700.00 per year (posted today)
  • Hybrid in McLean, Virginia: Full-time, USD 77,600.00 - 176,000.00 per year (posted today)
  • McLean, Virginia: Full-time, USD 147,400.00 per year (posted today)
  • Vienna, Virginia: Full-time, USD 82,160.00 - 138,320.00 per year (posted today)