Site Reliability Engineer - U.S. Citizen - This role sits within Optum Serves Technology Product organization

Overview

Remote
$120,000 - $140,000
Full Time

Skills

Site Reliability
SRE
Cloud Platform
Cloud Infrastructure
Observability
Azure
Kubernetes
Terraform
Pulumi
Helm
ArgoCD
Flux
IaC
CI/CD
Splunk
Dynatrace
Grafana
Prometheus

Job Details

Job Title: Site Reliability Engineer
Location: Headquarters / Telecommute
Classification (HR only): Exempt / Non-Exempt
Reports To (Title): COO, Widescope Consulting and Contracting

JOB SUMMARY

Primary Purpose of Position

Widescope Consulting and Contracting is proud to serve our nation's military and Veterans. We support federal agencies in advancing the United States health care system and improving the overall health and well-being of those who serve or have served our country. Our health services are designed to help people live healthier lives.

The Site Reliability Engineer will architect, develop, and maintain secure, resilient, and high-performance cloud environments across commercial and government platforms. This role will collaborate closely with software engineers, architects, and build/release engineers to deliver scalable infrastructure solutions that meet enterprise reliability and automation standards.

Key Responsibilities

  • Build, maintain, and operate cloud-hosted platforms (Azure or other cloud providers).

  • Work closely with development teams to define, measure, and improve SLOs, SLAs, and SLIs.

  • Contribute to the architecture, provisioning, configuration, deployment, and support of platform services.

  • Integrate centralized logging, metrics dashboards, instrumentation, incident monitoring, and management tools.

  • Drive initiatives for Observability, Automation, and Infrastructure-as-Code (IaC).

  • Participate in on-call rotations to support platform incident resolution.

  • Implement automation, self-healing, and real-time monitoring to proactively address production issues.

  • Maintain and enhance operational tooling, frameworks, and performance testing systems.

  • Conduct root cause analysis and develop solutions for recurring issues in production systems.

  • Provide engineering teams with tools and dashboards for application monitoring and autonomy in production environments.

  • Automate alerts for performance, cost optimization, vulnerabilities, risk, and compliance violations.

  • Improve operational runbooks and champion automation for manual processes.

  • Lead postmortems for production incidents to identify and address underlying causes.


Qualifications

Required:

  • 4+ years of experience in a Site Reliability Engineer (SRE) or DevOps role.

  • Experience leveraging AI tools in software development or product lifecycles for efficiency and quality improvements.

  • Expert knowledge of a cloud service provider (Azure preferred).

  • Strong understanding of SRE principles and best practices.

  • Hands-on experience with Docker and Kubernetes (AKS or other Kubernetes platforms).

  • Familiarity with Infrastructure-as-Code tools such as Ansible, Chef, or Puppet.

  • Knowledge of networking fundamentals, IAM, and security best practices (e.g., PKI, OWASP).

  • Experience supporting production-grade cloud environments and services.

  • Proficiency with monitoring and observability tools (e.g., Splunk, Dynatrace, New Relic).

  • Familiarity with RESTful APIs and cloud-native application deployments.

  • Ability to participate in on-call rotations and perform incident root cause analysis.

  • Demonstrated success in Agile/Scrum development environments.

  • Strong problem-solving, communication, and collaboration skills.

  • Proven ability to adapt quickly to new tools and technologies.

Preferred:

  • Bachelor’s Degree in Computer Science, Information Technology, Software Engineering, or a related field.

The statements herein are not intended to be all-inclusive of the duties and responsibilities of the position. Based on leadership decisions and business needs, employees in this position are expected to perform other duties as assigned.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions, and AI may have been used to create this description. The position description has been reviewed for accuracy, and Dice believes it correctly reflects the job opportunity.

About Widescope Consulting and Contracting Services