Site Reliability Engineer - U.S. Citizen - This role sits within Optum Serves Technology Product organization

Overview

Remote
$100,000 - $130,000
Full Time

Skills

Site Reliability
SRE
Cloud Platform
Cloud Infrastructure
Observability
Azure
Kubernetes
Terraform
Pulumi
Helm
ArgoCD
Flux
IaC
CI/CD
Splunk
Dynatrace
Grafana
Prometheus

Job Details

Job Title: Site Reliability Engineer
Location: Headquarters / Telecommute
Classification (HR only): Exempt Non-Exempt
Reports To (Title): COO Widescope Consulting and Contracting

JOB SUMMARY

Primary Purpose of Position

Widescope Consulting and Contracting is proud to serve our nation's military and Veterans. We support federal agencies in advancing the United States health care system and improving the overall health and well-being of those who serve or have served our country. Our health services are designed to help people live healthier lives.

The Site Reliability Engineer will design, build, and manage OptumServe’s cloud environments across both commercial and government platforms. In this role, the SRE partners closely with software engineers, architects, and build/release teams to create and maintain a secure, resilient, and high-performance cloud infrastructure.

Key Responsibilities Include:

  • Build, maintain, and operate the Azure-hosted platform.
  • Collaborate with development teams to define, measure, and track SLOs, SLAs, and SLIs.
  • Contribute to the design and development of platform services, including architecture, provisioning, configuration, deployment, and ongoing support.
  • Integrate systems with centralized logging, metrics dashboards, instrumentation, and incident monitoring/management tools.
  • Lead initiatives focused on Observability, Automation, Infrastructure as Code, and related platform capabilities.
  • Participate in the on-call rotation to support incident response and platform stability.
  • Address production issues by implementing automation, self-healing capabilities, and real-time monitoring.
  • Maintain and enhance operational tooling, frameworks, and internal platform utilities.
  • Perform root-cause analysis and deliver fixes for tooling, automation, and platform failures.
  • Develop frameworks to test the performance, scalability, and resiliency of platform services and tools.
  • Build, integrate, and administer systems that provide engineering teams with autonomous production insights (dashboards, APM tools, etc.).
  • Automate alerts across performance, cost, vulnerabilities, risks, and compliance violations.
  • Improve processes and runbooks, championing automation of manual support activities.
  • Conduct postmortems to identify improvements and prevent recurrence of production issues.

Qualifications

Required:

  • At least 4 years of experience working in an SRE/DevOps role
  • Experience leveraging AI tools in the software development (or product) lifecycle to improve quality and efficiency
  • Expert knowledge of a cloud service provider, Azure preferred
  • Good knowledge of SRE principles
  • Strong awareness of networking and internet protocols
  • Understanding of identity and access management (IAM)
  • Experience with Docker and Kubernetes (Azure Kubernetes Service preferred) in production
  • Experience supporting infrastructure in production cloud environments
  • Experience with automated configuration management tools such as Chef, Puppet, or Ansible
  • Knowledge of encryption and Public Key Infrastructure (PKI); understanding of OWASP
  • Experience working with RESTful services
  • Some experience with monitoring tools and technologies (Splunk, Dynatrace, New Relic)
  • Ability to deploy and operate cloud-native applications in a public cloud (Azure preferred)
  • Ability to support software and/or cloud infrastructure on an on-call rotation, helping to identify and remediate the root causes of technical problems
  • Strong track record of learning new tools and technologies
  • Detail- and results-oriented, able to prioritize tasks
  • Demonstrated experience working in a fast-paced, large-scale Agile/Scrum development environment
  • Strong verbal and written communication skills
  • Proactive, self-motivated, and passionate team player

Preferred:

  • Bachelor's degree in Computer Science, Information Technology, Software Engineering, or a related field.

The statements herein are not intended to be all-inclusive of the duties and responsibilities of the position. Based on leadership decisions and business needs, other duties may be assigned.

About Widescope Consulting and Contracting Services