Job Details
Job Title: Sr. Site Reliability Engineer
Location: Headquarters / Telecommute
Classification (HR only): Exempt / Non-Exempt
Reports To (Title): COO, Widescope Consulting and Contracting
JOB SUMMARY
Primary Purpose of Position
Widescope Consulting and Contracting is proud to serve our nation's military and Veterans. We support federal agencies in advancing the United States health care system and improving the overall health and well-being of those who serve or have served our country. Our health services are designed to help people live healthier lives.
The Sr. Site Reliability Engineer will architect, develop, and maintain secure, resilient, and high-performance cloud environments across commercial and government platforms. This role will collaborate closely with software engineers, architects, and DevOps teams to ensure seamless, scalable, and automated infrastructure solutions.
Essential Functions Include:
Build, maintain, and operate IaaS and PaaS infrastructure in Azure commercial and government clouds.
Collaborate with development teams to define, measure, and improve SLOs, SLAs, and SLIs.
Contribute to the architecture, provisioning, configuration, deployment, and support of platform services.
Integrate systems with centralized logging, metrics dashboards, instrumentation, incident monitoring, and management tools.
Develop and administer tools (e.g., dashboards, APM tools) that allow engineering teams to autonomously monitor their applications in production environments.
Provide support for software and/or cloud infrastructure on a rotational on-call basis.
Identify and remediate technical issues by implementing automation, self-healing capabilities, and real-time monitoring.
Maintain and enhance operational tooling, frameworks, and testing infrastructure for performance and resiliency.
Automate alerts for performance, cost optimization, security vulnerabilities, risk, and compliance.
Champion process improvements and the automation of repetitive manual tasks.
Job Qualifications
Requirements:
4+ years of experience in a Site Reliability Engineer or cloud platform engineering role.
Experience leveraging AI tools in the software development or product lifecycle to improve efficiency and quality.
Expertise in at least one major cloud service provider.
Hands-on production experience with Kubernetes cluster setup and management (bare metal or managed).
Proficiency with infrastructure-as-code (IaC) tools such as Terraform or Pulumi.
Experience with Kubernetes deployment tools like Helm, ArgoCD, or Flux.
Strong understanding of networking and internet protocols.
Knowledge of identity and access management (IAM) principles.
Experience supporting production cloud environments.
Familiarity with encryption, PKI, and OWASP security practices.
Experience working with RESTful services.
Some experience with monitoring tools such as Azure Monitor, Splunk, Dynatrace, Grafana, or Prometheus.
Familiarity with IDEs and version control tools, such as Visual Studio Code and Git.
Preferences:
Bachelor’s degree in Computer Science, Information Technology, Software Engineering, Math, or a related field.
Master’s degree with coursework in advanced algorithms, mathematics in computing, or data structures.
Expert knowledge of Azure.
Passion for infrastructure automation and process improvement.
Ability to prioritize effectively in a fast-paced environment.
The statements herein are not intended to be all-inclusive of the duties and responsibilities of the position. Based on leadership decisions and business needs, employees in each position will be expected to perform other duties as assigned.