Site Reliability Engineer, Consultant

Overview

On Site

USD 119,130.00 - 178,860.00 per year

Full Time

Skills

Software Engineering

System Administration

Cross-functional Team

IaaS

High Availability

Scalability

Recovery

Terraform

Root Cause Analysis

Performance Tuning

Network

Database

Capacity Management

Forecasting

Load Testing

Regulatory Compliance

Software Development Methodology

Software Architecture

Engineering Design

Computer Science

Microsoft Azure

Amazon Web Services

Google Cloud Platform

Virtual Machines

Computer Networking

Identity Management

Storage

Scripting

Python

Java

Bash

Windows PowerShell

Orchestration

Kubernetes

Docker

Red Hat Linux

Management

Grafana

New Relic

Dynatrace

Splunk

SolarWinds

Dashboard

Jenkins

GitHub

GitLab

Continuous Integration

Continuous Delivery

Configuration Management

Ansible

Progress Chef

Puppet

Artificial Intelligence

Incident Management

Optimization

Testing

Chaos Engineering

Cloud Computing

Health Care

Innovation

Collaboration

Job Details

Job Description

Your Role

We are seeking an experienced Site Reliability Engineer (SRE) to lead reliability, scalability, and performance initiatives across our production systems. In this role, you will blend software engineering, automation, and systems operations to ensure that our platforms are resilient, efficient, and continuously improving.
You will be part of a cross-functional team responsible for designing, implementing, and maintaining reliable systems that support millions of requests daily. This position requires a deep understanding of distributed systems, cloud infrastructure, automation, and incident response.

Responsibilities

Your Work

In this role, you will:

Reliability & Uptime: Design and maintain systems to achieve high availability (99.9%+), scalability, and resilience.
Monitoring & Observability: Build and improve monitoring stacks using tools like Prometheus, Grafana, Datadog, or New Relic.
Automation: Reduce manual toil by automating deployments, scaling, and recovery processes using IaC (Terraform or CloudFormation).
Incident Management: Lead and respond to production incidents, perform root-cause analysis, and drive postmortems and prevention strategies.
Performance Optimization: Identify system bottlenecks and improve performance across compute, network, and database layers.
Capacity Planning: Forecast growth, conduct load testing, and ensure services can handle future demand.
Security & Compliance: Implement best practices for infrastructure security, secrets management, and compliance requirements.
Collaboration: Partner with developers to embed reliability practices into the SDLC, CI/CD pipelines, and application architecture.
Chaos Engineering: Design and execute chaos testing experiments to proactively identify weaknesses in distributed systems and improve overall resilience.
Deployment Strategies: Implement and manage Blue/Green and Canary deployment methodologies to minimize risk and ensure safe, incremental rollouts of new features and updates.

Qualifications

Your Knowledge and Experience

Education & Experience

Requires a Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience); Master's degree a plus.
7+ years of experience in building, supporting, and improving production systems and infrastructure.

Cloud Platforms

Minimum 5 years of hands-on experience with Azure, AWS, or Google Cloud Platform.
Demonstrated expertise in virtual machines (VMs), containers, cloud networking, identity and access management (IAM), monitoring, storage, and serverless functions.
Comfortable deploying and managing cloud-native services and infrastructure.

Programming & Scripting

Proficiency in one or more languages such as Python, Go, Java, Bash, PowerShell, or similar.
Ability to write clean, maintainable code for automation and tooling.

Containerization & Orchestration

Experience working with Kubernetes, Docker, and tools like Helm or Red Hat OpenShift.
Familiarity with managing containerized applications in production environments.

Monitoring & Observability

Working knowledge of tools such as Prometheus, Grafana, Datadog, New Relic, ELK Stack, Dynatrace, Splunk, BigPanda, and SolarWinds.
Ability to set up dashboards, alerts, and metrics to ensure system health and performance.

CI/CD & Configuration Management

Experience with CI/CD pipelines using tools like Jenkins, GitHub Actions, GitLab CI, Argo CD, or Spinnaker.
Familiarity with configuration management tools such as Ansible, Chef, or Puppet.

Automation & Emerging Technologies

Understanding of agentic AI systems and automation frameworks for incident response and infrastructure optimization is a plus.
Interest in exploring intelligent automation to improve reliability and reduce manual toil.

Testing & Deployment Expertise

Experience with chaos engineering tools (e.g., Gremlin, Chaos Monkey) and methodologies.
Hands-on knowledge of Blue/Green and Canary deployment strategies in cloud-native environments.

About the Team

About Stellarus and the Ascendiun Family of Companies

Stellarus, launched in January 2025, is designed to scale innovative healthcare solutions that support customers in creating a health care experience deserving of their family, friends, and neighbors.

Stellarus is part of a family of organizations overseen by a nonprofit corporate entity named Ascendiun. The Ascendiun Family of Companies also includes Blue Shield of California; its subsidiary, Blue Shield of California Promise Health Plan; and Altais, a clinical services company.

Stellarus' vision is to empower its customers to create a health care experience that is worthy of their family, friends, and neighbors. Stellarus' objective is to offer innovative, modern, scalable solutions that challenge the health care status quo. This closely aligns with Blue Shield of California's vision of using innovation to improve quality, affordability, and experience for members.

To achieve our mission, we foster an environment where all employees can thrive and contribute fully to address the needs of the various communities we serve. We are committed to creating and maintaining a supportive workplace that upholds our values and advances our goals.

Our Values:

At Stellarus, our core values of agility, trust, drive, courage and service shape our approach to developing innovative product offerings.

Our Workplace Model:

At Stellarus and the Ascendiun Family of Companies, we believe in fostering a workplace environment that balances purposeful in-person collaboration with flexibility. As we continue to evolve our workplace model, our focus remains on creating spaces where our people can connect with purpose - whether working in the office or through a hybrid approach - by providing clear expectations while respecting the diverse needs of our workforce.

Two Ways of Working:

Hybrid (Default): Work from a business unit-approved office at least two (2) times per month (for roles below Director-level) or once per week (for Director-level roles and above). Exceptions:

- Member-facing and approved out-of-state roles remain remote.

- Employees living more than 50 miles from their assigned offices are expected to work with their managers on a plan for periodic office visits.

- For employees with medical conditions that may impact their ability to work in-office, we are committed to engaging in an interactive process and providing reasonable accommodations to ensure their work environment is conducive to their success and well-being.

On-Site: Work from a business unit-approved office an average of four (4) or more days a week.

Physical Requirements:

Office Environment: Roles involving a part-time to full-time schedule in an office environment, based in our physical offices or a home office, with desk work. Activity level: sedentary; frequency: most of the workday.

Equal Employment Opportunity:

External hires must pass a background check/drug screen. Qualified applicants with arrest records and/or conviction records will be considered for employment in a manner consistent with Federal, State, and local laws, including but not limited to the San Francisco Fair Chance Ordinance. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, protected veteran status, disability status, or any other classification protected by Federal, State, and local laws.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.