SRE Incident Commander & Operations Lead

Overview

Hybrid
Depends on Experience
Contract - W2
Contract - 12 Month(s)

Skills

FOCUS
Tier 2
Telecommunications
Splunk
Microsoft Power BI
ICS
Grafana

Job Details

Job Title: Triage Manager / Site Reliability Engineer (SRE) Incident Lead

Location: Kansas City, MO / Bellevue, WA

Duration : 12+ Months

Job Description:

10+ years of experience, with the ability to take command of incident bridges and drive issues to resolution

Role

An SRE who leads incidents, manages the bridge until the issue is resolved, and tracks action items to closure, including change management and problem management

Responsibilities

Actively driving incident calls, working with technical and product SMEs and Tier 2 SREs

Establishing a timeline of the incident's progression and following up on action items until closure, during or after the call

Summarizing the discussion into knowledge articles and action items, and doing warm hand-offs to Tier 2 teams

Adopting and advocating best practices collated from experience and from SMEs, such as OTel, app availability, and resiliency

Sending progress reports on past incidents to leadership

Change management and problem management

Posting updates on AHOD and providing regular updates to leadership

Skills

  • Ability to focus on incidents and work with SMEs and Tier 2 Leads
  • Attention to detail and catching the minutest detail spoken on a call
  • Diligence in follow-ups and driving the SRE mandate to every team and partnering with them to operationalize best practices
  • Great at detailed communication with ICs and precise, succinct communication with leadership
  • Attitude to chase down even outlier issues to resolution

Experience

Someone with a strong technical and project management background

Has worked in the telecom industry and in SRE Ops

Has experience working in the digital space and with digital products

Able to quickly leverage tools such as Splunk, OTel, Grafana, Power BI, and AppD

Outcomes

Operationalizing all the initiatives as part of SRE Transformation

Drives SRE objectives by reducing the number of P1 and P2 incidents caused by recurring issues, ensuring teams learn from past incidents

Contributes to the knowledge base by partnering with SMEs

100% closure of tasks (incidents, problem tasks, vendor resiliency, etc.) within SLAs
