Job Details
Job Title : Triage Manager / Site Reliability Engineer (SRE) Incident Lead
Location : Kansas City, MO / Bellevue, WA
Duration : 12+ Months
Job Description:
Candidate with 10+ years of experience who can take command of incident bridges and drive them to resolution
Role
An SRE who leads incidents, manages the bridge until the issue is resolved, and tracks action items to closure, including change management and problem management
Responsibilities
Actively driving incident calls, working with technical and product SMEs and Tier 2 SREs
Establishing a timeline of the incident's progression and following up on action items until closure, during or after the call
Summarizing the discussion into knowledge articles and action items, and doing warm hand-offs to Tier 2 teams
Adopting and advocating best practices collated from experience and from SMEs, such as OTel, app availability, and resiliency
Sending reports of progress from past incidents to leadership
Change management and problem management
Posting updates on AHOD and providing regular updates to leadership
Skills
- Ability to focus on incidents and work with SMEs and Tier 2 Leads
- Attention to detail, catching even the smallest detail spoken on a call
- Diligence in follow-ups, driving the SRE mandate to every team, and partnering with them to operationalize best practices
- Great at detailed communication with ICs and precise, succinct communication with leadership
- Determination to chase down even outlier issues to resolution
Experience
Someone with a strong technical and project management background
Has worked in the telecom industry and in SRE Ops
Has experience working in the Digital space and products
Able to quickly leverage tools such as Splunk, OTel, Grafana, Power BI, and AppD
Outcomes
Operationalizing all the initiatives as part of SRE Transformation
Drives SRE objectives by reducing the number of P1 and P2 incidents caused by repeat issues, ensuring teams learn from the past
Contributes to the knowledge base by partnering with SMEs
100% closure of tasks (incidents, problem tasks, vendor resiliency, etc.) within SLAs