Job Role: Incident Manager
Location: Mount Laurel, NJ/West Chester, PA
Job Description:
Must-Have Technical/Functional Skills
Incident management, SRE and operations engineering, reliability architecture, automation and observability, executive communication
Roles & Responsibilities
The Incident Manager provides technical leadership for enterprise-wide, high-severity incidents, problem investigations, and high-risk changes, while shaping reliability strategy, governance, and operational standards across complex, distributed platforms.
- Drive incident resolution by directing cross-functional teams through high-impact outages, systemic problem resolution, and large-scale change events.
- Create scripts in ELK, Grafana, AppDynamics, and COP
- Auto-execute predefined queries in ELK, Grafana, AppDynamics, and COP for real-time issues
- Attach live query outputs (metrics, logs, traces) directly to alerts/incidents
- Eliminate manual tool navigation for IM and Alert teams
- Enhance alert systems with contextual intelligence, including metric deviations and anomaly trends, relevant log snippets and patterns, and affected CIs and downstream impacts
Education
Minimum: Graduate (bachelor's degree)