Job Details
Job Title: Incident Commander
Location: Remote
Duration / Term: 6+ months
Job Description:
Experience Desired: 12+ Years.
Key required skills
The Incident Commander is responsible for leading the end-to-end management of major production incidents across Telecom digital platforms, including Digital Commerce, Order Management, Payments, Mobile/Web Applications, and Customer Data systems. This role operates within a 24x7 centralized operations model and serves as the single point of command during high-severity incidents, ensuring rapid stabilization, clear decision-making, and effective stakeholder coordination.
The Incident Commander combines strong technical depth with exceptional communication skills to lead large, cross-functional teams under pressure, minimize customer and business impact, and restore services within defined SLAs.
Key Responsibilities
Major Incident Management & Command
- Act as the Incident Commander for Sev-1 and Sev-2 incidents across Telecom digital platforms.
- Own the incident lifecycle from detection through stabilization, resolution, and post-incident review.
- Lead incident bridge calls with a large number of technical, business, and executive stakeholders.
- Establish command-and-control during incidents, driving focus, accountability, and rapid decision-making.
- Ensure accurate impact assessment and prioritization based on customer, revenue, and regulatory impact.
Telecom Digital Platform Expertise
- Lead incident response across:
  - Digital Commerce platforms (customer acquisition, checkout, promotions)
  - Order Management and fulfillment systems
  - Payments, billing integrations, and financial transaction flows
  - Mobile and web applications
  - Customer Information and data management platforms
- Quickly understand complex, distributed system interactions and failure modes.
- Provide technical direction and guidance during root cause identification and remediation.
Centralized Operations & 24x7 Support
- Operate within a centralized 24x7 operations model supporting mission-critical digital platforms.
- Coordinate across global onshore and offshore support teams, SREs, engineering, infrastructure, and vendors.
- Ensure adherence to incident response SLAs, escalation paths, and operational runbooks.
- Drive continuous improvement of incident response processes and tooling.
Stakeholder & Executive Communication
- Serve as the single, authoritative voice during incidents for internal and external stakeholders.
- Communicate incident status, impact, mitigation steps, and ETAs clearly and concisely.
- Manage executive-level updates and ensure consistent messaging across all forums.
- Handle high-pressure situations with confidence, clarity, and professionalism.
Technical Leadership & Problem Solving
- Lead technical troubleshooting efforts without necessarily being hands-on in code.
- Challenge assumptions, validate hypotheses, and drive teams toward data-driven resolution paths.
- Ensure effective use of monitoring, logging, and observability tools.
- Balance speed of recovery with risk, customer impact, and system integrity.
Post-Incident Review & Prevention
- Facilitate post-incident reviews (PIRs / RCAs) with engineering and operations teams.
- Ensure root causes are clearly identified and corrective actions are defined and tracked.
- Identify systemic issues and recommend long-term preventive measures.
- Drive improvements in platform resilience, monitoring, automation, and operational readiness.
Key Skills:
Incident Management, Production Support, SLAs