Job Details
Job Title: Incident Manager
Location: Reston, VA (primary location); 5 days onsite
Duration: 12+ months with possible extensions
The role includes rotational shifts, typically starting with a standard 9-5 schedule. Over time, based on performance, the candidate may be assigned to early (7-3) or late (11-7) shifts, with occasional night or weekend shifts, usually once a month.
Job Description:
Manage incidents 24/7 using Fannie Mae processes, lead technical triage, share insights from monitoring tools, and detail resolutions. Recommend process improvements, provide timely updates, assist in postmortems, and support operational enhancements. Maintain application uptime through troubleshooting, bug fixes, performance documentation, and collaboration with infrastructure teams.
Key Job Functions
- Provide expert-level incident management in a 24/7/365 environment.
- Lead triage and resolution of high-impact, complex incidents.
- Act as the command center to minimize business disruption.
- Use monitoring tools for quick root cause analysis and resolution.
- Collaborate across teams for recovery and process enhancement.
- Ensure timely stakeholder communication and proper escalation.
- Deliver incident trend analysis and detailed reporting.
- Participate in on-call rotation and shift work.
- Present insights and metrics to senior leadership.
Required Knowledge & Skills
- Skilled in leading large-scale incident calls with up to 150 participants.
- Proficient in Microsoft Word, Excel, and PowerPoint; capable of presenting data-driven insights to senior leadership.
- Holds AWS Cloud Certification (beyond Cloud Practitioner) and ITIL Certification.
- Hands-on experience with Splunk and other transaction-level monitoring tools.
- Experienced in using ServiceNow for incident and change management.
- Capable of transaction-level troubleshooting in AWS cloud environments.
- Familiar with monitoring tools like ExtraHop, SolarWinds, and Catchpoint.
- Skilled at identifying trends in application health via dashboards and reports.
- Experienced in working with compliance, audit, and support teams.
- Able to present data analysis findings and lead remediation efforts.
- Proficient in AWS Console/CLI, scripting, SQL, and tools like Power BI and Tableau.
- Knowledgeable in IT infrastructure including servers, networks, databases, firewalls, and monitoring solutions.
- Uses a structured, analytical approach to issue resolution.
- Strong communicator, especially under pressure, with ability to guide technical teams.
Preferred Qualifications:
- AWS Certified Solutions Architect - Associate certification
- Experience with OpenTelemetry and SignalFx