Major Incident Management (MIM) & NOC Lead

Wilmington, DE, US • Posted 4 hours ago • Updated 1 hour ago

Full Time

On-site

Job Details

Skills

NOC
Major Incident
Dynatrace
Splunk
Grafana

Summary

Experience: 10+ years in IT Operations / NOC / Major Incident Management, including ownership of a leadership role.

Role Summary:

The Major Incident Management & NOC Lead is responsible for end-to-end command and control of the enterprise's 24x7 operational monitoring and incident response. This role leads the MIM and NOC function, drives Major Incident (P1/P2) execution, ensures rapid service restoration, and continuously improves operational maturity through problem management, automation, observability enhancements, and SLA governance.

This role requires a mix of strong incident leadership, technical depth across infrastructure and applications, and people/process management to ensure stability, availability, and performance across critical services.

Key Responsibilities:

A) Major Incident Management (Command & Control)

Own the Major Incident (P1/P2) process from detection to resolution, including war-room leadership, stakeholder updates, and closure.

Act as the Incident Commander and ensure structured triage, containment, workaround, and restoration.

Drive cross-functional coordination (App, Infra, Network, Security, DB, Cloud, Vendor teams) to reduce MTTR.

Ensure high-quality incident communications: executive summaries, impact analysis, ETAs, customer/business comms.

Lead and facilitate Post Incident Reviews (PIR/RCA); ensure actionable corrective/preventive actions (CAPA).

Identify recurring issues and trigger Problem Management with measurable reduction plans.

B) NOC Leadership & Operations

Lead the NOC team responsible for 24x7 monitoring, alert triage, event correlation, escalation, and ticket quality.

Establish/maintain standard operating procedures (SOPs), runbooks, escalation matrices, and on-call models.

Ensure NOC meets SLAs/OLAs, improves alert fidelity, and reduces noise through tuning and automation.

Manage handover governance between shifts; maintain service continuity and operational hygiene.

C) Service Reliability & Continuous Improvement

Drive operational improvements: monitoring coverage, SLO/SLA alignment, incident prevention, and resiliency initiatives.

Partner with Engineering/Platform teams on observability strategy, proactive detection, and reliability patterns.

Track and report operational metrics: MTTD, MTTR, incident volume, re-open rate, SLA compliance, and trends.

Support readiness for audits and compliance: evidence collection, process adherence, and risk mitigation.

D) Stakeholder & Vendor Management

Interface with business stakeholders, service owners, and leadership to provide incident status, risk, and remediation plans.

Manage vendor escalations and ensure timely resolution aligned to contractual SLAs.

E) Managerial / Leadership Skills (Must Have)

Proven experience leading MIM & NOC Operations teams (shift-based or on-call models).

Strong Incident Commander capability: calm under pressure, structured decision-making, priority trade-offs.

Excellent stakeholder management across technical teams and business leadership.

Ability to build and enforce process discipline (ITIL-aligned), while improving speed and quality.

Strong coaching/mentoring: performance management, skill development, hiring support as needed.

Effective communication: concise executive updates, clear action plans, facilitation of PIR/RCA sessions.

Data-driven mindset: uses metrics and trend analysis to drive operational outcomes.

Technical Skills (Must Have):

A) Monitoring / Observability

Hands-on experience with NOC tooling and observability platforms such as:

Splunk / ELK, Datadog, Dynatrace, New Relic, AppDynamics

Prometheus/Grafana, CloudWatch/Azure Monitor

Strong understanding of event correlation, alert tuning, noise reduction, and dashboarding.

B) Incident / ITSM Platforms

Strong working knowledge of ServiceNow (Incident, Problem, Change, Knowledge, CMDB) or equivalent ITSM tools.

Experience designing workflows, SLAs/OLAs, routing rules, and automation integrations.

C) Infrastructure & Platform Breadth

Solid understanding across:

Windows/Linux administration basics

Network fundamentals (DNS, DHCP, TCP/IP, routing, load balancers, firewalls)

Compute/virtualization (VMware/Hyper-V) and storage concepts

Database fundamentals (SQL/Oracle, replication, performance symptoms)

Cloud fundamentals and operational support for AWS/Azure/Google Cloud Platform:

IAM basics, networking (VPC/VNet), scaling, logging/monitoring, common failure patterns.

D) Automation & Scripting (Good to Have / Preferred)

Scripting knowledge: PowerShell / Python / Bash

Familiarity with automation tools: Ansible, Terraform, CI/CD operational workflows.

Ability to create/maintain runbook automation and self-healing patterns.

E) Security & Resilience (Preferred)

Awareness of security operations touchpoints: DDoS symptoms, certificate expiries, IAM issues, endpoint/EDR alerts.

Familiarity with BCP/DR processes, failover testing, and resilience design collaboration.

F) ITIL / Process Expectations

Strong ITIL understanding across Incident, Problem, Change, Knowledge, and Service Level Management.

Ability to implement governance around:

Change risk assessment, change windows, incident-change correlation

RCA quality, action item tracking, and effectiveness validation

Qualifications:

Bachelor's degree in Computer Science / IT / Engineering or equivalent experience.

ITIL v4 Foundation (preferred).

Cloud certifications (preferred): AWS/Azure fundamentals or associate level.

Experience in enterprise production environments with stringent availability requirements.

Success Metrics / KPIs

Reduced MTTD and MTTR for P1/P2 incidents.

Improved SLA compliance and reduction in escalation breaches.

Reduced repeat incidents via problem management and preventive actions.

Improved alert quality: lower false positives, better signal-to-noise ratio.

Strong PIR/RCA compliance: on-time RCAs with measurable preventive outcomes.

Improved NOC operational maturity: SOP adherence, shift handover quality, audit readiness.

Nice-to-Have Industry Contexts

Transportation / financial services / healthcare / e-commerce / SaaS environments with high availability targets.

Experience supporting microservices, Kubernetes, and distributed systems.

Dice Id: 10459445
Position Id: MU314922