Overview
Job Details
We are seeking an experienced and ITIL-certified Enterprise Operations Senior Analyst with proven expertise in Major Incident Management (MIM) and enterprise operations. This pivotal role prioritizes outage and incident response leadership while maintaining responsibility for continuous monitoring, diagnostics, and operational stability across our global 24x7x365 IT Operations Center (ITOC).
The successful candidate will thrive under pressure, lead cross-functional teams as Incident Commander during major outages, and ensure compliance with ITIL-aligned practices while safeguarding service continuity. In addition to incident leadership, this role will oversee proactive monitoring, troubleshooting, governance, and operational improvements in hybrid on-premises and cloud environments.
Duties and Responsibilities
Major Incident Management
- Serve as Incident Commander during critical outages and high-priority incidents, leading technical bridge calls, directing escalation paths, and ensuring rapid service restoration.
- Manage real-time communications and provide timely updates to executives, stakeholders, and global teams.
- Maintain composure, clarity, and authority in high-pressure situations.
Operational Oversight & Monitoring
- Monitor mission-critical infrastructure and enterprise applications in real time using tools such as Splunk, SolarWinds, Dynatrace, AppDynamics, ThousandEyes, and SCOM.
- Conduct proactive health checks, run automation scripts, and resolve anomalies before they escalate into incidents.
- Collaborate with internal teams and vendors to escalate and resolve complex infrastructure and application issues.
Technical Escalation & Troubleshooting
- Perform deep-dive troubleshooting across server OS (Windows/Linux), virtualization platforms (VMware), networking protocols (TCP/IP, DNS, DHCP), and hybrid cloud environments (Azure).
- Utilize scripting (Python, PowerShell, VBScript) to support diagnostics, automation, and performance optimization.
Reporting & Collaboration
- Generate clear and professional shift handover reports, incident retrospectives, and trend analyses.
- Contribute to continuous improvement through metrics tracking, process refinement, and knowledge management.
Qualifications
Education and/or Experience:
Required:
- A minimum of 5 years of experience in IT operations or NOC/ITOC environments.
- A minimum of 5 years of proven leadership in Major Incident Management (bridge call facilitation, executive communications, service restoration).
- Advanced expertise with infrastructure monitoring suites (Splunk, Datadog, ThousandEyes, AppDynamics, Dynatrace, SolarWinds).
- A minimum of 5 years of hands-on technical experience across Windows, Linux, VMware, and Cisco networking.
- Strong Azure cloud and hybrid IT infrastructure experience.
- Proficiency with ServiceNow or equivalent ITSM platforms.
- ITIL Foundation Certification.
- Strong scripting capabilities (Python, PowerShell, VBScript).
- Exceptional communication skills (verbal, written, executive-level updates).
Preferred:
- Bachelor’s degree in Information Technology, Computer Science, or a related field.
- Experience supporting Microsoft System Center Suite, Office 365, and collaboration platforms such as Zoom.
- Prior experience in legal industry IT environments with strict SLA expectations.
- Demonstrated success in operational improvements (runbook creation, automation, KPI tracking).