Responsibilities
• Proactively monitor network and system health using enterprise monitoring tools, identifying and analyzing potential issues before they impact operations.
• Escalate issues appropriately based on severity and impact.
• Maintain and update Knowledge Base (KB) articles with handling instructions for recurring alerts and incidents.
• Ensure all incident details are accurately documented in ServiceNow, the system of record.
• Apply NOC Event Management processes to triage, classify, and route events appropriately.
• Act as Incident Manager during technical bridge calls for major outages.
• Coordinate resources and send timely communications during major incidents.
• Perform start-of-day and end-of-day procedures required for critical applications and services.
• Oversee incident response, ensuring timely resolution and adherence to SLA commitments.
• Lead incident response efforts, coordinating the participation of cross-functional teams to reach the quickest possible resolution.
• Lead major incidents confidently, ensuring a timely response and resolution for all major incidents impacting the firm, including post-incident activities and other demands that arise in each scenario.
• Support alert tuning efforts by identifying noisy or low-value alerts and recommending suppression or reconfiguration.
• Assist in the onboarding of new application monitoring by validating alert rules, thresholds, and runbook documentation.
• Generate and distribute shift handoff reports summarizing active incidents, ongoing monitoring items, and pending follow-ups.
• Participate in on-call rotation to provide 24x7 monitoring coverage and incident response support.
Qualifications
• Required:
• High school diploma or equivalent required; Associate's or Bachelor's degree in Information Technology, Computer Science, or related field preferred.
• 0–2 years of experience in a NOC, help desk, IT operations, or technical support environment (entry-level candidates welcome).
• Basic understanding of networking concepts (TCP/IP, DNS, DHCP, VPN, routing/switching).
• Familiarity with ITIL fundamentals and incident management processes.
• Experience with or exposure to monitoring tools such as Datadog, SolarWinds, Nagios, or similar platforms.
• Experience working with ticketing systems such as ServiceNow.
• Strong communication skills, including the ability to document and convey incident details to both technical and non-technical stakeholders.
• Ability to work in a fast-paced environment, manage multiple priorities, and remain composed under pressure.
• Willingness to work rotating shifts, weekends, and holidays as required for 24x7 operations coverage.
• Preferred:
• 1+ years of NOC, IT operations, or infrastructure monitoring experience in a financial services or enterprise environment.
• Experience with Datadog monitoring, dashboards, and alert configuration.
• Hands-on experience with ServiceNow incident, event, and knowledge management modules.
• ITIL v3 or v4 certification (or actively pursuing one).
• Experience with runbook creation, KB article management, or standard operating procedure (SOP) documentation.
• Understanding of application monitoring, log analysis, and basic scripting (PowerShell, Python, or Bash).
• CompTIA Network+, CompTIA Security+, or equivalent certification.