Title: Monitoring Lead - Application Hosting
Location: Washington, DC (100% Onsite)
Duration: 6 months, possible extension
Security Clearance: Public Trust
Assist in driving, standardizing, and managing a unified configuration management database.
Collect, aggregate, and analyze data to support decisions across ITIL processes (configuration, event, capacity, availability, demand, incident, and problem management).
Assess and fine-tune monitoring capabilities to provide accurate and actionable alerts to the 24x7 operations systems.
Create and provide intuitive and informative dashboards on current and past performance and service status.
Configure, maintain, and optimize monitoring dashboards to monitor health and performance across diverse IT infrastructure components.
Deploy, manage, and update Management Packs, connectors, and monitoring policies to support business application and service monitoring needs.
Perform event correlation and filtering to streamline incident triage, reduce noise, and ensure timely escalation to appropriate operational teams.
Integrate data sources from third-party monitoring tools (OpenText OBM, SiteScope, Microsoft SCOM) into the unified OBM event console.
Conduct proactive performance and availability monitoring, identify root causes of issues, and implement preventive measures to improve service delivery.
Required Education and Experience:
Must have extensive knowledge of multi-vendor server operating systems.
Minimum of 7 years of related experience
Minimum of 2 years of experience managing the OpenText suite of tools, including AI Operations Management, Operations Bridge, SiteScope, and Optic
Direct experience and expertise with management protocols, including SNMP and WMI
Scripting experience: PowerShell, VBScript, and/or other scripting languages
Experience managing monitoring systems with more than 250 hosts and/or more than 3,000 sensors
Experience operating other monitoring solutions including Zenoss, PRTG, Zabbix, and/or Nagios
Extensive experience monitoring servers, storage, databases, networking, and applications, with a strong emphasis on maximizing the value and effectiveness of monitoring solutions
Proven track record of engineering monitoring solutions, providing strategic direction, and fostering a collaborative and innovative work environment.
Preferred skills and qualifications:
Experience supporting a 24x7 operations environment
Experience leading troubleshooting coordination or acting as a Tech Lead during service outages that require collaboration across multiple teams and infrastructure components
Systems administrator experience managing Windows and/or Linux operating systems
Expert-level experience with scripting and automation
Experience integrating monitoring tools to operate through ServiceNow
Experience automating alerts to generate Service Tickets
Strong understanding of ITIL and ITSM including monitoring, demand management, availability management, and capacity management
ITIL certification(s), Foundation level and above, strongly preferred
Experience analyzing monitoring data and associated reports to drive business decisions on capacity and availability
Experience creating senior-level briefing work products, including functional and data-driven dashboards built from captured performance data and availability metrics
Experience with visualization and computational tools