The successful candidate will architect and implement availability, performance, and end-to-end enterprise operations monitoring, along with automated event correlation and analysis, notification, escalation, and incident drill-down diagnosis in a complex IT operations environment. This person will assist in developing an enterprise monitoring framework, selecting monitoring tools, defining new monitoring points, creating appropriate tests and alerts, and building operational dashboards. The role also includes administration of the monitoring products: architecting for growth and scale, sizing new and existing environments, and managing users and security for the monitoring products.
- Possess experience in evaluating, recommending, and implementing monitoring tools, standards, and processes including monitoring tool architecture, deployment, integration, and maintenance with multi-vendor tool suites.
- Requires practical knowledge of transaction processing, application development, systems administration, and web infrastructure (web services, trend analysis, etc.).
- Practical experience in APM, working with application and infrastructure teams to define and interpret monitoring and alerting requirements for managing end-user experience, fault isolation, and proactive environment health management, including alerting, analysis, and reporting using Dynatrace OneAgent. Deep knowledge of Dynatrace OneAgent is a bonus.
- Experience with Splunk operational monitoring with a focus on log file event standardization, filtering, detection, correlation, alerting, reporting, and dashboarding along with user and security administration.
- Practical experience with and proven knowledge of the Tivoli monitoring tool, including new installations, upgrades, license management, agent deployments, application and product problem analysis and resolution, reporting, trending, and dashboarding.
- Requires knowledge of incident management tools, preferably ServiceNow, for ticketing, change control, and use of the CMDB. Experience integrating other products into ServiceNow is a plus.
- Must have a proven understanding of underlying technologies, i.e., operating systems (UNIX, Linux, and Windows), networks, and databases. Systems engineering experience is preferred but not required.
- General knowledge of coding practices, with the ability to write code and scripts for product integrations and one-off alerts, and to create and understand synthetic application monitors.
- This person will create training plans and materials, organize classes, deliver user training, and lead user groups.
- Actively mentor technical skill development within the Enterprise Systems Monitoring (ESM) team.
- Effective communication skills with all internal personnel and the ability to engage vendor resources appropriately. Expected to participate in on-call support.
- Adaptability and willingness to learn new tools and technologies.
- Advanced problem-solving and troubleshooting skills and the ability to facilitate the resolution of complex issues with innovative solutions.
- Excellent presentation, verbal, and written communication skills.
- Product knowledge of Dynatrace, Splunk, Netcool, and Tivoli, as well as experience with ServiceNow (including the CMDB and ticketing), PagerDuty (or similar), and AI, is preferred.
License/Certification/Education: Normally requires a B.S. degree in Computer Science with 10 years of experience in a related field.