Job Title: Monitoring Consultant
Location Requirement: Candidate must reside within 2 hours of the work location
Work Schedule: 40 hours per week
Work Mode: Hybrid – minimum one day per week onsite
Position Purpose
The Monitoring Consultant will serve as a subject matter expert on enterprise monitoring tools and operational processes. This role works closely with technical teams and vendors to design, implement, and enhance monitoring and reporting solutions.
The consultant will transform manual, person-dependent processes into structured, repeatable, and automated workflows. The position is responsible for managing and continuously improving monitoring operations, including change management, incident reporting, and problem resolution.
This role also evaluates and implements technical solutions for both on-premises and cloud-based environments, develops standard operating procedures (SOPs), and enhances communication strategies to improve operational efficiency and service delivery.
Key Responsibilities
Process Improvement & Automation
Identify process gaps and implement automation-first improvements.
Convert manual tasks into standardized, repeatable automated workflows.
Monitoring & Connectivity
Maintain endpoint monitoring connectivity using agents, SNMP, WMI, and APIs.
Manage certificates and credentials across hybrid networks to ensure secure telemetry ingestion.
Documentation & Knowledge Management
Develop and maintain runbooks, SOPs, service maps, and workflows in version-controlled repositories.
Ensure documentation remains accessible, current, and regularly reviewed.
Incident & Problem Management
Document incidents and problems with full observability context within ITSM platforms.
Conduct post-incident reviews and maintain a Known Error Database.
Track SLA performance, MTTR, and root cause analysis quality metrics.
Change Management & Compliance
Collaborate with Change and Incident Management teams.
Create and track Requests for Change (RFCs) in ITSM tools.
Ensure compliance with IT governance policies and recommend process improvements.
Communication & Reporting
Disaster Recovery & Business Continuity
Design and test disaster recovery plans, including defined recovery time objectives (RTO) and recovery point objectives (RPO).
Support continuity operations during major incidents.
Professional Development
Qualifications
Required
Education & Experience
Bachelor’s degree in Information Technology, Computer Science, or a related field.
Minimum 5 years of experience in IT infrastructure monitoring, automation, and observability within hybrid environments.
Technical Skills
Strong proficiency in PowerShell and at least one additional scripting or query language (Python, Bash, or SQL).
Hands-on experience with monitoring platforms such as:
SCOM
SquaredUp
Dynatrace
Datadog
Splunk
Knowledge of API integrations and secure authentication methods.
Process & Frameworks
Working knowledge of ITIL 4 practices (Change, Incident, and Problem Management).
Experience with ServiceNow or similar ITSM platforms.
Other Skills
Preferred
Certifications
Additional Experience
Dashboarding tools such as SquaredUp or equivalent.
Disaster recovery planning and testing.
Performance tuning and capacity planning for monitoring platforms.
Familiarity With