Drive process and tooling improvements - Identify gaps and implement automation-first practices to reduce manual effort and improve service quality.
Maintain endpoint monitoring connectivity - Ensure reliable telemetry ingestion via agents, SNMP, WMI, and APIs; manage certificates and credentials across hybrid networks.
Own documentation and knowledge management - Create and maintain runbooks, SOPs, service maps, and workflows in an organized, version-controlled repository; ensure accessibility and periodic review.
Document incidents and problems with observability context - Capture monitoring data in ServiceNow tickets; produce post-incident reviews and maintain a Known Error Database.
Collaborate on change, incident, and problem management - Work with Enterprise Change and Incident Management teams to ensure standardized processes, risk assessments, and communication plans are followed.
Monitor resolution performance and service restoration - Track SLAs, MTTR, and root cause analysis quality; ensure corrective actions are implemented and validated.
Standardize communication and stakeholder updates - Implement structured communication workflows for changes, incidents, and problems; manage distribution lists and enable self-service subscription options.
Ensure compliance with Commonwealth IT policies - Align services with public and enterprise policy objectives; recommend updates to improve reliability, security, and cost efficiency.
Utilize ServiceNow for change management - Create and track Requests for Change, link risk assessments, and validate post-change monitoring health.
Provide SLA reporting and operational metrics - Submit accurate data on availability, incidents, and enhancements for monthly/quarterly SLA reports.
Design and test disaster recovery plans - Define RTO/RPO for network and monitoring infrastructure; execute periodic DR exercises and update plans.
Maintain technical currency - Stay current with emerging monitoring technologies and best practices; pursue relevant training and certifications.
Fulfill Continuity of Government (CoG) obligations - Perform assigned duties during CoG activation, including relocation to alternate sites during catastrophic incidents.
Adhere to IT service management processes - Operate within ITIL-aligned frameworks; contribute to process maturity and compliance audits.
Qualifications
Required
Education/Experience
- 5+ years of experience in IT infrastructure monitoring, automation, and observability in hybrid environments.
- Bachelor's degree in Information Technology, Computer Science, or a related field.
- Technical Skills
- Strong proficiency in PowerShell and at least one other scripting language (e.g., Python, Bash, SQL).
- Hands-on experience with Azure Monitor, Log Analytics, Ansible, SQL, and KQL.
- Experience implementing automation using Azure Automation and CI/CD pipelines.
- Expertise in monitoring platforms such as SCOM, SquaredUp, or equivalent (e.g., Dynatrace, Datadog, Splunk).
- Knowledge of API integration and secure authentication.
- Process & Frameworks
- Working knowledge of ITIL 4 practices (Change, Incident, Problem Management).
- Experience with ServiceNow or similar ITSM platforms.
- Other
- Strong troubleshooting and root cause analysis skills.
- Excellent documentation and communication abilities.
Preferred
- Certifications:
- Microsoft Certified: Azure Administrator Associate or Azure Solutions Architect Expert.
- ITIL 4 Foundation or higher.
- Experience with:
- SquaredUp or equivalent dashboarding tools.
- Disaster Recovery planning and testing.
- Performance tuning and capacity planning for monitoring platforms.
- Familiarity with:
- Security best practices for API and automation scripts.
- Hybrid cloud environments and networking fundamentals.
| Skill | Required / Desired | | |
| Experience in IT infrastructure monitoring, automation, and observability in hybrid environments | Required | | |
| Strong proficiency in PowerShell and at least one other scripting language (Python, SQL, Bash) | Required | | |
| Hands-on experience with Azure Monitor, Log Analytics, Ansible, SQL and KQL | Required | | |
| Experience implementing automation using Azure Automation and CI/CD pipelines | Required | | |
| Expertise in monitoring platforms such as SCOM, SquaredUp, or equivalent (e.g., Dynatrace, Datadog, Splunk) | Required | | |
| Knowledge of API integration and secure authentication | Required | | |
| Experience with ServiceNow or similar ITSM platforms | Required | | |
| Microsoft Certified: Azure Administrator Associate or Azure Solutions Architect Expert | Highly desired | | |
| ITIL 4 Foundation or higher | Highly desired | | |