ITSM Service Delivery / SRE

Overview

Remote

Depends on Experience

Contract - Independent

Contract - W2

Contract - 6 Month(s)

No Travel Required

Able to Provide Sponsorship

Skills

ITSM Service Delivery

SRE

ITSM

PagerDuty

Infrastructure

Cloud

Service Level Management

Power BI

on-premises

MTTR/MTTI

Change Management

Release Management

Problem Management

Availability Management

ITIL

AWS

Azure

microservices

containerization

ServiceNow

Datadog

Splunk

Python

PowerShell

automation

orchestration

DevSecOps

Job Details

Job Title: ITSM Service Delivery / SRE

Location: Remote

Duration: 3-6 months


Job Description

Overview:

The Global Hosting Service Delivery team is responsible for managing Infrastructure Operations, including Major Incident Management, Problem (RCA) Management, Enterprise Change Management, and PagerDuty. We are also building new service offerings around Service Level Management and Availability Management.

The ideal candidate will have strong Infrastructure, Cloud, and Operations experience in enterprise environments and deep subject matter expertise in Service Level Management and Availability Management, including specific, in-depth knowledge of implementing both. The role requires strong technical capabilities and confident communication skills. The candidate must be able to multitask in a fast-paced environment with short timelines and high visibility to our clients and internal customers. This person will interface with Infrastructure Architects, Application Development teams within the Business Units, and Senior Leadership, so they must be comfortable communicating at all levels and combine a broad technical understanding with that in-depth expertise. They should be able to gather requirements, ask the right questions, and bring above-average communication, project management, and presentation skills. Strong Power BI skills are required. This person must be able to conceptualize their vision, translate it into action, and move quickly to implementation.

This team member will be responsible for maximizing service availability and for availability reporting across complex environments. The role blends technical depth with operational rigor: driving proactive measures to prevent outages, managing high-stakes incident response, and collaborating across business and IT to ensure resilient, always-on service delivery.

Key Responsibilities:

Serve as the primary point of accountability for end-to-end service availability and service level management, spanning on-premises, cloud, and third-party integrations.

Monitor critical infrastructure and application health, leveraging (and in some cases, creating) advanced analytics and real-time dashboards to detect early warning signs and eliminate single points of failure.

Partner with Architects, DevOps, SRE, and Application teams to drive awareness of and alignment with Service Level and Availability Management, and move toward a unified IT Operations model across the enterprise.

Develop and maintain Service Availability Plans, incorporating business priorities, technical dependencies, and risk mitigation strategies.

Own and evolve metrics for service uptime, reliability, MTTR/MTTI, and user-impacting events. Present trends and recommendations to both technical staff and executive leadership.

Embed availability practices into Change, Release, and Problem Management workflows, ensuring risks are surfaced and planned for up front.

Mentor team members in proactive monitoring, resilience engineering, and incident response best practices. Foster a culture of continuous improvement and transparency.

Required Skills & Experience:

Bachelor's degree or equivalent practical experience in IT, Computer Science, Engineering, or a related field.

10+ years of hands-on experience in IT Operations, SRE, or Availability Management within enterprise-scale environments.

Proven track record in Service Level Management and Availability Management, with in-depth experience implementing these services in alignment with ITIL.

Deep understanding of IT infrastructure (compute, storage, network), cloud platforms (AWS/Azure), and modern application architectures (microservices, containerization).

Proven experience with ITIL/ITSM best practices for Availability and Service Level Management, including how they relate to Incident Management.

Experience with monitoring, alerting, and analytics tools (e.g., ServiceNow, PagerDuty, Power BI, Datadog, Splunk).

Exceptional written and verbal communication skills; able to translate technical details for senior leaders and non-technical stakeholders.

Analytical mindset: able to spot trends, correlate data, and identify improvement opportunities independently.

Executive presence and the confidence to lead discussions, challenge assumptions, and drive decisions in high-visibility scenarios.

Programming/scripting ability (Python, PowerShell, etc.) is a plus.

Must be able to work independently with little oversight and progress quickly.

Mandatory:

ITIL, AWS/Azure, or related certifications. Candidates holding these certifications will be preferred and prioritized.

Experience with automation and orchestration tools.

Familiarity with DevOps/DevSecOps, SRE, and Monitoring/Observability platforms.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.


About TechVirtue LLC
