Job Details
Job Title: ITSM Service Delivery / SRE
Location: Remote
Duration: 3-6 months
Mandatory skills are marked in Green
Job Description
Overview:
The Global Hosting Service Delivery team is responsible for managing Infrastructure Operations, including Major Incident Management, Problem (RCA) Management, Enterprise Change Management, and PagerDuty. Additionally, we are building new service offerings around Service Level Management and Availability Management.
The ideal candidate will have strong Infrastructure, Cloud, and Operations experience in enterprise environments and deep subject matter expertise in Service Level Management and Availability Management. This role requires strong technical capabilities and confident communication skills. The candidate must be able to multitask in a fast-paced environment with short timelines and high visibility from our clients and internal customers.
This person will interface with Infrastructure Architects, Application Development teams within the Business Units, and Senior Leadership. The ideal candidate will be comfortable communicating at all levels and will have a broad technical understanding as well as specific, in-depth knowledge of implementing Service Level and Availability Management. This person should be able to gather requirements and ask appropriate questions, and should have above-average communication, project management, and presentation skills. Strong PowerBI skills are required. This person must be able to conceptualize their vision and quickly translate it into implementation.
This team member will be responsible for maximizing service availability reporting across complex environments. The role blends technical depth with operational rigor: driving proactive measures to prevent outages, managing high-stakes incident response, and collaborating across business and IT to ensure resilient, always-on service delivery.
Key Responsibilities:
Serve as the primary point of accountability for end-to-end service availability and service level management, spanning on-premises, cloud, and third-party integrations.
Monitor critical infrastructure and application health, leveraging (and in some cases, creating) advanced analytics and real-time dashboards to detect early warning signs and eliminate single points of failure.
Partner with Architects, DevOps, SRE, and Application teams to drive awareness of and alignment with Service Level and Availability Management, and to move toward a unified IT Operations model across the enterprise.
Develop and maintain Service Availability Plans, incorporating business priorities, technical dependencies, and risk mitigation strategies.
Own and evolve metrics for service uptime, reliability, MTTR/MTTI, and user-impacting events. Present trends and recommendations to both technical staff and executive leadership.
Embed availability practices into Change Management, Release, and Problem Management workflows, ensuring risks are surfaced and planned for up front.
Mentor team members in proactive monitoring, resilience engineering, and incident response best practices. Foster a culture of continuous improvement and transparency.
Required Skills & Experience:
Bachelor's degree or equivalent practical experience in IT, Computer Science, Engineering, or a related field.
10+ years of hands-on experience in IT Operations, SRE, or Availability Management within enterprise-scale environments.
Proven track record managing Service Level Management/Availability Management and in-depth experience implementing these services in alignment with ITIL.
Deep understanding of IT infrastructure (compute, storage, network), cloud platforms (AWS/Azure), and modern application architectures (microservices, containerization).
Proven experience with ITIL/ITSM best practices for Availability and Service Level Management, including how they relate to Incident Management.
Experience with monitoring, alerting, and analytics tools (e.g., ServiceNow, PagerDuty, PowerBI, Datadog, Splunk).
Exceptional written and verbal communication skills; able to translate technical details for senior leaders and non-technical stakeholders.
Analytical mindset: able to spot trends, correlate data, and identify improvement opportunities independently.
Executive presence and the confidence to lead discussions, challenge assumptions, and drive decisions in high-visibility scenarios.
Programming/scripting ability (Python, PowerShell, etc.) is a plus.
Must be able to work independently with little oversight and progress quickly.
Mandatory:
ITIL, AWS/Azure, or related certifications. Candidates holding these certifications will be preferred and prioritized.
Experience with automation and orchestration tools.
Familiarity with DevOps/DevSecOps, SRE, and Monitoring/Observability platforms.