Overview
Remote
Accepts corp to corp applications
Contract - W2
Contract - Independent
Contract - 6 month(s)
Skills
Accountability
Advanced Analytics
Real-time
SLA
Dashboard
Service Level
Risk Management
Leadership
Change Management
Problem Management
Workflow
Computer Science
IT Operations
Management
Service Level Management
Availability Management
IT Infrastructure
SAN
Cloud Computing
Analytics
ServiceNow
Microsoft Power BI
Splunk
Communication
Analytical Skill
ITIL
Amazon Web Services
Microsoft Azure
Orchestration
Scripting
Python
Windows PowerShell
DevOps
DevSecOps
Job Details
Title: Service Level Management and Availability Management (SRE)
Location: Remote
Key Responsibilities:
- Serve as the primary point of accountability for end-to-end service availability and/or Service Level Management.
- Monitor critical infrastructure and application health, leveraging (and in some cases, creating) advanced analytics and real-time dashboards to detect early warning signs and eliminate single points of failure.
- Apply SME-level experience to identify relevant trends and outliers and provide executive-level insights.
- Develop a Service Level Management framework, implement SLAs for all OneOps teams in ServiceNow, and ensure these SLAs/SLOs are included in the appropriate PowerBI dashboards.
- Partner with Architects, DevOps, SRE, and Application teams to drive awareness and alignment with Service Level and/or Availability and move towards a unified IT Operations model across the enterprise.
- Develop and maintain Service Availability Plans, incorporating business priorities, technical dependencies, and risk mitigation strategies.
- Own and evolve metrics for service uptime, reliability, MTTR/MTTI, and user-impacting events. Present trends and recommendations to both technical staff and executive leadership.
- Embed availability practices into Change Management, Release, and Problem Management workflows, ensuring risks are surfaced and planned for up front.
Qualifications:
- Bachelor's degree or equivalent practical experience in IT, Computer Science, Engineering, or a related field.
- 5+ years of hands-on experience in IT Operations, SRE, Service Level Management and Availability Management within enterprise-scale environments.
- Proven track record managing Service Level Management/Availability Management and in-depth experience implementing these services in alignment with ITIL.
- Deep understanding of IT infrastructure (compute, storage, network), cloud platforms (AWS/Azure), and modern application architectures.
- Strong PowerBI experience, including ingesting data from various sources (such as ServiceNow).
- Experience with monitoring, alerting, and analytics tools (e.g., ServiceNow, PagerDuty, PowerBI, Datadog, Splunk).
- Exceptional written and verbal communication skills; able to translate technical details for senior leaders and non-technical stakeholders.
- Analytical mindset: able to spot trends, correlate data, and identify improvement opportunities independently.
- Executive presence and the confidence to lead discussions, challenge assumptions, and drive decisions in high-visibility scenarios.
- Must be able to work independently with little oversight and make progress quickly.
Preferred Qualifications:
- ITIL, AWS/Azure, or related certifications (preference will be given to candidates who hold them).
- Experience with automation and orchestration tools.
- Programming/scripting ability (Python, PowerShell, etc.) is a plus.
- Familiarity with DevOps/DevSecOps, SRE, and Monitoring/Observability platforms.
Reach me at