Senior SRE/DevOps

• Posted 2 hours ago • Updated 2 hours ago
Full Time
Compensation information provided in the description

Job Details

Skills

  • High Availability
  • Recovery
  • Stakeholder Engagement
  • Operational Excellence
  • Root Cause Analysis
  • Mentorship
  • Team Building
  • Production Engineering
  • DevOps
  • Reliability Engineering
  • Production Support
  • Incident Management
  • ServiceNow
  • Dynatrace
  • Grafana
  • Splunk
  • Cloud Computing
  • Amazon Web Services
  • Microsoft Azure
  • Google Cloud
  • Google Cloud Platform
  • Kubernetes
  • Linux
  • Unix
  • Microsoft Windows
  • Computer Networking
  • Scripting
  • Python
  • Bash
  • Java
  • .NET
  • Accountability
  • Management
  • Problem Solving
  • Conflict Resolution
  • Decision-making
  • Collaboration
  • Communication
  • Leadership
  • Continuous Improvement
  • Computer Science
  • Professional Development
  • Technical Training
  • IT Management
  • Training

Summary

What You'll Do:

Production Ownership & Incident Leadership

  • Act as a primary on-call engineer in a 24/7 production environment
  • Lead the response to high-severity incidents (P1/P2), including:
    • Driving incident bridges
    • Coordinating cross-team response
    • Ensuring timely resolution and communication
  • Serve as a point of accountability for production stability during assigned shifts

Infrastructure & System Reliability

  • Manage and support production infrastructure across cloud and on-prem environments
  • Monitor system health and proactively identify risks, bottlenecks, and failure points
  • Ensure high availability, performance, and resilience of applications and services

DevOps & Automation

  • Design and implement automation to improve reliability and reduce manual intervention
  • Improve monitoring, alerting, and observability frameworks
  • Drive initiatives to reduce incident frequency and improve recovery times

Stakeholder Engagement & Communication

  • Partner directly with engineering leaders, infrastructure teams, and product stakeholders
  • Provide clear, concise communication during incidents and escalations
  • Translate technical issues into business impact for leadership visibility
  • Build trust through reliability, responsiveness, and ownership

Operational Excellence & Continuous Improvement

  • Lead or contribute to root cause analysis (RCA) and post-incident reviews
  • Identify systemic issues and drive long-term fixes
  • Establish and improve operational processes, runbooks, and standards
  • Mentor junior engineers and support team development

What You Know:

Technical Expertise

  • 7+ years of experience in Site Reliability Engineering (SRE), DevOps, or Production Engineering roles
  • Strong experience in DevOps, SRE, or production support environments
  • Experience with monitoring and incident management tools such as:
    • ServiceNow
    • PagerDuty
    • Dynatrace
    • Grafana
    • Datadog, Observe, Splunk, or similar platforms
  • Solid hands-on experience with:
    • Cloud platforms (AWS, Azure, or Google Cloud Platform)
    • Kubernetes
    • Linux/Unix and Windows systems
    • Networking fundamentals
  • Scripting experience using Python, Bash, or similar languages
  • Programming experience in Java and/or .NET

Leadership & Behavioural Skills

  • Strong ownership mindset and accountability
  • Ability to lead under pressure and manage high-severity incidents
  • Excellent communication skills, especially with non-technical stakeholders
  • Comfortable operating in a high-visibility, leadership-facing environment
  • Strong problem-solving and decision-making abilities

Work Model

  • Remote work
  • Participation in a rotating on-call schedule, including weekends and holidays as needed
  • Ability to respond to critical incidents outside of standard working hours when required

What Success Looks Like

  • High-severity incidents are handled efficiently with strong coordination and communication
  • Production systems remain stable, performant, and resilient
  • Reduction in recurring incidents through proactive improvements
  • Strong trust established with client leadership and stakeholders
  • Continuous improvement of operational maturity and reliability practices


Education:

  • Bachelor's degree in Computer Science or related field.

Benefits:

In addition to competitive salaries and benefits packages, Nisum US offers its employees some unique and fun extras:

  • Professional Development - We offer in-house technical training and professional learning programs aimed at developing skills across a broad spectrum of topics such as technology, leadership, role-based training, and process expertise. We also offer an annual stipend for employees to attend external courses to maintain professional certifications.
  • Health & Wellness Benefits - We believe that your health and welfare are important, and we strive to ensure that you have affordable options available to you, including some plans subsidized up to 90% for employees and their families. We also offer dental and vision plans in the US for which Nisum pays 100% of employee premiums.
  • Volunteerism Pay - We believe in giving back. In the US, our employees are eligible for up to 40 hours of paid time off each year to volunteer for the causes they are most passionate about. This is in addition to personal PTO and paid holidays.
  • Additional Benefits - We offer all the other important benefits to keep employees and their families healthy and financially secure, such as 401(k) retirement savings with a company match, pre-tax parking and transit programs, disability insurance, and Basic Life/AD&D, alongside exclusive employee discounts on a wide variety of products and services.

Compensation Band:

$120K - $130K per annum

  • Dice Id: RTX153458
  • Position Id: 2026-14841