Incident Manager (SRE / Operations)

Philadelphia, PA, US • Posted 12 hours ago • Updated 12 hours ago
Contract W2
Contract Independent
No Travel Required
On-site
Depends on Experience

Job Details

Skills

  • Incident Manager
  • SRE
  • Operations Engineering
  • Incident Command

Summary

Job Title: Incident Manager (SRE / Operations)

Location: Philadelphia, PA (100% Onsite – Day 1)
Duration: 12+ Months
Open Positions: 14


⚠️ Critical Notes:

  • 100% Onsite from Day 1 (Philadelphia, PA)
  • Immediate hiring – bulk positions (14 openings)
  • Virtual interview drive scheduled soon – fast turnaround required

Job Summary:

We are seeking experienced Incident Managers with strong expertise in SRE, operations engineering, and incident command. The ideal candidate will lead high-impact incident response, ensure system reliability, and drive cross-functional coordination during outages and large-scale system events.


Key Responsibilities:

  • Lead incident command and management for critical production issues
  • Coordinate cross-functional teams during high-severity incidents
  • Drive root cause analysis (RCA) and implement preventive measures
  • Manage system reliability and operational stability
  • Collaborate with SRE, DevOps, and engineering teams
  • Ensure effective communication with stakeholders and leadership
  • Drive automation and observability improvements
  • Handle large-scale change events and system outages
  • Maintain incident reports, documentation, and post-mortem analysis
  • Continuously improve incident response processes and frameworks

Required Skills & Experience:

  • 6–8 years of experience in Incident Management, Production Support, or SRE roles
  • Strong expertise in:
    • Incident Command & Crisis Management
    • Site Reliability Engineering (SRE)
    • Operations Engineering
  • Strong knowledge of:
    • Reliability architecture and system design
    • Automation and observability tools
  • Proven ability to:
    • Lead teams during high-impact outages
    • Drive systemic problem resolution
  • Excellent executive communication and stakeholder management skills

Technical Skills:

  • Incident Management
  • SRE / Operations Engineering
  • Monitoring & Observability Tools
  • Automation & Reliability Engineering

Preferred Qualifications:

  • Experience in enterprise-scale production environments
  • Strong analytical and problem-solving skills
  • Ability to work in high-pressure, fast-paced environments

Key Deliverables:

  • Rapid and effective incident resolution
  • Improved system reliability and uptime
  • Well-documented RCA and post-incident reports
  • Strong coordination across technical and business teams

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

  • Dice Id: 10488618
  • Position Id: 8959374