Major Incident Manager

$50-55/Hr W2

Contract: W2, 6 Month(s)

  • Work from home

Job Description


  • This role requires availability 24x7 for business-critical incidents and working in shifts.
  • Must be willing to work a flexible schedule to meet the needs of the business, including holiday, evening, overnight, and weekend shifts.
  • As an incident commander, you will be responsible for driving high-severity, high-visibility incident calls to closure within the defined SLAs (Service Level Agreements). Depending on the severity of the issue, which may be revenue-impacting, you will be required to engage with executives, vendors, SREs (Site Reliability Engineers), and other product teams during the call.
  • Train team members and other incident commanders on how to drive major incident calls by providing feedback during and after the calls.
  • As a deputy, you will be responsible for monitoring and assessing potential impact to business-critical applications and systems and for assisting the incident commander as required. The deputy is expected to take over as incident commander when needed.
  • As a scribe, you will be responsible for maintaining a timeline of key events during a major incident, documenting actions, and tracking any follow-up items that need to be addressed.
  • Monitor operational dashboards and alert product teams to take action. Drive urgency based on application criticality.
  • Assess risk and manage activities that affect the production environment and user-facing application availability.
  • Apply an understanding of the reactive case lifecycle and troubleshooting methodology.
  • Help drive the changes required for reliability as the company adopts a hybrid cloud foundation.

Qualifications

  • A bachelor's degree in Business, Economics, Mathematics, Information Systems, Computer Science, or an equivalent degree or work experience, plus 6+ years of professional experience in IT (Information Technology) Operations/Incident Management
  • ITIL certification or equivalent
  • Proficiency with ITSM tools such as ServiceNow and PagerDuty, and with ITIL, IT operations, and incident management practices
  • Confidence and presence to interact with executives and cross-team stakeholders
  • Ability to persuade and influence teams to drive solutions to business-critical issues
  • Experience engaging and managing vendors
  • Exposure to incident management processes for cloud-hosted application environments on Azure and Oracle
  • Excellent verbal and written communication skills and the ability to work well with business and technical teams
  • Event response experience with one or more of New Relic, Nagios, Splunk, Grafana, or Datadog
  • System-level understanding of storage, computing, distributed systems, networking.
  • Understanding of Java, J2EE, Spring Boot, microservice architecture, and middleware and Tomcat administration
  • Good knowledge of Azure cloud fundamentals
  • Ability to manage multiple activities at one time in a high-pressure environment
  • Strength in building partnerships and working collaboratively with people across a variety of skill sets and levels
  • Strong problem solving/analytical skills and strength in driving for business outcomes and results
  • Strong technical, written, analytical and verbal communication skills
  • Proven ability to prioritize, multitask, and adapt to changing requirements and direction
  • Must be technically literate and able to articulate technical issues in a meaningful way to both engineers and executive-level management.
  • Must have a good understanding of e-commerce retail architecture design and of platforms hosted with cloud service providers such as Microsoft, Oracle, and Google.