AWS Cloud DevOps Incident Manager

Peraton
Posted 26 days ago | Updated 26 days ago

Overview

USD 86,000.00 - 138,000.00 per year
Full Time

Skills

Incident management
DevOps
Continuous improvement
Operational excellence
Software development
Emerging technologies
Reliability engineering
Cloud computing
Problem solving
Critical thinking
Multitasking
Continuous integration
Continuous delivery
Amazon Web Services
Leadership
Nexus
SAFE
Management
IMPACT
Collaboration
Metrics
Communication
ROOT
Documentation
Dashboard
Mentorship
Business analytics
Security clearance
SCA
Insurance
Financing

Job Details

About Peraton
Peraton is a next-generation national security company that drives missions of consequence spanning the globe and extending to the farthest reaches of the galaxy. As the world's leading mission capability integrator and transformative enterprise IT provider, we deliver trusted, highly differentiated solutions and technologies to protect our nation and allies. Peraton operates at the critical nexus between traditional and nontraditional threats across all domains: land, sea, space, air, and cyberspace. The company serves as a valued partner to essential government agencies and supports every branch of the U.S. armed forces. Each day, our employees do the can't be done by solving the most daunting challenges facing our customers. Visit peraton.com to learn how we're keeping people around the world safe and secure.
Responsibilities

We are looking for an AWS Cloud DevOps Incident Manager. The person hired will play a critical role in ensuring the reliability and availability of our software systems and services by effectively managing and responding to incidents. This individual will lead the incident response process by coordinating cross-functional teams, implementing incident management best practices, and driving continuous improvements to minimize future incidents.

What you will do:

Lead and coordinate the end-to-end incident management process, from detection and diagnosis to resolution and post-incident analysis.
Establish and enforce incident response procedures, ensuring that teams follow established protocols to minimize downtime and impact on users.
Collaborate with development, operations, and support teams to ensure a unified and coordinated response to incidents.
Monitor system health and performance metrics to proactively identify potential incidents and address them before they escalate.
Act as the point of contact during high-severity incidents, keeping stakeholders informed and managing communication to internal and external parties.
Conduct post-incident reviews to identify root causes, contributing factors, and areas for improvement. Implement corrective actions to prevent similar incidents from occurring.
Drive continuous improvement by analyzing incident trends, identifying recurring issues, and working with teams to implement solutions.
Develop and maintain documentation related to incident response procedures, including runbooks, escalation paths, and communication guidelines.
Create dashboards and reports to provide insights into operational performance and health.
Provide mentoring and guidance to team members to enhance incident response skills and overall operational excellence.
Collaborate with engineering teams to ensure that incident learnings are integrated into the software development lifecycle to improve overall system resilience.
Stay up to date with industry best practices, emerging technologies, and trends related to incident management and reliability engineering.

Qualifications

Required Qualifications:

Minimum of 8 years of experience with a BS/BA, 6 years with an MS/MA, or 3 years with a PhD. Additional years of experience may be accepted in lieu of the degree.
Proven experience in incident management or a related role within a DevOps or SRE (Site Reliability Engineering) environment.
Strong understanding of software development, infrastructure, and AWS cloud technologies.
Familiarity with incident management tools and systems, such as incident tracking software and monitoring platforms.
Excellent problem-solving and critical-thinking skills, with the ability to handle high-pressure situations calmly and methodically.
Excellent communication and interpersonal skills, including the ability to lead cross-functional teams and communicate effectively with technical and non-technical stakeholders.
Strong ability to learn new technologies and to coordinate interrelated, highly visible activities.
Must be able to multitask and work well with changing priorities in a fast-paced, 24x7 environment.
Experience with continuous integration and continuous delivery (CI/CD) pipelines is a plus.
Relevant certifications in incident management, DevOps, or related areas are desirable.
Ability to obtain and maintain a High Risk Public Trust 6C is required.

Preferred Qualifications:

High Risk Public Trust or Secret Clearance preferred.

Benefits:

At Peraton, our benefits are designed to help keep you at your best beyond the work you do with us daily. We're fully committed to the growth of our employees. From fully comprehensive medical plans to tuition reimbursement, tuition assistance, and fertility treatment, we are there to support you all the way.

Target Salary Range

$86,000 - $138,000. This represents the typical salary range for this position based on experience and other factors.

EEO
An Equal Opportunity Employer including Disability/Veteran.

Benefits

Paid Time-Off and Holidays
Retirement
Life & Disability Insurance
Career Development
Tuition Assistance and Student Loan Financing
Paid Parental Leave
Additional Benefits
Medical, Dental, & Vision Care