SRE Service Availability Analyst

Overview

Hybrid
$50 - $65
Contract - W2
No Travel Required

Skills

Python
Shell
AWS
Azure
GCP
ITIL Foundations Certification
ServiceNow
DevOps
CI/CD

Job Details

Requirements:

  • 5+ years of experience in an information technology environment.
  • 3 years of experience in IT operations, including troubleshooting complex network, server, storage, and/or application issues.
  • Minimum 2 years of experience in operations involving incident, problem, change, and release management, including leading calls and documenting outcomes.
  • Undergraduate degree or equivalent experience/certification.
  • Knowledge of scripting languages (Python, Shell) and familiarity with automation tools (such as Ansible, Jenkins).
  • Experience with cloud platforms (AWS, Azure, Google Cloud Platform), infrastructure as code, and containerization technologies.
  • Experience in incident command or incident management in a technology environment.
  • Strong problem-solving, organizational, and analytical skills.
  • Ability to cover shifts and on-call responsibilities in a 24/7/365 environment.

Nice to have:

  • ITIL Foundations v3+ Certification.
  • Demonstrated experience with ITSM suites, e.g., ServiceNow.
  • Understanding of various monitoring, performance, or capacity tools.
  • Understanding of continuous integration/continuous deployment (CI/CD) pipelines and DevOps practices.
  • Familiarity with Site Reliability Engineering principles and concepts.
  • Strong leadership qualities, including decisiveness, the ability to motivate teams, and the ability to manage stressful situations calmly and effectively.
  • Ability to build constructive relationships with, influence, and communicate with associates and management at varying levels.
  • Ability to solve complex, cross-functional issues.
  • Strong knowledge of server, storage, network, middleware, application, and cloud technologies.
  • A high degree of curiosity and a drive to seek more efficient ways of delivering service.

Responsibilities:

  Core Technical Work Activities:
  • Serve as Incident Commander during major incidents, leading response efforts to restore services and minimize impact on business and consumer operations.
  • Design and implement automation tools to reduce manual intervention, improve system performance, and prevent incidents.
  • Assess application architectures to identify key monitoring points and performance indicators.
  • Develop and maintain comprehensive monitoring and alerting frameworks to detect and address anomalies before they escalate to incidents.
  • Collaborate closely with development, operations, and support teams for continuous improvement of service reliability and incident response processes.
  • Conduct thorough post-mortems to analyze incidents, identify root causes, and implement preventative measures to avoid recurrence.
  • Effectively communicate incident status, impact, and post-incident reports to stakeholders at all organizational levels.
  • Stay informed on the latest industry trends, technologies, and practices in site reliability engineering and incident management.
  Delivering on the Needs of Key Stakeholders:
  • Understand and meet the needs of key stakeholders.
  • Develop specific goals and plans to prioritize, organize, and accomplish work.
  • Collaborate with internal partners and stakeholders to support business/initiative strategies.
  • Communicate concepts clearly and persuasively, in a way that is easy to understand.
  • Generate and provide accurate and timely results in the form of reports, presentations, etc.
  • Support achievement of performance goals, budget goals, team goals, etc.
  Other:
  • Maintain a high performance level under pressure or when facing changes or challenges in the workplace.
  • Convey information and ideas to others convincingly and engagingly through a variety of methods.
  • Identify and understand issues, problems, and opportunities; obtain and compare information from different sources to draw conclusions, develop and evaluate alternatives and solutions, solve problems, and choose a course of action.
  • Exhibit behavioral styles that convey confidence and command respect from others, making a good first impression and representing the company in alignment with its values.
  • Develop business plans by exploring and systematically evaluating opportunities with the most significant potential for producing positive results. Ensure the successful preparation and execution of business plans through effective planning, organization, and ongoing evaluation processes.
  • Participate as a team member to contribute to the achievement of common goals while fostering cohesion and collaboration among team members.
  • Ensure successful execution of business plans designed to maximize customer satisfaction, profitability, and market share through effective planning, organizing, and ongoing evaluation processes.
  • Set high standards of performance for self and/or others; assume responsibility for work objectives; initiate, focus, and monitor the efforts of self and/or others toward the accomplishment of goals; proactively take action and go beyond what is required.
  • Gather information and resources required to set a plan of action for self and/or others; prioritize and arrange work requirements to accomplish goals and ensure work is completed.
  • Develop and sustain relationships based on an understanding of customer/stakeholder needs and actions consistent with the company's service standards.
  • Interact with others in a way that fosters openness, trust, and confidence, thereby promoting the pursuit of organizational goals and the development of lasting relationships.
  • Support employees and business partners with diverse styles, abilities, motivations, and/or cultural perspectives; utilize differences to drive innovation and engagement and to enhance business results; and ensure employees are allowed to contribute to their full potential.
  • Evaluate and adapt the structure of assignments and work processes to best fit the needs and/or support the goals of an organizational unit.
  • Provide support and feedback to help individuals develop and strengthen the skills and abilities necessary to achieve work objectives.
  • Seek and make the most of learning opportunities to improve the performance of self and/or others.
  • Understand and utilize business information to manage everyday operations and generate innovative solutions to approach business and administrative challenges.
  • Understand and utilize professional skills and knowledge in a specific functional area to conduct and manage everyday business operations and generate innovative solutions to approach function-specific work challenges.
  • Use basic computer hardware and software (e.g., personal computers, word processing software, Internet browsers, etc.).
  • Add, subtract, multiply, or divide quickly, correctly, and in a way that allows one to solve work-related issues.
  • Listen and understand information and ideas presented through spoken words and sentences.
  • Understand written sentences and paragraphs in work-related documents.

About Svitla Systems Inc.