Job Description
SRE Support Engineer - Modern IT Operations
- Job Location: Plano, Texas
- Job Duration: Full Time / Hybrid (1-2 days in office, or as required)
** NO SPONSORSHIP FROM THE CLIENT **
Overview
We are looking for a self-driven SRE support engineer with a software engineering mindset who can enable SRE-driven orchestration of all components of the end-to-end ecosystem, preemptively diagnosing anomalies and remediating them through automation.
The SRE support engineer is an integral part of the global team, whose main purpose is to provide a delightful customer experience for users of the global consumer, commercial, supply chain, and enablement functions across the digital products application portfolio of 260+ applications, enabling a full SRE practice built on an incident prevention / proactive resolution model.
The role focuses on the modern-architecture application portfolio, B2B connect, and DTC, with coverage during North America time zone hours.
The role ensures that the service performance, reliability, and availability of DPA applications meet the expectations of our customers and internal groups.
It requires a blend of technical expertise in SRE tools, knowledge of modern application architecture, IT operations experience, and analytics and influencing skills.
Responsibilities
- Report directly to the SRE & Quality Assurance Sr. Manager, and enable and execute the pre-emptive diagnosis of applications to deliver the service performance, reliability, and availability expected by our customers and internal groups.
- Act as a proactive support engineer, diagnosing anomalies before they reach any user and driving the necessary remediations across the teams involved.
- Develop or leverage aggregation and correlation solutions that integrate events across all ecosystem components of the modern architecture solution, and produce insights that continuously improve the user journey and order flow experience, in collaboration with software engineering teams.
- Design automation solutions for service delivery using tools such as Flash, Terraform, Ansible, or a custom solution.
- Accountable for providing requirements for service enhancements to the software engineering & product management teams
- Act as an SRE in designing event diagnostics, performance measures, and alert solutions that meet SLAs/SLOs/SLIs, leveraging AppDynamics, FullStory, ELK, or custom tools
- Work closely with customer-facing support teams to evolve & empower them with SRE insights
- Participate in on-call support, orchestrate blameless post-mortems, and encourage the practice within the organization
- Provide input to the definition, collection, and analysis of data relevant to products, systems, and their interactions in support of business process resiliency, especially where they impact customer satisfaction, revenue, or IT productivity.
- Actively engage and drive AIOps adoption across teams
Qualifications
- 8-12 years of work experience evolving into an SRE role, with 3-5 years of experience continuously improving and transforming IT operations and ways of working
- Bachelor's degree in Computer Science, Information Technology, or a related field
- The ideal engineer will be highly quantitative, have great judgment, be able to connect dots across ecosystems, and work efficiently across teams to ensure SRE orchestration solutions meet customer/end-user expectations
- The candidate will take a pragmatic approach to resolving incidents, including the ability to systemically triangulate root causes and work effectively with external and internal teams to meet objectives.
- A firm understanding of SRE (Site Reliability Engineering) and IT Service Management (ITSM) processes, with a track record of improving service offerings, proactively resolving incidents, providing a seamless customer/end-user experience, and proactively identifying and mitigating areas of risk
- Proven experience as an SRE designing event diagnostics, performance measures, and alert solutions to meet SLAs/SLOs/SLIs, with skills in Terraform, Octopus, AKS, Python, AppDynamics/Datadog/ELK Stack, PagerDuty, or other AIOps toolsets.
- Deep hands-on technical expertise and excellent verbal and written communication skills
Differentiating Competencies Required
- Driving for Results: Demonstrates perseverance and resilience in the pursuit of goals. Confronts and works to resolve tough issues. Exhibits a can-do attitude and a willingness to take on significant challenges.
- Decision Making: Quickly analyzes complex problems to find actionable, pragmatic solutions. Sees connections in data, events, trends, etc. Consistently works against the right priorities.
- Collaborating: Collaborates well with others to deliver results. Keeps others informed so there are no unnecessary surprises. Effectively listens to and understands what other people are saying.
- Communicating and Influencing: Ability to build convincing, persuasive, and logical storyboards. Strong executive presence. Able to communicate effectively and succinctly, both verbally and on paper.
- Motivating and Inspiring Others: Demonstrates a sense of passion, enjoyment, and pride about their work. Demonstrates a positive attitude in the workplace. Embraces and adapts well to change. Creates a work environment that makes work rewarding and enjoyable.