JOB DUTIES
Partner with the architecture and development teams on how to make applications highly available, reliable, and performant at global scale
Collaborate with the architecture team to ensure reliability factors are accounted for in business features and enablers
Guide development teams in understanding established service level objectives (SLOs) and the consequences of missing them, and in implementing appropriate service level indicators (SLIs) to support those objectives
Collaborate with development team members to swarm, troubleshoot, and resolve problems
Guide ad hoc teams in brainstorming solutions and building implementation plans based on Root Cause Analysis of production issues
Design and build automated solutions that maximize application, service, and platform uptime with minimal human intervention
Be available for an on-call rotation to participate in troubleshooting and communication efforts outside normal business hours
Help create and implement standards and best practices, and mentor other team members to drive their adoption across development teams
Perform other duties as assigned
Comply with all company policies and procedures
JOB SPECIFICATION
Expert in defining, implementing, and evaluating Service Level Objectives (SLOs), Service Level Indicators (SLIs), and their associated consequences
Software development expertise in two or more high-level programming or scripting languages
Experience in evolutionary database design, query performance analysis, and indexing as cornerstones of delivering scalable, performant products and services
Experience designing, building, and optimizing automated pipelines with built-in automated testing and security controls
Experience in performing Root Cause Analysis and Problem Management
Experience working in Agile Scrum teams, with demonstrated success leading improvements (getting better, faster, happier)