Site Reliability Engineer

Software, HTTP, Engineering, Computer, SQL, Database, Java, Linux, Development, Windows, Performance, Network
Full Time

Job Description


Remote

Job Responsibilities

• Performs production SaaS operational and administrative duties to maintain the health and reliability of SaaS production systems

• Performs production SaaS support, incident management, problem management, and service restoration as needed to respond to and resolve production issues quickly

• Implements and trains team members on tools for measuring core product health in production (with opportunities to extend those capabilities all the way back through the entire DevOps pipeline)

• Implements methods for calculating system availability SLAs across Client's products and trains team members on them

• Implements and executes the tool consolidation strategy to optimize spend versus value for our end-to-end monitoring platform

• Implements rapid and continuous development and application of automated solutions to address reliability issues and automate manual tasks

• Works with the Software DevOps team to implement a DevOps CI/CD strategy for continuous performance testing, monitoring, and reliability using Visual Studio Team Services and other cloud-based tools

• Implements the measurement capability of core product availability across Azure and Client Cloud using HTTP endpoint testing and synthetic user testing

• Maintains the automated site availability reporting and data platform

• Gathers data on the usability, reliability, incidents, and user experience of AvidXchange products for weekly consumption by executive leadership

• Influences product delivery teams to implement usability and reliability enhancements leading to improved user experience index scores and improved availability

• Provides detailed analysis and troubleshooting for system outages and gives feedback to product and software engineering teams

Required Experience, Qualifications, and Skills

• 3+ years of software engineering, computer science, or information technology experience

• Understanding of web hosting infrastructure and architecture in highly available environments

Preferred Experience, Qualifications, and Skills


• Bachelor's degree in Computer Science or Information Technology is preferred

• Relevant Azure cloud computing certifications are strongly preferred

• 3+ years of experience with Dynatrace AppMon, Dynatrace SaaS or competing products

• Experience measuring and monitoring .NET applications, SQL Server databases, and serverless cloud resources, or equivalent Java-based experience

• PowerShell or Linux scripting to create automated routines that ensure site availability

• Development/coding experience and skills for writing custom automation solutions

• Experience working in an Agile software development environment (Scrum / Kanban)

• Experience with forensic system troubleshooting tools and techniques, including but not limited to:

o Windows Performance Monitor

o Network trace analysis

o Port monitoring

o Sysinternals Suite tools such as Process Monitor and Process Explorer
Dice Id : 10311390
Position Id : z5G7h3l6a1kMvyS65NP3c-GEpEbMzA5ei1ymPGQZVuU=
Originally Posted : 5 months ago