Service Reliability Engineer

distributed systems, AWS, public cloud, Java, Python, Linux
Contract W2, 12 Months
$50 - $60
Work from home not available
Travel not required

Job Description

Service Reliability Engineering

Role Purpose

  • Support the service team as a build member
  • Drive reliability concepts into cloud service teams to keep critical systems operating effectively
  • Write code and build systems to improve performance and operational efficiency of services
  • Assist operations in addressing issues and solving problems

Key Responsibilities

  • Work with Service Development and Service Quality teams to ensure service reliability requirements meet service objectives
  • Author scripts and templates for setup, configuration and monitoring of critical service components
  • Develop, update, and maintain testing standards and procedures
  • Intervene in and troubleshoot any part of supported services when needed
  • Investigate and solve live performance and stability issues in production
  • Provide dedicated support to individual Service Delivery Engineers and Operations
  • Validate scalability testing results, and test limits of hardware and software
  • Oversee all planned outages, and assist with major upgrades to ensure minimum downtime
  • Educate peers about best practices, standards, processes and technologies
  • Serve as the SME for selecting technology candidates and self-healing capabilities for future service development
  • Perform large-scale automation, combining independent processes into robust behavior
  • Participate in a follow-the-sun on-call rotation with team members


Qualifications

  • 5+ years of experience building complex distributed systems
  • 2+ years of experience managing public cloud infrastructure (AWS, GCP or Azure)
  • 3+ years of experience with running and/or managing large infrastructure services with multiple availability regions
  • Professional-level public cloud certifications (AWS, GCP, Azure) preferred
  • Detail oriented: able to document and follow detailed instructions within test scripts as well as defect-tracking documents (e.g., steps to recreate the problem)
  • Experience with cloud monitoring tools
  • Exceptional communication and troubleshooting skills
  • Fluency in Linux environments (Red Hat)
  • Scripting and programming skills (Python, Java, JavaScript)
  • Ability to develop custom tool integrations
  • Ability to write and publish consistent APIs
  • Experience building, integrating, deploying and provisioning cloud services
  • Experience with configuration management systems (Chef, Ansible)
  • Experience with modern collaboration tools such as the Atlassian suite (Jira, Confluence)
  • Expertise in version control with Git and hosting platforms (GitHub, Bitbucket)
  • Specific experience with Infrastructure as Code (IaC)
  • Experience with performance testing and analysis tools (AppDynamics, Splunk)
  • Understanding of testing concepts and how testing fits in with the overall project life cycle

Posted By

Ashton Ayres

5619 DTC Blvd, Suite 840 Greenwood Village, CO, 80111

Dice Id : redoak
Position Id : 6230568
Originally Posted : 3 weeks ago