Service Reliability Engineer

Distributed systems, AWS, public cloud, Java, Python, Linux
Contract W2, 12 Months
$50 - $60
Work from home not available
Travel not required

Job Description

Service Reliability Engineering

Role Purpose

  • Support service team as a build member
  • Drive reliability concepts into cloud service teams to keep critical systems operating effectively
  • Write code and build systems to improve performance and operational efficiency of services
  • Assist operations in addressing issues and solving problems

Key Responsibilities

  • Work with Service Development and Service Quality teams to ensure service reliability requirements meet service objectives
  • Author scripts and templates for setup, configuration and monitoring of critical service components
  • Develop, update, and maintain testing standards and procedures
  • Intervene in and troubleshoot any part of supported services when needed
  • Investigate and solve live performance and stability issues in production
  • Provide dedicated support to individual Service Delivery Engineers and Operations
  • Validate scalability testing results, and test limits of hardware and software
  • Oversee all planned outages, and assist with major upgrades to ensure minimum downtime
  • Educate peers about best standards, processes and technologies
  • Serve as the SME for selecting technology candidates and self-healing capabilities for future service development
  • Perform large scale automation, combining independent processes into robust behavior
  • Participate in a follow-the-sun on-call rotation with team members

Qualifications

  • 5+ years of experience building complex distributed systems
  • 2+ years of experience in managing public cloud-based infrastructure (AWS, GCP or Azure)
  • 3+ years of experience with running and/or managing large infrastructure services with multiple availability regions
  • Public cloud certifications (AWS, GCP, Azure); Professional level preferred
  • Detail oriented: able to document and follow detailed instructions within test scripts as well as defect-tracking documents (e.g., steps to recreate the problem)
  • Experience with cloud monitoring tools
  • Exceptional communication and troubleshooting skills
  • Fluency in Linux environments (Red Hat)
  • Scripting and programming skills (Python, Java, JavaScript)
  • Ability to develop custom tool integrations
  • Ability to write consistent and published APIs
  • Experience building, integrating, deploying and provisioning cloud services
  • Experience with configuration management systems (Chef, Ansible)
  • Experience with collaboration tools such as the Atlassian suite (Jira, Confluence)
  • Expertise in version control with Git and hosting platforms (GitHub, Bitbucket)
  • Specific experience with Infrastructure as Code (IaC)
  • Experience with performance testing and analysis tools (AppDynamics, Splunk)
  • Understanding of testing concepts and how testing fits in with the overall project life cycle

Posted By

Ashton Ayres

5619 DTC Blvd, Suite 840 Greenwood Village, CO, 80111

Contact
Dice Id : redoak
Position Id : 6230568
Originally Posted : 3 weeks ago
