Lead Site Reliability Engineer

Overview

Remote
$60 - $70
Contract - Independent
Contract - W2
Contract - 1 Year(s)

Skills

DevOps
Python
Reliability Engineering
TypeScript
PowerShell
JavaScript
configuration management
Puppet
web development
system analysis

Job Details

We have an opening for a Sr./Lead Site Reliability Engineer in Dallas, TX. The start date is ASAP, and the contract will last 1+ year.
Comments: POSITION IS REMOTE - THEY WILL TRAVEL ONCE EVERY 3 MONTHS TO CLIENT SITE - PREFER LOCAL TO DALLAS
Rate: 60-70/hr plus expenses

Skills (NONE/ADVANCED/EXPERT):

  • SRE Lead
  • Able to help build a Site Reliability Engineering culture by sharing your best practices, approaches, documentation, and code with other engineering teams.
  • Able to apply automation and software to any tasks or parts of the system that would benefit from it or are performed manually.
  • Able to troubleshoot complicated OS, networking, and database issues in cloud-based SaaS and on-premises environments; handle live production incidents; debug application and infrastructure issues; and follow and implement SRE best practices.
  • Able to monitor application performance, take steps to improve overall application performance and stability, and follow through with implementation.

Description:

The Senior Site Reliability Engineer is a fundamental piece of the Site Reliability Engineering team. Site Reliability Engineering is accountable for the availability, reliability, and performance of the services and platforms in a highly transactional 24x7 environment.

What you will do:
  • Help build a Site Reliability Engineering culture by sharing your best practices, approaches, documentation, and code with other engineering teams.
  • Apply automation and software to any tasks or parts of the system that would benefit from it or are performed manually.
  • Troubleshoot complicated OS, networking, and database issues in cloud-based SaaS and on-premises environments; handle live production incidents; debug application and infrastructure issues; and follow and implement SRE best practices.
  • Monitor application performance, take steps to improve overall application performance and stability, and follow through with implementation.
  • Conduct system analysis and configuration management, and develop improvements to system software performance, availability, and reliability.
  • Design, write, ship, and motivate the creation of software and systems to increase observability, product reliability, and organizational efficiency.
  • Work closely with software engineers and QAs to ensure the system meets non-functional requirements such as performance, security, and availability.
  • Document your system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available to those who need it.
  • Maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure.
  • Keep up to date on security and proactively identify, diagnose, and solve complex security issues.

What we're looking for:
  • Bachelor's degree in Computer Science or a related field, or an equivalent combination of education and experience
  • 5+ years of experience in a full-stack application support, DevOps, or SRE role
  • Experience in JavaScript, TypeScript, and web development technologies
  • Proficiency in scripting languages such as PowerShell and/or Python
  • Experience troubleshooting with built-in browser tools
  • Ability to distill technical and complex principles or scenarios for all levels of our organization
  • Knowledge of DevOps methodologies and tools, such as CI/CD concepts, CI/CD tools (Jenkins, CodePipeline, etc.), and automation and configuration tools (Puppet, Ansible, etc.), a plus
  • Knowledge of public clouds (Google Cloud Platform, AWS, Azure), including implementing projects on them, a plus
  • Ability to self-govern workload and show discipline around priority and time management, even while working remotely or in the absence of direct management for an extended period
  • Ability and willingness to adapt to new application stacks and new technology concepts as the business evolves over time
  • Excellent communication skills, both verbal and written
  • Ability to collaborate with local and remote teams in different time zones
  • Ability to present and lead technical discussions