Sr System Reliability Engineer (SRE)

Overview

On Site
$20 - $50
Contract - W2
Contract - 12 Month(s)

Skills

Agile
Amazon Web Services
Python
ROOT
Recovery
Reliability Engineering

Job Details


End client: T-Mobile

Onsite at Overland Park, KS

Job Description:

The Sr System Reliability Engineer (SRE) improves and protects the software and systems behind all of T-Mobile's IT services, including managing scalability, availability, latency, performance, security, and capacity, and delivering software faster, better, and cheaper. From designing and maintaining Continuous Integration/Continuous Delivery (CI/CD) pipelines to building the next generation of T-Mobile applications, SREs enable a great customer experience and product innovation through continuous improvement of operational support.

Responsibilities:

Regular working hours are Monday through Friday, 7 AM PT to 4 PM PT.
Occasional on-call support will be required.
Coordinate support with offshore teams (e.g., on-call handoff). Protect and ensure the stability of the operational systems, and maintain the integrity of the data they contain.
Plan, coordinate, and execute configuration changes and code deployments to the supported systems.
Assist in determining the impact of operational issues on our customers, both internal and external, and provide input into their resolution through data extraction and quantification.
Coordinate and prioritize all escalated activities, including system upgrade and enhancement implementations and production outages.
Assess the critical path and assist in implementing any project required by the business.
Participate in complex troubleshooting efforts requiring multiple teams and disciplines for recovery and root cause investigation.
Demonstrate fluency in emerging DevOps-centric automation tools and technologies for CI/CD, configuration management, etc., in production and non-production environments.
Perform environment management for Cisco UCCE and Verbio ASR systems, as well as containerized applications and microservices in a custom Kubernetes environment.
Research tools to improve observability in speech applications.
Deliver software that improves the availability, scalability, latency, resiliency (active/active, active/passive), and efficiency of T-Mobile's services, targeting 99.99% availability.
Create, manage, and use dashboards for continuous monitoring and health checks of applications and their underlying infrastructure, and use the monitoring feedback to improve service quality in production and non-production environments.
Contribute to the future improvement of software delivery processes and operations, e.g., cloud enablement and the use of microservices with containerization.

Required Qualifications:
5 years of overall experience as an SRE or in a similar role.
Working experience in one or more of Java, Spring Boot, or Go, or scripting experience in Shell, Perl, or Python.
2 years of experience with GitLab trunk-based development, environment automation and management, Docker, Helm, ConfigMaps, Secrets handling and encryption, memory and CPU management, and application tuning in Kubernetes (k8s) or other containerized environments.
Working knowledge of proxies and gateways, token implementation and management for authentication via proxies, and load-balancing principles with application proxies and other Software Load Balancer (SLB) and Local Traffic Manager (LTM) solutions.
Working knowledge of container orchestration solutions such as Kubernetes, including managed offerings on AWS, Azure, etc.
Working knowledge and understanding of the principles of Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Container as a Service (CaaS) solutions.
Knowledge of Application Performance Monitoring (APM) tools such as AppDynamics; logging, analyzing, and writing complex queries in Splunk; and setting up observability in Grafana, Prometheus, AppDynamics, etc.
Experience working in Agile and DevOps environments.

About Care IT Services Inc