Site Reliability Engineer

Overview

On Site
$100,000 - $140,000
Full Time

Skills

Dashboard
Docker
Management
Linux
AppDynamics
Kubernetes
Amazon Web Services
Production Support
Tier 2
Splunk

Job Details

Title: Site Reliability Engineer
Location: Overland Park, Kansas / Bellevue, Washington
Job Description

At least 4 years of Information Technology experience.

SRE mindset in production support: proactive issue identification using observability tools.

Skilled in using various monitoring and observability tools to track system performance.

Incident commander: ability to diagnose complex issues and actively drive incident calls, working with technical and product SMEs and Tier 2 SREs.

Experience with Splunk (including Splunk APM and Splunk O11y) and AppDynamics.

Experience with databases, networking, Linux/Unix, and Kubernetes.

Experience using and analyzing output from APM tools, NMON, and Wireshark.


Preferred Qualifications:

Production support expertise with SRE observability experience:

Proactive issue identification using observability tools.

Skill in using various monitoring and observability tools to track system performance.

Production support activities, including proactively identifying issues with observability tools and correlating inputs from various dashboards and tools to drive resolution.

Experience swiftly identifying probable failure points by analyzing multiple inputs: logs, observability dashboards, and recent application, infrastructure, and network changes, etc.

Basic troubleshooting at every layer of the tech stack (application, database, infrastructure including container platforms, and network).

Experience setting up observability dashboards based on Splunk logs.

Communication:

Excellent communicator, expected to actively lead and triage proactively identified issues and incidents on calls where VPs/SVPs are also present.

Leadership on triage calls: direct teams on the actions to be taken during the call.

Automation:

Experience in toil identification and automation.

Technical expertise:

Analysis of issues via Splunk (including Splunk APM and Splunk O11y), AppDynamics, Grafana, RedMetrics, and ThousandEyes.

Debugging issues in VMs, load balancers, firewalls, API gateways, databases, networks, and Linux/Unix.

Debugging issues in containerized environments, Docker, Kubernetes, AWS, PCF, and Azure.

Analysis of issues using APM tools, NMON, and Wireshark.

Database performance monitoring and analysis

Experience setting up UEM and synthetic monitoring.

Experience in heap dump analysis, memory leak analysis, and resource optimization.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.