Site Reliability Engineer

Overview

On Site
$1 - $2
Full Time
100% Travel

Skills

SRE
Splunk
ExtraHop
AppDynamics
Prometheus
Grafana
CPU
Memory
Disk space utilization
Threads
Connection counts
Java
REST

Job Details

  • Well versed in application monitoring tools (Splunk, ExtraHop, AppDynamics, Prometheus and Grafana)

  • Good understanding of JVM and database metrics (CPU, memory and disk space utilization, threads, connection counts), with hands-on experience

  • Good understanding of Java web services (REST services)

  • Able to analyze traffic patterns, errors and exceptions from logs and suggest improvements

  • Strong communication and interpersonal skills, with the ability to communicate clearly with senior management and various stakeholders such as App Dev, Infra teams and DBAs

The resource should be able to work independently to drive stability improvements that ensure zero downtime for our web services and databases.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.