Site Reliability Engineer with Observability & Production Support

Overview

On Site
$DOE
Full Time
Accepts corp to corp applications
Contract - Independent
Contract - W2
Contract - 15 day(s)
10% Travel

Skills

Java
Python
Dynatrace
JD
Tier 2
Communication
Splunk
AppDynamics
Grafana
Virtual Machines
Firewall
API
Database
Network
Linux
Unix
Docker
Kubernetes
Amazon Web Services
Google Cloud
Google Cloud Platform
ServiceNow
Software Performance Management
NMON
Wireshark
Production Support
AIM
Regulatory Compliance
Dashboard

Job Details

Job Title: Site Reliability Engineer with Observability & Production Support

Location: Bellevue, WA (Onsite)

Duration: 4+ Years

Shift Time: Flexibility to work in a 24x7 environment

Business travel required (Yes / No): Yes

Total Yrs. of Experience: 10+ Years

Relevant Yrs. of Experience: 8+ Years

Mandatory Skills

SRE, Observability Tools and Production Support

Desired Skills

Knowledge of Java, Python, Go, Node, etc.

Any Certification (Mandatory)

Certified in one or more observability tools such as Splunk, AppDynamics, Grafana, or Dynatrace.

Detailed JD (Roles and Responsibilities)

Skills

  • SRE mindset in production support: Proactive issue identification using observability tools, with hands-on skill in multiple monitoring and observability tools to track system performance.
  • Incident commander: Ability to diagnose complex issues and actively drive incident calls, working with technical and product SMEs and Tier 2 SREs.
  • Communication: Excellent communicator who can interact with leaders at the Director/Sr. Director level and above.

Technical expertise

  • Splunk (including Splunk APM and Splunk O11y), AppDynamics, Grafana, RedMetrics, ThousandEyes
  • Knowledge of VMs, load balancers, firewalls, API gateways, databases, networking, Linux / Unix
  • Knowledge of Containerization, Docker, Kubernetes, AWS, PCF, Google Cloud Platform
  • ServiceNow (including AIOps, tools for Self-Heal and automated playbooks)
  • APM, NMON, Wireshark usage and analysis
  • Experience with UEM and synthetic monitoring tools

Responsibilities

  • Production support activities including proactive identification of issues leveraging observability tools with the aim of reducing MTTD and MTTR
  • Coordinate all activities required to lead incident triage in compliance with SLAs and OLAs, correlating inputs from various dashboards and tools to drive resolution.
  • Flexibility to work in a 24x7 environment