Overview
Remote
$60 - $65
Contract - W2
Contract - 12 Month(s)
Skills
SRE
Bash
Python
Ansible
automation
Grafana
Splunk
Blue/green deployment
Job Details
Site Reliability Engineer (SRE) - Remote
Our client, a leading federal defense contractor, is seeking a Site Reliability Engineer (SRE) responsible for maintaining the survivability and reliability of mission-critical resources.
The SRE will monitor high-priority systems and automate recovery mechanisms so that those systems remain operational for the warfighter.
Responsibilities:
- Ensure Uptime of Critical Systems (Incident Response / Triage)
- Monitor and Troubleshoot Enterprise Services (Prometheus, Grafana, Splunk)
- Configure Enterprise Services (Ansible, YAML, JSON)
Requirements:
- Bachelor's degree in a STEM field and 5+ years of job-related experience, or a Master's degree plus 3 years of job-related experience
- Experience monitoring large-scale systems and using automation to triage emerging issues
- Experience with Prometheus (preferred) and/or Grafana and Splunk
- Experience automating systems administration activities (Bash, Python, and Ansible preferred)
- Experience developing recovery procedures for large systems (Backup and Restore, Blue/Green Deployment)
- Linux experience
- Collaborative team player with experience working on teams with diverse engineering skills
- Mixed job experience involving software engineering, systems administration, and network engineering