Site Reliability Engineer (SRE)

Overview

Hybrid

$60 - $65

Accepts corp to corp applications

Contract - W2

Contract - Independent

Contract - 12 Months

Skills

Amazon Web Services

Computer Networking

Analytical Skill

Ansible

Bash

Cloud Computing

Incident Management

Docker

Documentation

Google Cloud

Google Cloud Platform

Collaboration

Computer Science

Configuration Management

Microsoft Azure

Conflict Resolution

Kubernetes

Root Cause Analysis

Scripting

System Security

Linux

Linux Administration

Management

Orchestration

Problem Solving

Python

Systems Architecture

Terraform

Virtual Machines

Grafana

Job Details

Site Reliability Engineer (SRE) || Fountain Valley, CA

Job Summary:

We are looking for a Site Reliability Engineer (SRE) to help keep our systems stable, fast, and secure.

Key Responsibilities:

Build and maintain reliable systems that scale with business needs across multiple Linux/VM environments.
Automate routine tasks and deployment processes to enhance efficiency and reduce manual intervention.
Manage security maintenance in Linux environments, including regular updates, patching, and key rotation.
Monitor infrastructure and services proactively to detect and resolve issues before they impact users.
Collaborate with development teams to define and maintain SLOs, SLIs, and performance benchmarks.
Lead incident response efforts, drive root cause analysis, and ensure corrective actions are implemented.
Improve alerting systems to reduce noise and ensure alerts are meaningful and actionable.
Ensure systems comply with internal security and operational standards.
Maintain detailed documentation including system architecture, standard procedures, and troubleshooting steps.

Qualifications:

Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
Strong experience with Linux systems administration and troubleshooting.
Hands-on experience with cloud platforms: AWS, Azure, or Google Cloud.
Proficiency in scripting languages: Python, Bash, or Go.
Familiarity with infrastructure automation and configuration management tools such as Terraform and Ansible.
Experience with containers and orchestration: Docker and Kubernetes.
Working knowledge of monitoring and logging tools such as Prometheus, Grafana, and the ELK stack.
Solid understanding of networking, system security, and operational best practices.
Strong analytical and problem-solving skills.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
