Overview
Hybrid
$60 - $65
Accepts corp to corp applications
Contract - W2
Contract - Independent
Contract - 12 Month(s)
Skills
Amazon Web Services
Computer Networking
Analytical Skill
Ansible
Bash
Cloud Computing
Incident Management
Docker
Documentation
Google Cloud
Google Cloud Platform
Collaboration
Computer Science
Configuration Management
Microsoft Azure
Conflict Resolution
Kubernetes
Root Cause Analysis
Scripting
System Security
Linux
Linux Administration
Management
Orchestration
Problem Solving
Python
Systems Architecture
Terraform
Virtual Machines
Grafana
Job Details
Site Reliability Engineer (SRE) || Fountain Valley, CA
Job Summary:
We are looking for a Site Reliability Engineer (SRE) to help keep our systems stable, fast, and secure.
Key Responsibilities:
- Build and maintain reliable systems that scale with business needs across multiple Linux/VM environments.
- Automate routine tasks and deployment processes to enhance efficiency and reduce manual intervention.
- Manage security maintenance in Linux environments, including regular updates, patching, and key rotation.
- Monitor infrastructure and services proactively to detect and resolve issues before they impact users.
- Collaborate with development teams to define and maintain SLOs, SLIs, and performance benchmarks.
- Lead incident response efforts, drive root cause analysis, and ensure corrective actions are implemented.
- Improve alerting systems to reduce noise and ensure alerts are meaningful and actionable.
- Ensure systems comply with internal security and operational standards.
- Maintain detailed documentation including system architecture, standard procedures, and troubleshooting steps.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
- Strong experience with Linux systems administration and troubleshooting.
- Hands-on experience with cloud platforms: AWS, Azure, or Google Cloud.
- Proficiency in scripting languages: Python, Bash, or Go.
- Familiarity with infrastructure automation and configuration management tools such as Terraform and Ansible.
- Experience with containers and orchestration: Docker and Kubernetes.
- Working knowledge of monitoring and logging tools such as Prometheus, Grafana, and the ELK stack.
- Solid understanding of networking, system security, and operational best practices.
- Strong analytical and problem-solving skills.