Overview
On Site
$60,000 - $80,000
Full Time
Skills
DevOps
SRE
Python & Java
Docker & Kubernetes
Ansible
Automation
Linux
Git & CI/CD
AWS & Azure
DWH
Job Details
Key Skills:
- Technical Proficiency: Strong programming skills (Python, Go, Java, etc.) are crucial for writing automation scripts, building tools, and troubleshooting issues.
- DevOps Knowledge: Understanding of CI/CD pipelines, infrastructure as code (IaC), containerization (Docker, Kubernetes), and cloud platforms is essential.
- SRE Expertise: Familiarity with monitoring systems, incident response procedures, capacity planning, and chaos engineering is vital.
- Problem-solving & Debugging: The ability to diagnose and resolve complex issues quickly and efficiently is crucial.
- Communication & Collaboration: Effective communication with development, operations, and other teams is necessary for incident response and knowledge sharing.
Key Responsibilities:
- Building and Maintaining Reliable Systems: SREs ensure that systems are designed, built, and maintained with reliability in mind, using automation, monitoring, and incident response.
- Incident Response: SREs lead incident response efforts by organizing teams, communicating with stakeholders, and resolving issues.
- Automation: SREs automate tasks such as deployments, infrastructure provisioning, and monitoring to reduce manual work and improve efficiency.
- Monitoring & Observability: SREs implement and maintain monitoring systems to track the health of systems and services, enabling proactive identification and resolution of issues.
- Capacity Planning: SREs plan and manage system capacity to ensure that systems can handle current and future workloads, preventing performance bottlenecks.
- Release Management: SREs oversee the deployment of new features and updates, ensuring that releases are smooth and reliable and that downtime is minimized.
- Collaboration & Communication: SREs collaborate with development teams, operations teams, and other stakeholders to ensure that everyone is aligned on goals and priorities.
- Postmortems: SREs participate in postmortems after incidents to identify root causes, learn from failures, and improve processes.