Overview
On Site
$60,000 - $80,000
Full Time
Skills
AWS
DevOps
EC2 & RDS
CloudWatch
IaC
CI/CD & GitLab CI
Docker & Kubernetes
Python & Bash
IAM & Security
IP Networking
Linux/Unix
SRE
SLIs & SLOs
Incident Management
Automation & Scripting
Job Details
Technical Skills:
- AWS Cloud Expertise: Deep understanding and hands-on experience with core AWS services like EC2, S3, RDS, Lambda, CloudFormation, CloudWatch, Route 53, Auto Scaling, and more.
- Infrastructure as Code (IaC): Proficiency in tools like Terraform, AWS CloudFormation, or similar for automated infrastructure provisioning and management.
- CI/CD Pipelines: Experience with tools like Jenkins, GitLab CI, AWS CodePipeline, or similar for automating the software delivery process.
- Containerization and Orchestration: Familiarity with Docker, Kubernetes, or other containerization and orchestration technologies.
- Monitoring and Observability: Experience with tools like AWS CloudWatch, Prometheus, Grafana, Datadog, or similar for monitoring system health, performance, and resource utilization.
- Scripting and Automation: Strong scripting skills in Python, Bash, or other languages for automating tasks and infrastructure management.
- Security: Understanding of AWS security best practices, including IAM, Security Groups, NACLs, and encryption.
- Networking: Knowledge of IP networking, VPNs, DNS, load balancing, and firewall concepts.
- Linux/Unix System Administration: Experience with Linux/Unix systems for managing servers and troubleshooting issues.
SRE-Specific Skills:
- Reliability Engineering: Understanding of concepts like SLIs, SLOs, and error budgets, and of designing for high availability, disaster recovery, and fault tolerance.
- Incident Management: Experience in diagnosing and resolving production issues, including incident response, root cause analysis, and post-incident reviews.
- Monitoring and Alerting: Ability to set up comprehensive monitoring and alerting systems using tools like CloudWatch, Prometheus, or third-party solutions.
- Metrics and Logging: Ability to define, collect, and analyze metrics and logs to understand system behavior and identify potential problems.
- Automation and Scripting: Ability to automate and script the management and maintenance of systems, especially in a cloud environment.
- Collaboration and Communication: Effective communication and collaboration skills for working with development, operations, and other teams.
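For candidates less familiar with the terms, the SLI, SLO, and error-budget concepts named under Reliability Engineering can be illustrated with a small calculation. This is only a sketch under stated assumptions: it assumes a request-based availability SLI, and the function name `error_budget` and the sample numbers are hypothetical, not taken from the employer's systems.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize a request-based availability SLO (hypothetical helper).

    slo_target is a fraction, e.g. 0.999 for a "three nines" objective.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: the measured fraction of successful requests.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: how many failures the SLO tolerates in this window.
    allowed_failures = (1 - slo_target) * total_requests
    # Fraction of the budget already spent (above 1.0 means the SLO is missed).
    consumed = failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "remaining_failures": allowed_failures - failed_requests,
        "budget_consumed": consumed,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    # Assumed sample window: 1,000,000 requests, 400 failures, 99.9% SLO.
    report = error_budget(0.999, 1_000_000, 400)
    for key, value in report.items():
        print(f"{key}: {value}")
```

With the assumed sample window, the objective tolerates roughly 1,000 failed requests, so 400 failures consume about 40% of the budget.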
Soft Skills:
- Problem-Solving: Ability to analyze complex issues, identify root causes, and implement effective solutions.
- Communication: Clear and concise communication skills, both written and verbal, for collaborating with teams and documenting issues and solutions.
- Continuous Learning: Commitment to staying up to date with the latest AWS services, SRE best practices, and DevOps trends.