Job Details
Job Title: Site Reliability Engineer
Location: 100% REMOTE (PST work hours)
Duration: 6-12+ months
Pay Rate: $60 - $65/hr
Job Description
You will play a pivotal role in ensuring the reliability, scalability, and performance of our mission-critical cloud systems. You will be responsible for building and maintaining a robust platform that supports the growth and success of our organization.
Responsibilities
- Design, implement, and maintain highly available, scalable, and resilient cloud infrastructure.
- Troubleshoot complex technical issues across the entire software stack.
- Collaborate with development teams to ensure smooth deployment and operation of new applications and services.
- Automate routine tasks and processes to improve efficiency and reliability.
- Stay up to date with emerging technologies and best practices in cloud computing and SRE.
Qualifications
- Experience with Go or Python is required; Perl or Ruby is a nice-to-have.
- Strong Linux systems experience, with proficiency in RHEL 9 and its variants.
- Strong understanding of networking, storage, and security concepts.
- Experience with Markdown for repository documentation on GitHub.
- Experience with Puppet.
- Experience troubleshooting bare-metal hosts.
- Experience supporting Java-based applications.
- Ability to document complex system changes, troubleshooting steps, and incident post-mortems in a clear and concise manner, facilitating streamlined knowledge sharing and future problem resolution.
- Demonstrated autonomy in managing critical SRE tasks and projects, consistently delivering results with minimal oversight.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
- Ability to work independently and as part of a team.
Nice to Have
- Knowledge of containerization and container orchestration tools such as Docker.
- Experience in Perl or Ruby.