Site Reliability Engineer/Cloud Engineer
Location: US Remote
Job Summary
The Support Lead (SRE) oversees support operations and site reliability engineering tasks, ensuring that systems and applications function effectively. The primary goal is to enhance system performance, availability, and resiliency.
Key Responsibilities
1. Manage a team of support engineers and SREs to provide technical support and address system issues promptly.
2. Monitor system performance and reliability metrics, identifying areas for improvement and implementing solutions.
3. Collaborate with cross-functional teams to optimize application performance and enhance system reliability.
4. Develop and maintain incident response procedures and protocols to minimize system downtime.
5. Conduct regular audits and assessments to ensure compliance with industry standards and best practices.
6. Lead the implementation of automation tools and processes to streamline support operations and enhance efficiency.
7. Provide technical expertise and guidance to team members, promoting a culture of continuous learning and development.
Skill Requirements
1. Proficiency in site reliability engineering (SRE) principles and practices.
2. Strong background in system administration, networking, and cloud computing.
3. Experience with monitoring tools such as Prometheus, Grafana, and the ELK Stack.
4. Knowledge of containerization technologies like Docker and Kubernetes.
5. Ability to troubleshoot complex technical issues and perform root cause analysis.
6. Excellent communication skills and ability to work collaboratively in a team environment.
7. Strong project management and leadership skills to drive initiatives and deliver results efficiently.
8. Certifications in relevant areas, such as AWS Certified DevOps Engineer or Google Professional Cloud DevOps Engineer, are a plus.