Digital Site Reliability Engineer

Overview

Hybrid
$100,000 - $120,000
Full Time

Skills

Kubernetes
Continuous Integration
Docker
Dynatrace
GitLab
GraphQL

Job Details


POSITION GENERAL DUTIES AND TASKS:
Job Description:
We are seeking a highly skilled and experienced Reliability Engineer to join our team. The ideal candidate must have a strong background in technology, with specific expertise in Kubernetes, GitLab, Dynatrace, GraphQL, Node.js, and React, along with a good understanding of CI/CD pipelines. The candidate must be comfortable with ambiguity and with learning new things, and must have the kind of perseverance captured by the saying "If at first I don't succeed, try and try again."

Responsibilities:
Collaborate with cross-functional teams to develop and maintain release architectures and monitoring frameworks.
Provide system design consulting and critical support to the development team prior to program launch.
Identify and solve sophisticated performance and scaling issues, working with engineers to avoid bottlenecks and meet traffic demands.
Mentor and guide team members, helping them grow in their roles.
Identify and implement automation and monitoring tools to improve the efficiency and effectiveness of SRE processes.
Take ownership of any critical incidents and work towards timely resolution and prevention of future occurrences.

Mandatory Requirements:
Five (5) to seven (7) years of professional experience in technology or a related field.
Two (2) years of experience with Kubernetes/EKS.
Two (2) years of experience with CI/CD pipelines.
Two (2) years of experience with a sophisticated observability platform including RUM and APM.

Good To Have Requirements:
Familiarity with reading and understanding JavaScript (Node.js).
Proficiency with Dynatrace APM and RUM (experience with other APM or RUM tools may be applicable); Dynatrace Associate Certification is a plus.
Intermediate to advanced skills in Bash shell scripting, Python, and Docker.
Intermediate skills in creating, troubleshooting, and configuring on-prem GitLab CI pipelines.

Preferred Qualifications:
Ability to solve sophisticated performance and scaling issues, working with engineers to avoid bottlenecks and meet traffic demands driven by organic growth and marketing events.
Strong problem-solving skills and the ability to work in a fast-paced environment.
Ability to communicate effectively with stakeholders, including management, to provide updates, recommendations, and solutions for any SRE-related issues.
Excellent communication and collaboration skills.
Experience with Kubernetes/EKS and pod lifecycle management, including readiness and liveness checks.
Experience with building and supporting CI/CD pipelines and production releases.
Working knowledge of complex CDN-cached website architectures.
