Job Details
Title: Site Reliability Engineer
Requirements
Strong experience with monitoring and alerting tools such as Prometheus, Grafana, and New Relic
Experience with container orchestration on AWS using ECS and Fargate
Docker container development experience
Experience with scripting languages such as Python, Groovy, PowerShell, Bash, and Perl
Skilled in building and maintaining dashboards with tools such as Grafana, Prometheus, and StatsD to provide critical insights
Experience working with a Service Reliability Engineering team to design SLIs and SLOs for the respective applications
Strong experience with AWS cloud infrastructure and container orchestration operating in a GitOps framework
A solid foundation in infrastructure and systems engineering, including Unix/Linux compute, networking, storage, and monitoring stacks
Experience with automation tools such as Terraform and Ansible
Excellent written and oral communication skills
Strong interpersonal skills, adaptable and able to learn quickly
Availability for off-hours implementations is required
Ability to build positive working relationships with business contacts, within our IT team, and with other IT departments
Ability to identify tasks and help develop project plans for medium- and large-scale projects
Preferred
College degree in computer science or a related technical field, with 7+ years of experience in systems design, programming, implementation, and integration
3+ years of experience within the Amazon Web Services platform
AWS and Kubernetes certifications