Overview
Remote
Depends on Experience
Full Time
Skills: Monitoring, Observability, Prometheus, Grafana, data visualization
Job Details
Senior Monitoring and Observability Engineer
Remote
Full Time Opportunity
Job Summary:
We are seeking a highly skilled Senior Monitoring and Observability Engineer to design, implement, and manage our monitoring and observability platform.
The ideal candidate will have extensive experience with Prometheus and Grafana and will be responsible for ensuring the health and performance of our systems through effective monitoring solutions.
Responsibilities:
- Design, implement, and manage our monitoring and observability platform primarily using Prometheus and Grafana.
- Create and manage a comprehensive suite of Grafana dashboards to provide real-time visibility into the health and performance of our systems.
- Configure and manage alerting rules in Prometheus and Alertmanager to ensure timely notification of critical issues.
- Automate operational tasks to enhance efficiency and reduce manual intervention.
- Conduct performance analysis, capacity planning, and postmortem reviews to drive continuous improvement of our systems, including documenting findings and implementing corrective actions based on review outcomes.
Mandatory Skills:
- Proven experience with Prometheus for monitoring and alerting.
- Strong proficiency in Grafana for data visualization and dashboard creation.
- Experience in configuring alerting rules and managing alerts effectively.
- Solid understanding of performance analysis and capacity planning methodologies.
- Ability to conduct postmortem reviews and implement improvements based on findings.
Preferred Skills:
- Familiarity with container orchestration tools such as Kubernetes.
- Experience with scripting languages (e.g., Python, Bash) for automation tasks.
- Knowledge of cloud platforms (e.g., AWS, Azure) and their monitoring tools.
- Understanding of microservices architecture and its monitoring challenges.
- Experience with incident management and response processes.
Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 7-10 years of experience in monitoring and observability, with a focus on Prometheus and Grafana.
- Strong analytical and problem-solving skills.
- Excellent communication and collaboration abilities.
- Ability to work independently and as part of a team in a remote work environment.
Thanks & Regards,
Ron (Vineeth Damarla)
Lead Talent Acquisition Specialist Recruitment
American IT Systems
1116 S Walton Blvd, Suite 113 Bentonville, AR 72712