Job Details
Job Title: Site Observability Engineer
Duration: 1+ Year
Job Summary
We are seeking a Senior Observability Engineer to strengthen our cloud infrastructure monitoring and alerting capabilities. This role will focus on implementing and improving observability strategies using Prometheus, Grafana, Gardener Kubernetes, and Splunk. Experience with Dynatrace is a plus.
Key Responsibilities
- Design, implement, and enhance monitoring solutions with Prometheus to ensure high availability and accurate alerting.
- Develop and maintain observability strategies to improve cloud monitoring posture.
- Collaborate with development teams to integrate observability into the CI/CD pipeline and application lifecycle.
- Respond to and investigate incidents, perform root cause analysis, and implement preventive measures.
- Stay current with best practices in site reliability and observability.
- Partner with cross-functional teams to ensure system reliability, scalability, and performance.
Qualifications
- Bachelor's degree in Computer Science, IT, or related field (or equivalent experience).
- Proven experience with observability tools: Prometheus, Grafana, Splunk.
- Hands-on experience with Kubernetes (preferably Gardener Kubernetes).
- Familiarity with logging, monitoring, and APM tools; Dynatrace experience is a plus.
- Strong understanding of cloud infrastructure, networking, and distributed systems.
- Proficient in scripting and automation (Python, Terraform, Ansible, etc.).
- Excellent analytical, troubleshooting, and communication skills.
- Ability to work effectively both independently and in team environments.