Job Details
Required Skills & Experience
12+ years of overall IT experience, including 5+ years in observability, SRE, or APM engineering.
Strong hands-on experience with at least one leading observability platform, such as
Datadog, Splunk, Elastic (ELK), Dynatrace, Grafana, Prometheus, New Relic, or AppDynamics.
Expertise in:
APM instrumentation, distributed tracing (OpenTelemetry), log aggregation, metrics pipelines.
Kubernetes, Docker, CI/CD, and cloud-native monitoring.
Python, Shell, PowerShell, or Go for automation and custom instrumentation.
Strong understanding of microservices, APIs, networking, performance engineering, and cloud architecture.
Experience designing SLO/SLI frameworks and implementing error budgets.
Exceptional communication and stakeholder management skills.
Preferred Qualifications
Certifications in AWS/Azure/Google Cloud Platform, Kubernetes, or SRE.
Experience with AIOps platforms for event correlation and anomaly detection.
Experience deploying OpenTelemetry Collectors and building custom export pipelines.