Hello,
Hope you are doing well!
Job Title: Senior Observability Engineer (ESS Platform SME)
Location: Remote
Key Responsibilities:
ESS Observability Architecture & Implementation
Design and implement end-to-end observability solutions using ESS (Elastic Stack).
Build a centralized observability layer covering all MF applications.
Ensure block-level aggregation with drill-down to:
Application-level metrics
APM traces
Logs and events
Service dependencies
Dashboard Engineering (Critical Priority)
Work through a large backlog of ESS dashboards and scale them, including but not limited to:
Cluster Health (OCP/K8s)
API & APM Dashboards
Service Health & Dependency Monitoring
Pod Status / Restart / Scaling Metrics
HTTP Status Analytics (200/400/500 trends)
Transaction Processing Metrics
Infra Metrics (CPU, Memory, Disk, Network)
Synthetic Monitoring & Availability
Build intuitive dashboards that drill down from MF Block → Service → Application level.
APM, Tracing & Monitoring Expansion
Expand ESS-based coverage for:
Application Performance Monitoring (APM)
Distributed tracing
Real User Monitoring (RUM)
Synthetic monitoring
Enable end-to-end traceability across microservices.
Proactive Observability & Alerting
Design and implement smart alerting rules:
Move from reactive → proactive detection
Reduce noise, improve signal quality
Define SLOs, SLIs, and error budgets
Enhance anomaly detection and trend analysis
Collaboration & Leadership
Work closely with:
EOT Observability Team
Internal CDLs
Application teams
Act as the ESS Observability SME
Provide guidance, standards, and best practices
Required Skills & Experience:
Strong hands-on experience with ESS (Elastic Stack):
Elasticsearch
Logstash
Kibana
Beats / Elastic Agent
Elastic APM
Proven experience building enterprise-scale observability dashboards in ESS
Deep understanding of:
Microservices architecture
Kubernetes / OpenShift (OCP)
Experience with:
APM, distributed tracing, logging, and metrics correlation
Ability to design multi-layer observability (infra → platform → app)
Strongly Preferred:
Experience with:
Synthetic monitoring tools integrated with ESS
Real User Monitoring (RUM)
Service maps and dependency graphs
Knowledge of:
CI/CD observability integration
Alerting frameworks within Elastic
Scripting: Python / Shell / Groovy (nice to have)
Soft Skills:
Strong ownership mindset
Ability to work under aggressive timelines
Excellent problem-solving skills
Clear communication with technical and non-technical teams
Success Criteria (First 3–6 Months):
Deliver enterprise-grade ESS observability dashboards
Achieve full MF application visibility
Implement end-to-end APM + tracing coverage
Establish proactive alerting framework
Additional Notes:
Candidate must be an ESS expert; experience with alternative tools alone will not be sufficient.
This is a high-priority, business-critical role with immediate impact expectations.