Senior Observability Engineer (ESS Platform SME)

McLean, VA, US • Posted 8 hours ago • Updated 8 minutes ago
Full Time
Part Time
On-site
Fitment

Job Details

Skills

  • API
  • HTTP
  • Analytics
  • CPU
  • Network
  • Performance Monitoring
  • Budget
  • Trend Analysis
  • Collaboration
  • Leadership
  • Elasticsearch
  • Kibana
  • Microservices
  • Kubernetes
  • OCP
  • Continuous Integration
  • Continuous Delivery
  • Scripting
  • Python
  • Shell
  • Groovy
  • Soft Skills
  • Conflict Resolution
  • Problem Solving
  • Communication
  • Dashboard
  • Software Performance Management
  • Employee Self-service

Summary

Job Title: Senior Observability Engineer (ESS Platform SME)

Location: McLean, VA (on-site), with in-person interview

Job Type: C2C or W2



Role Overview:

We are seeking a highly experienced Senior Observability Engineer with deep expertise in ESS (Elastic Stack) to lead and accelerate the development of enterprise-grade observability capabilities across mission-critical applications.

This role requires a hands-on SME who can design, build, and scale observability dashboards, APM, tracing, and monitoring solutions exclusively within ESS. The candidate will play a key role in transforming current monitoring into a proactive, intelligent, and scalable observability ecosystem.

This is a high-impact, fast-paced engagement (target < 6 months) requiring ownership, technical depth, and execution excellence.



Key Responsibilities:

ESS Observability Architecture & Implementation

  • Design and implement end-to-end observability solutions using ESS (Elastic Stack).
  • Build a centralized observability layer covering all MF applications.
  • Ensure block-level aggregation with drill-down to:
    • Application-level metrics
    • APM traces
    • Logs and events
    • Service dependencies

Dashboard Engineering (Critical Priority)

  • Develop and scale a large backlog of ESS dashboards, including but not limited to:
    • Cluster Health (OCP/K8s)
    • API & APM Dashboards
    • Service Health & Dependency Monitoring
    • Pod Status / Restart / Scaling Metrics
    • HTTP Status Analytics (200/400/500 trends)
    • Transaction Processing Metrics
    • Infra Metrics (CPU, Memory, Disk, Network)
    • Synthetic Monitoring & Availability
  • Build intuitive drill-down dashboards from MF → Block → Service → Application level.

APM, Tracing & Monitoring Expansion

  • Expand ESS-based:
    • Application Performance Monitoring (APM)
    • Distributed tracing
    • Real User Monitoring (RUM)
    • Synthetic monitoring
  • Enable end-to-end traceability across microservices.

Proactive Observability & Alerting

  • Design and implement smart alerting rules:
    • Move from reactive to proactive detection
    • Reduce noise, improve signal quality
  • Define SLOs, SLIs, and error budgets
  • Enhance anomaly detection and trend analysis

Collaboration & Leadership

  • Work closely with:
    • EOT Observability Team
    • Internal CDLs
    • Application teams
  • Act as ESS Observability SME
  • Provide guidance, standards, and best practices

Required Skills & Experience:

  • Strong hands-on experience with ESS (Elastic Stack):
    • Elasticsearch
    • Logstash
    • Kibana
    • Beats / Elastic Agent
    • Elastic APM
  • Proven experience building enterprise-scale observability dashboards in ESS
  • Deep understanding of:
    • Microservices architecture
    • Kubernetes / OpenShift (OCP)
  • Experience with:
    • APM, distributed tracing, logging, metrics correlation
  • Ability to design multi-layer observability (infra → platform → app)



Strongly Preferred:

  • Experience with:
    • Synthetic monitoring tools integrated with ESS
    • Real User Monitoring (RUM)
    • Service maps and dependency graphs
  • Knowledge of:
    • CI/CD observability integration
    • Alerting frameworks within Elastic
  • Scripting: Python / Shell / Groovy (nice to have)



Soft Skills:

  • Strong ownership mindset
  • Ability to work under aggressive timelines
  • Excellent problem-solving skills
  • Clear communication with technical and non-technical teams



Success Criteria (First 3-6 Months):

  • Deliver enterprise-grade ESS observability dashboards
  • Achieve full MF application visibility
  • Implement end-to-end APM + tracing coverage
  • Establish proactive alerting framework



Additional Notes:

  • Candidate must be an ESS expert; experience with alternative tools alone will not be sufficient.
  • This is a high-priority, business-critical role with immediate impact expectations.

  • Dice Id: 91112461
  • Position Id: OOJ - 3386-2387-1778088440