Senior Instrumentation & Observability Engineer

Overview

Remote
Full Time

Skills

Instrumentation
Microservices
MEAN Stack
Data Collection
Data Storage
Visualization
Predictive Analytics
Collaboration
Roadmaps
Evaluation
Innovation
HIPAA
Training
Legal
Conflict Resolution
Problem Solving
Communication
Strategic Thinking
Leadership
Management
FOCUS
Continuous Improvement
Mentorship
Open Source
Regulatory Compliance
Finance
Health Care
Statistics
Data Science
Machine Learning (ML)
System Monitoring
Kubernetes
Amazon Web Services
Google Cloud Platform
Google Cloud
Microsoft Azure
Data Engineering
Time Series
Database
Streaming
Data Processing
Continuous Integration
Continuous Delivery
Computer Science
Software Engineering
DevOps
Reliability Engineering
Grafana
New Relic
Splunk
Python
Java
Cloud Computing
System Integration Testing
Writing

Job Details

Position Summary:

The Senior Instrumentation and Observability Engineer leads the design, implementation, and maintenance of advanced observability strategies for complex distributed systems. This role is responsible for architecting sophisticated monitoring solutions, establishing observability standards, and driving the adoption of best practices across the organization. As a senior technical contributor, you will provide deep expertise in telemetry data collection, analysis, and visualization while mentoring team members and influencing observability practices throughout the engineering organization.

Essential Functions and Job Responsibilities:

Observability Architecture & Strategy
  • Architect comprehensive observability solutions for complex distributed systems and microservice architectures.
  • Define the long-term technical vision and strategy for observability across the organization.
  • Establish best practices, standards, and patterns for instrumenting applications and infrastructure.
  • Lead cross-team initiatives to improve system observability and reduce mean time to detect/resolve issues.
  • Design scalable telemetry data collection, processing, and storage systems capable of handling high volumes.

Advanced Monitoring & Analysis
  • Design sophisticated monitoring systems with advanced alerting logic and minimal alert fatigue.
  • Develop comprehensive SLI/SLO frameworks aligned with business objectives.
  • Create advanced visualization systems providing actionable insights into system behavior and performance.
  • Implement anomaly detection and predictive analytics to identify potential issues before they impact users.
  • Lead post-incident reviews with data-driven analysis to prevent recurrence of issues.

Leadership & Mentorship
  • Serve as a technical leader and subject matter expert on observability across the organization.
  • Mentor junior engineers and provide guidance on observability practices and techniques.
  • Collaborate with engineering leadership to define and implement observability roadmaps.
  • Lead the evaluation and adoption of new observability technologies and approaches.
  • Partner with product and development teams to integrate observability considerations into the design phase.

Platform Innovation
  • Architect and develop custom observability solutions for unique business and technical requirements.
  • Lead the design and implementation of observability data pipelines with sophisticated processing capabilities.
  • Create advanced self-service tools that empower teams to manage their observability configurations.
  • Develop integrations between observability systems and other enterprise platforms.
  • Optimize observability infrastructure for cost-effectiveness and efficiency.

Compliance & General Duties
  • Maintain patient confidentiality and function within the guidelines of HIPAA.
  • Complete assigned compliance training and other educational programs as required.
  • Maintain compliance with AdaptHealth's Compliance Program.
  • Perform other related duties as assigned.
  • Assist in vendor contract reviews with managers and legal.

Competency, Skills, and Abilities:
  • Exceptional problem-solving abilities with a systematic approach to complex issues.
  • Outstanding communication skills with the ability to explain technical concepts to varied audiences.
  • Strategic thinking with the ability to balance immediate needs and long-term vision.
  • Demonstrated leadership with the ability to influence without direct authority.
  • Proactive mindset with a focus on continuous improvement.
  • Ability to mentor others and share knowledge effectively.
  • Experience building and leading observability teams or functions.
  • Expertise in multiple cloud platforms and their native observability solutions.
  • Contributions to open-source observability projects or published work on observability topics.
  • Experience implementing observability in high-compliance environments (finance, healthcare).
  • Background in statistical analysis, data science, or machine learning as applied to system monitoring.
  • Experience with AIOps and automated remediation systems.
  • Observability Platforms: Expert-level knowledge of multiple platforms (Prometheus, Grafana, Datadog, New Relic, Elastic Stack, Splunk).
  • Distributed Tracing: Deep experience with OpenTelemetry, Jaeger, Zipkin, and trace-based analysis.
  • Programming: Advanced proficiency in Python, Go, Java, or similar languages.
  • Infrastructure: Expert knowledge of Kubernetes, service mesh, and cloud platforms (AWS, Google Cloud Platform, Azure).
  • Data Engineering: Advanced experience with time-series databases, streaming data, and data processing pipelines.
  • Automation: Expert-level CI/CD knowledge and Infrastructure as Code practices.

Requirements

Education and Experience Requirements:
  • Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience); Master's degree preferred.
  • 7+ years of experience in software engineering, DevOps, or site reliability engineering.
  • 4+ years of specialized experience with observability platforms and practices.
  • Expert-level knowledge of observability tools (Prometheus, Grafana, Datadog, New Relic, Elastic Stack, Splunk).
  • Advanced proficiency in at least one programming language (Go, Python, Java, etc.).
  • Extensive experience with distributed tracing systems and implementation patterns.
  • Deep understanding of cloud-native technologies and containerized environments.
  • Proven track record of leading technical initiatives and influencing engineering practices.

Physical Demands and Work Environment:
  • Must be able to bend, stoop, stretch, stand, and sit for extended periods.
  • Ability to perform repetitive motions of wrists, hands, and/or fingers due to extensive computer use.
  • The work environment may be stressful at times, as overall office activities and work levels fluctuate.
  • Subject to prolonged periods of sitting and exposure to computer screens.
  • Ability to utilize a personal computer and other office equipment.
  • Must be able to lift 30 pounds as needed.
  • Physical and mental ability to analyze and solve problems and to lead others.
  • Excellent ability to communicate both verbally and in writing.