Senior Instrumentation & Observability Engineer

Overview

Remote

Full Time

Skills

Instrumentation

Microservices

MEAN Stack

Data Collection

Data Storage

Visualization

Predictive Analytics

Collaboration

Roadmaps

Evaluation

Innovation

HIPAA

Training

Legal

Conflict Resolution

Problem Solving

Communication

Strategic Thinking

Leadership

Management

Continuous Improvement

Mentorship

Open Source

Regulatory Compliance

Finance

Health Care

Statistics

Data Science

Machine Learning (ML)

System Monitoring

Kubernetes

Amazon Web Services

Google Cloud Platform

Microsoft Azure

Data Engineering

Time Series

Database

Streaming

Data Processing

Continuous Integration

Continuous Delivery

Computer Science

Software Engineering

DevOps

Reliability Engineering

Grafana

New Relic

Splunk

Python

Java

Cloud Computing

System Integration Testing

Writing

Job Details

Position Summary:

The Senior Instrumentation and Observability Engineer leads the design, implementation, and maintenance of advanced observability strategies for complex distributed systems. This role is responsible for architecting sophisticated monitoring solutions, establishing observability standards, and driving the adoption of best practices across the organization. As a senior technical contributor, you will provide deep expertise in telemetry data collection, analysis, and visualization while mentoring team members and influencing observability practices throughout the engineering organization.

Essential Functions and Job Responsibilities:

Observability Architecture & Strategy

Architect comprehensive observability solutions for complex distributed systems and microservice architectures.
Define the long-term technical vision and strategy for observability across the organization.
Establish best practices, standards, and patterns for instrumenting applications and infrastructure.
Lead cross-team initiatives to improve system observability and reduce mean time to detect/resolve issues.
Design scalable telemetry data collection, processing, and storage systems capable of handling high volumes.

Advanced Monitoring & Analysis

Design sophisticated monitoring systems with advanced alerting logic and minimal alert fatigue.
Develop comprehensive SLI/SLO frameworks aligned with business objectives.
Create advanced visualization systems providing actionable insights into system behavior and performance.
Implement anomaly detection and predictive analytics to identify potential issues before they impact users.
Lead post-incident reviews with data-driven analysis to prevent recurrence of issues.

Leadership & Mentorship

Serve as a technical leader and subject matter expert on observability across the organization.
Mentor junior engineers and provide guidance on observability practices and techniques.
Collaborate with engineering leadership to define and implement observability roadmaps.
Lead the evaluation and adoption of new observability technologies and approaches.
Partner with product and development teams to integrate observability considerations into the design phase.

Platform Innovation

Architect and develop custom observability solutions for unique business and technical requirements.
Lead the design and implementation of observability data pipelines with sophisticated processing capabilities.
Create advanced self-service tools that empower teams to manage their observability configurations.
Develop integrations between observability systems and other enterprise platforms.
Optimize observability infrastructure for cost-effectiveness and efficiency.

Compliance & Administration

Maintain patient confidentiality and function within the guidelines of HIPAA.
Complete assigned compliance training and other educational programs as required.
Maintain compliance with AdaptHealth's Compliance Program.
Assist in vendor contract reviews with managers and legal.
Perform other related duties as assigned.

Competency, Skills, and Abilities:

Exceptional problem-solving abilities with a systematic approach to complex issues.
Outstanding communication skills with ability to explain technical concepts to varied audiences.
Strategic thinking with ability to balance immediate needs and long-term vision.
Demonstrated leadership with ability to influence without direct authority.
Proactive mindset with a focus on continuous improvement.
Ability to mentor others and share knowledge effectively.
Experience building and leading observability teams or functions.
Expertise in multiple cloud platforms and their native observability solutions.
Contributions to open-source observability projects or published work on observability topics.
Experience implementing observability in high-compliance environments (finance, healthcare).
Background in statistical analysis, data science, or machine learning as applied to system monitoring.
Experience with AIOps and automated remediation systems.
Observability Platforms: Expert-level knowledge of multiple platforms (Prometheus, Grafana, Datadog, New Relic, Elastic Stack, Splunk).
Distributed Tracing: Deep experience with OpenTelemetry, Jaeger, and Zipkin, and with trace-based analysis.
Programming: Advanced proficiency in Python, Go, Java, or similar languages.
Infrastructure: Expert knowledge of Kubernetes, service mesh, and cloud platforms (AWS, Google Cloud Platform, Azure).
Data Engineering: Advanced experience with time-series databases, streaming data, and data processing pipelines.
Automation: Expert-level CI/CD knowledge and Infrastructure as Code practices.

Requirements

Education and Experience Requirements:

Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience); Master's degree preferred.
7+ years of experience in software engineering, DevOps, or site reliability engineering.
4+ years of specialized experience with observability platforms and practices.
Expert-level knowledge of observability tools (Prometheus, Grafana, Datadog, New Relic, Elastic Stack, Splunk).
Advanced proficiency in at least one programming language (Go, Python, Java, etc.).
Extensive experience with distributed tracing systems and implementation patterns.
Deep understanding of cloud-native technologies and containerized environments.
Proven track record of leading technical initiatives and influencing engineering practices.

Physical Demands and Work Environment:

Must be able to bend, stoop, stretch, stand, and sit for extended periods.
Ability to perform repetitive motions of wrists, hands, and/or fingers due to extensive computer use.
The work environment may be stressful at times, as overall office activities and work levels fluctuate.
Subject to prolonged periods of sitting and exposure to computer screens.
Ability to utilize a personal computer and other office equipment.
Must be able to lift 30 pounds as needed.
Physical and mental ability to analyze, solve problems and lead others.
Excellent ability to communicate both verbally and in writing.
