Senior Observability Engineer (FOCUS Framework)
Location: Irvine, CA (onsite from day 1; local candidates only; must attend in-person interview)
Employment Type: C2C
Role Overview
We are seeking a highly skilled Senior Observability Engineer to design, implement, and enhance real-time monitoring and observability solutions for mission-critical, customer-facing systems. This role will focus on implementing OpenTelemetry-based observability, developing SDKs for consistent instrumentation, building AWS-native data pipelines, and delivering actionable dashboards. The position is part of a high-visibility initiative to ensure proactive alerting, operational transparency, and system health monitoring across backend services.
Key Responsibilities
- Instrument backend services using OpenTelemetry SDKs for logs, traces, and metrics.
- Design, build, and extend observability SDKs/libraries to enable consistent instrumentation.
- Integrate observability pipelines with the FOCUS framework.
- Configure and manage AWS services including OpenSearch, CloudWatch, Kinesis Data Streams/KDA, Lambda, and QuickSight.
- Build and deploy QuickSight dashboards for real-time health monitoring.
- Implement near real-time alerting and automated escalation mechanisms.
- Extend observability coverage to additional services (e.g., Catalog-air).
- Define performance baselines and implement anomaly detection rules.
- Collaborate with backend and DevOps teams to ensure scalable and secure observability pipelines.
- Develop documentation including runbooks, architecture diagrams, and onboarding guides.
Qualifications
- 5+ years of backend or DevOps engineering experience focused on observability/monitoring.
- Strong expertise with OpenTelemetry SDKs, APIs, instrumentation, and integrations.
- Proficiency with AWS services: OpenSearch, CloudWatch, Kinesis Data Streams & Analytics, Lambda, QuickSight.
- Experience building log/metric processing and visualization pipelines.
- Proven ability to design and maintain SDKs/libraries for telemetry collection.
- Strong scripting skills in Node.js, Python, or TypeScript.
- Experience monitoring distributed systems and microservices architectures.
- Solid understanding of security and privacy best practices in observability.
- Exposure to GraphQL APIs and AWS AppSync instrumentation.
- Familiarity with AI/LLM-assisted logging enrichment.
- Experience with runtime application monitoring (browser or In-Flight Entertainment environment).
- Knowledge of performance benchmarking, SLIs/SLOs, and reliability metrics.