Senior Observability Engineer

Woodland Hills, CA, US • Posted 1 day ago • Updated 57 minutes ago

Full Time

Part Time

On-site

Job Details

Skills

Instrumentation
Regulatory Compliance
Health Care Administration
Management
ROOT
Database
Migration
Generative Artificial Intelligence (AI)
Cloud Computing
Managed Services
Remote Desktop Services
Amazon RDS
SaaS
Guidewire
Salesforce.com
Payment Gateways
Extract
Transform
Load
Amazon S3
Machine Learning (ML)
Predictive Analytics
Analytics
Forecasting
CPU
Storage
Quoting
Service Level
Dashboard
Root Cause Analysis
Incident Management
SLA
Budget
Amazon Web Services
Microsoft Azure
Google Cloud Platform
Google Cloud
Microservices
High Availability
Scalability
Continuous Integration
Continuous Delivery
GitLab
Jenkins
Terraform
Docker
Kubernetes
Optimization
CHAOS
API

Summary

Descriptions:

"Customer is seeking a seasoned Observability expert who doesn't just manage dashboards but actively lives and breathes telemetry architecture. In this role, Personnel will elevate customer observability maturity across infrastructure, applications, and business transactions.

Personnel will own, design, and optimize the following core domains:

1. Operations & Noise Reduction

Alert-to-Incident Signal Optimization: Analyze and optimize our Alert-to-Incident noise ratio (targeting a baseline better than 10:1). Drive the evolution from chaotic alerting to high-fidelity, actionable incident creation.

Dynamic Baselining & Anomaly Detection: Shift the paradigm away from rigid static thresholds. Implement dynamic baselines that intelligently account for time-of-day, day-of-week, and seasonal traffic patterns.
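As an illustration of the dynamic baselining idea above, here is a minimal Python sketch. It assumes a simple per-slot baseline keyed by (weekday, hour) with a mean plus or minus k standard deviations band; the function names `build_baseline` and `is_anomalous`, the parameters `k` and `min_band`, and the sample traffic numbers are all hypothetical and are not taken from the posting or from any particular observability platform.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def build_baseline(samples):
    """Group (timestamp, value) samples by (weekday, hour) and
    return {slot: (mean, population stddev)} for each slot."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[(ts.weekday(), ts.hour)].append(value)
    return {slot: (mean(vals), pstdev(vals)) for slot, vals in buckets.items()}


def is_anomalous(baseline, ts, value, k=3.0, min_band=1.0):
    """Flag a value that falls outside mean +/- k*stddev for its slot.
    min_band (hypothetical) keeps the band from collapsing to zero."""
    slot = (ts.weekday(), ts.hour)
    if slot not in baseline:
        return False  # no history for this slot, so do not alert
    mu, sigma = baseline[slot]
    return abs(value - mu) > max(k * sigma, min_band)


# Made-up history: four Mondays of requests/min at 09:00 and 03:00.
monday_9am = datetime(2024, 1, 1, 9, 0)
monday_3am = datetime(2024, 1, 1, 3, 0)
history = []
for week, (day_value, night_value) in enumerate(
    [(1000, 50), (1040, 55), (960, 45), (1000, 50)]
):
    offset = timedelta(weeks=week)
    history.append((monday_9am + offset, day_value))
    history.append((monday_3am + offset, night_value))

baseline = build_baseline(history)
check_day = monday_9am + timedelta(weeks=4)
check_night = monday_3am + timedelta(weeks=4)

# The same reading is judged against a different band per time slot.
day_flag = is_anomalous(baseline, check_day, 980)
night_flag = is_anomalous(baseline, check_night, 980)
print(day_flag, night_flag)
```

With this made-up history, a reading of 980 is within the Monday 09:00 band but far outside the Monday 03:00 band, which is the behavior a single static threshold cannot express. A production system would more likely use the anomaly detection features of the chosen platform than hand-rolled statistics.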

2. Guardrails, Standards, & Observability-as-Code

Observability-as-Code (OaC): Drive the maturity of our telemetry infrastructure by ensuring all dashboards, alerts, SLOs, and monitor configurations are defined, versioned, and deployed as code.

CI/CD Instrumentation Gates: Establish and enforce automated instrumentation compliance gates within our deployment pipelines to ensure code is observable before it hits production.

Fleet Health Management: Centrally manage, version, and monitor the health of our OpenTelemetry (OTel) collectors and agent fleets.

3. Advanced Diagnostics & Next-Gen Tech

Automated Root Cause Analysis (RCA): Implement platform capabilities that automatically surface the probable root cause the moment an incident fires.

Change & Deployment Correlation: Ensure all deployments, configuration changes, feature flag toggles, and database migrations are automatically annotated on dashboards and correlated to active incident timelines.

GenAI/LLM-Assisted Triage: Evaluate and adopt GenAI/LLM capabilities for advanced log pattern explanation and accelerated incident troubleshooting.

4. Telemetry Architecture & Data Strategy

Cloud-Native & Third-Party Monitoring: Ensure deep telemetry integration across cloud-managed services (AWS/Azure/Google Cloud Platform, EKS/AKS, Lambda, RDS) and critical third-party SaaS dependencies (e.g., Guidewire, Salesforce, Earnix, Uniphore, payment gateways).

Lakehouse & Data Pipeline Integration: Architect pipelines to export raw telemetry data to our data lakehouse (S3/ADLS) to power advanced ML pipelines and predictive analytics.

Predictive Capacity Analytics: Leverage the observability platform for capacity forecasting, predicting utilization trends for CPU, memory, queue depth, and storage before saturation occurs.

Log Standardization: Drive org-wide standards for log structure and serialization to ensure seamless cross-platform parsing and querying.
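To make the log standardization item concrete, the following is a small Python sketch of one possible convention: every service emits one JSON object per line with a shared set of keys. The field names (`timestamp`, `level`, `logger`, `message`, `service`, `trace_id`) and the logger and service names are assumptions chosen for illustration, not a standard named in the posting.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    # Hypothetical org-wide optional fields passed via `extra=`.
    OPTIONAL_FIELDS = ("service", "trace_id")

    def format(self, record):
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for field in self.OPTIONAL_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload, sort_keys=True)


# Write to an in-memory stream so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("quote-service")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("quote created", extra={"service": "quote-api", "trace_id": "abc123"})

parsed = json.loads(stream.getvalue())
print(parsed["level"], parsed["message"], parsed["trace_id"])
```

Because every line parses as JSON with the same keys, downstream tools can query on `service` or `trace_id` without per-team parsing rules.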

5. Culture, SLOs, & Business Impact

End-to-End Business Transaction Tracing: Map and trace complex, multi-service customer journeys (e.g., policy quote → bind → pay) to provide full-context business transaction visibility.

SLO/SLA Governance: Define, implement, and track Service Level Objectives (SLOs) across all production services.

Developer Empowerment & Self-Service: Democratize observability by fostering a proactive culture where developers instrument their own services during active development, backed by standardized, self-service health dashboards."
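For the SLO governance item above, a short Python sketch of the usual error budget arithmetic may help: an availability SLO of 99.9% over a window leaves 0.1% of requests as the allowed failure budget. The function name `error_budget` and the request counts are hypothetical, and real SLO tooling typically adds windowing and burn-rate alerting on top of this.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Return (allowed_failures, fraction_of_budget_consumed) for a
    request-based availability SLO over one evaluation window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        # No traffic in the window: nothing consumed unless failures exist.
        consumed = 0.0 if failed_requests == 0 else float("inf")
    else:
        consumed = failed_requests / allowed_failures
    return allowed_failures, consumed


# Made-up window: 1,000,000 requests, 250 failures, 99.9% SLO.
allowed, consumed = error_budget(0.999, 1_000_000, 250)
print(f"allowed failures: {allowed:.0f}, budget consumed: {consumed:.0%}")
```

In this made-up window roughly a quarter of the budget is spent, so the service is within its SLO with room left for planned risk such as deployments.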

"Monitoring, logging, tracing design (metrics, logs, traces)

Dashboarding, alerting, and telemetry pipelines

Observability platform design & optimization

Root Cause Analysis (RCA), incident analysis

SLO / SLI / SLA definition and error budgets

Strong understanding of AWS / Azure / Google Cloud Platform environments

Expertise in:

Microservices architecture

Distributed systems & event-driven systems

High availability & scalability patterns

CI/CD pipelines (GitLab, Jenkins)

Infrastructure as Code (Terraform, CloudFormation)

Containerization (Docker, Kubernetes troubleshooting)

Release observability & rollback readiness

Advanced / Differentiator Skills

AIOps / AI-driven observability

Predictive alerting / anomaly detection

Observability cost optimization

Chaos engineering basics

API & integration observability"

Skills: AI Agents

Experience Required: 10+ years

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 91018020
Position Id: PDT - 11416-12551-1782424649

Company Info

About Purple Drive Technologies LLC

Founded in 2007, Purple Drive started as a tech solutions firm and has grown into a full-service consulting and talent partner. We help businesses navigate complex technology challenges while connecting top professionals with career-defining opportunities.

We believe in transforming businesses through smart IT solutions and empowering technologists to grow their expertise through challenging projects and meaningful partnerships. Built on over 20 years of trusted relationships, we create success stories for both our clients and the talented professionals who drive innovation forward.
