Senior Observability Engineer

Culver City, CA, US • Posted 10 hours ago • Updated 10 hours ago

Contract Corp To Corp

Contract W2

6 Months

No Travel Required

On-site

$40 - $45/hr

Job Details

Skills

Observability
Lakehouse & Data Pipeline Integration

Summary

Scalable Systems is a US-based Big Data, Analytics, and Digital Transformation company focused on innovative, vertical-specific solutions. By providing next-generation technology solutions and services, we help organizations identify risks and opportunities, achieve operational excellence, and gain an innovative edge.

Role: Senior Observability Engineer

Culver City, CA Onsite

W2 Contract Opportunity Only

Description:

Customer is seeking a seasoned observability expert who doesn't just manage dashboards but lives and breathes telemetry architecture. In this role, Personnel will elevate the customer's observability maturity across infrastructure, applications, and business transactions.

Personnel will own, design, and optimize the following core domains:

1. Operations & Noise Reduction

• Alert-to-Incident Signal Optimization: Analyze and optimize our Alert-to-Incident noise ratio (targeting a baseline better than 10:1). Drive the evolution from chaotic alerting to high-fidelity, actionable incident creation.

• Dynamic Baselining & Anomaly Detection: Shift away from rigid static thresholds. Implement dynamic baselines that account for time-of-day, day-of-week, and seasonal traffic patterns.

2. Guardrails, Standards, & Observability-as-Code

• Observability-as-Code (OaC): Drive the maturity of our telemetry infrastructure by ensuring all dashboards, alerts, SLOs, and monitor configurations are defined, versioned, and deployed as code.

• CI/CD Instrumentation Gates: Establish and enforce automated instrumentation compliance gates within our deployment pipelines to ensure code is observable before it hits production.

• Fleet Health Management: Centrally manage, version, and monitor the health of our OpenTelemetry (OTel) collectors and agent fleets.

3. Advanced Diagnostics & Next-Gen Tech

• Automated Root Cause Analysis (RCA): Implement platform capabilities that automatically surface the probable root cause the moment an incident fires.

• Change & Deployment Correlation: Ensure all deployments, configuration changes, feature flag toggles, and database migrations are automatically annotated on dashboards and correlated to active incident timelines.

• GenAI/LLM-Assisted Triage: Evaluate and adopt GenAI/LLM capabilities for advanced log pattern explanation and accelerated incident troubleshooting.

4. Telemetry Architecture & Data Strategy

• Cloud-Native & Third-Party Monitoring: Ensure deep telemetry integration across cloud-managed services (AWS/Azure/Google Cloud Platform, EKS/AKS, Lambda, RDS) and critical third-party SaaS dependencies (e.g., Guidewire, Salesforce, Earnix, Uniphore, payment gateways).

• Lakehouse & Data Pipeline Integration: Architect pipelines to export raw telemetry data to our data Lakehouse (S3/ADLS) to power advanced ML pipelines and predictive analytics.

• Predictive Capacity Analytics: Leverage the observability platform for capacity forecasting—predicting utilization trends for CPU, memory, queue depth, and storage before saturation occurs.

• Log Standardization: Drive org-wide standards for log structure and serialization to ensure seamless cross-platform parsing and querying.

5. Culture, SLOs, & Business Impact

• End-to-End Business Transaction Tracing: Map and trace complex, multi-service customer journeys (e.g., policy quote → bind → pay) to provide full-context business transaction visibility.

• SLO/SLA Governance: Define, implement, and track Service Level Objectives (SLOs) across all production services.

• Developer Empowerment & Self-Service: Democratize observability by fostering a proactive culture where developers instrument their own services during active development, backed by standardized, self-service health dashboards.

Monitoring, logging, tracing design (metrics, logs, traces)

Dashboarding, alerting, and telemetry pipelines

Observability platform design & optimization

Root Cause Analysis (RCA), incident analysis

SLO / SLI / SLA definition and error budgets

Strong understanding of AWS / Azure / Google Cloud Platform environments

Expertise in:

Microservices architecture

Distributed systems & event-driven systems

High availability & scalability patterns

CI/CD pipelines (GitLab, Jenkins)

Infrastructure as Code (Terraform, CloudFormation)

Containerization (Docker, Kubernetes troubleshooting)

Release observability & rollback readiness

Advanced / Differentiator Skills

AIOps / AI-driven observability

Predictive alerting / anomaly detection

Observability cost optimization

Chaos engineering basics

API & integration observability

Scalable Systems is an Equal Opportunity-Affirmative Action Employer - Minority / Female / Disability / Veteran / Gender Identity / Sexual Orientation

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 10121745
Position Id: 28221-13094-
