Principal SRE Engineer

Remote • Posted 12 hours ago • Updated 12 hours ago
Contract W2
Contract Independent
Contract Corp To Corp
No Travel Required
Remote
Depends on Experience

Job Details

Skills

  • Site Reliability Engineer
  • MTTR
  • Telemetry
  • SLO & SLIs

Summary

Role: Principal SRE Engineer

Duration: 6+ Months

Location: Dallas, TX (Remote)

 

Looking for a senior/principal-level SRE practitioner with strong hands-on experience implementing reliability practices at scale.

The most valuable profile for us is someone who has personally driven the operationalization of SRE frameworks, not just at a strategic level but through hands-on execution. This includes areas such as:

 

  • Defining and implementing SLIs/SLOs and reliability targets that align with the department's Golden Pathways
  • Building and operationalizing observability standards (metrics, logs, traces)
  • Designing new, or evolving existing, incident management and RCA practices
  • Driving automation and reliability engineering workflows
  • Establishing service health dashboards and telemetry pipelines
  • Working closely with engineering teams to embed reliability into development and operations

 

  • Ideally this would be someone who has stood up or significantly evolved SRE programs in complex enterprise environments and can help accelerate implementation of the practices we are defining.
  • This role would be very execution-focused – someone comfortable rolling up their sleeves, working with the engineering teams directly, and helping us operationalize the reliability model across our platforms.

 

2. Design and Build Central SRE Operating View

   a. Implement golden-pathway telemetry across:

      i. Application Performance Monitoring (APM): service response times, transaction bottlenecks
      ii. Logging & Tracing: correlated logs, structured tracing
      iii. Event & Alerting: actionable event definitions tied to severity
      iv. RCA/Tagging Compliance Monitoring: auto-tagging and RCA lifecycle ingestion
      v. Build executive-level scorecards and dashboards via Grafana and ServiceNow

 

Performance analytics:

   1. Per-app reliability score
   2. SRE maturity score
   3. Mean time to detect/respond/restore (MTTx)
   4. Escalation patterns and failure root-cause trends
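
For candidates who want a concrete picture of the MTTx item above, here is a minimal sketch of how such figures might be derived from incident timestamps. It is an illustration only, not the client's method: the `Incident` fields, the `mttx` function, and the choice to measure "respond" from detection to acknowledgement and "restore" from incident start are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    # Hypothetical field names; a real ticketing export will differ.
    started: datetime       # when the fault began
    detected: datetime      # when monitoring or a human noticed it
    acknowledged: datetime  # when a responder picked it up
    restored: datetime      # when service was restored


def _mean_minutes(deltas):
    """Average a list of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60.0


def mttx(incidents):
    """Return mean time to detect, respond, and restore, in minutes."""
    if not incidents:
        raise ValueError("at least one incident is required")
    return {
        # Assumed definition: start of fault -> detection
        "mttd_min": _mean_minutes([i.detected - i.started for i in incidents]),
        # Assumed definition: detection -> acknowledgement
        "mtt_respond_min": _mean_minutes(
            [i.acknowledged - i.detected for i in incidents]
        ),
        # Assumed definition: start of fault -> service restored
        "mttr_min": _mean_minutes([i.restored - i.started for i in incidents]),
    }
```

Calling `mttx(list_of_incidents)` returns one dictionary per reporting window; a per-app view would simply group incidents by application before calling it.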

3. Enable Long-Term SRE Governance

   a. Establish SRE telemetry ingestion pipelines
   b. Design alert logic for low-quality signals
   c. Build RCA tagging enforcement playbooks
   d. Deliver runbooks and telemetry integration guides per application type

4. Centralized SRE Golden Dashboard – Single Pane of Glass

   a. A central pillar of this initiative is the creation of a Centralized SRE Golden Dashboard, serving as a Single Pane of Glass for executive and operational visibility across all 40+ applications.

 

The dashboard will:

   1. Aggregate key telemetry: reliability metrics, RCA themes, MTTR, incident volumes, tag compliance, alert noise, performance degradation, and resilience scoring.
   2. Display per-app SRE health scores based on the maturity framework.
   3. Include dynamic drilldowns into:

      a. Incident hygiene (tagging, closure quality, RCA ownership)
      b. SLAs/OLAs/SLIs/SLOs/error budgets, cleanly architected
      c. Alerting trends and noise correlation
      d. Capacity/resiliency warnings

   4. Serve as the definitive executive reporting source, used for monthly reviews, CIO/VP visibility, and roadmap investment decisions.
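
As a rough illustration of the SLI/SLO and error-budget drilldown listed above, the sketch below shows the standard arithmetic for a request-based SLO. The function name `error_budget`, its parameters, and the returned keys are hypothetical and chosen only for this example; the actual scoring model for this engagement is not described in the posting.

```python
def error_budget(slo_target, total_events, bad_events):
    """Summarize error-budget usage for a request-based SLO.

    slo_target   -- target success ratio, e.g. 0.999 (assumed to lie in (0, 1))
    total_events -- total requests observed in the SLO window
    bad_events   -- requests that failed the SLI in that window
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0 or bad_events < 0 or bad_events > total_events:
        raise ValueError("event counts are inconsistent")

    # Number of failures the SLO tolerates over the window.
    allowed_bad = (1 - slo_target) * total_events
    # Measured SLI: fraction of good events.
    sli = 1 - bad_events / total_events
    # Fraction of the budget already spent (may exceed 1 if the SLO is blown).
    consumed = bad_events / allowed_bad

    return {
        "sli": sli,
        "allowed_bad_events": allowed_bad,
        "budget_consumed": consumed,
        "budget_remaining": 1 - consumed,
    }
```

For example, with a 99.9% target over 1,000,000 requests, about 1,000 failures are tolerated, so 250 failed requests would consume roughly a quarter of the budget.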

 

Key Skills: Site Reliability Engineer, Telemetry, RCA, SLA & SLIs, MTTR, Incident management

  • Dice Id: 10330808
  • Position Id: 95783-5195-

Company Info

About VDart, Inc.

VDart, headquartered in Atlanta, GA, is a global leader in digital talent solutions and IT staffing, delivering top technology professionals to businesses worldwide. With a strong presence across North America, Europe and Asia, we specialize in helping organizations navigate complex technology landscapes with the right expertise.

Through a strategic, client-focused approach, we have placed over 20,000 professionals across key industries and advanced technology solutions. Whether placing top talent in cutting-edge roles or providing strategic digital workforce solutions, our network of 4,000 specialists across 13 countries is committed to excellence, agility and impact.

Backed by 18 years of industry experience, we go beyond staffing to build long-term partnerships that accelerate digital transformation and drive sustained growth. Whether you need a technology partner to fuel innovation or specialized workforce solutions to maintain a competitive edge, VDart delivers the right people, skills and mindset to create a lasting impact in a digital-first world.
