REMOTE - Director Platform Engineering and Reliability

Remote • Posted 6 hours ago • Updated 6 hours ago

Full Time

Occasional Travel Required

Remote

Depends on Experience

Job Details

Skills

site reliability
sre
governance
reliability
platform
devops
iac
rca
slo
kpi
dora
soc2
aws
docker

Summary

100% REMOTE - Director Platform Engineering and Reliability - Direct Hire Full Time

Reliability Governance, Incident Management & Root Cause Accountability

Establish a reliability operating model that makes risk visible, decisions repeatable and improvements durable. You will own the Root Cause Analysis (RCA) process for all production incidents, ensuring timely, thorough and blameless reviews that identify systemic contributors rather than surface-level causes, and driving corrective actions to completion.

· Define and operationalize SLIs, SLOs and error budgets for critical services, and ensure they influence prioritization and release decisions.

· Standardize incident response routines (roles, severity definitions, escalation, communications) to reinforce trust during high-pressure events.

· Drive durable remediation (code, architecture, automation, process) and verify outcomes to reduce recurrence over time.

Observability & Signal Quality

Evolve observability so the platform produces actionable signal with minimal noise. The goal is earlier detection and clearer diagnosis, so intervention happens before customers experience impact.

· Strengthen logging, metrics and tracing standards to improve troubleshooting speed and confidence.

· Improve alert quality and reduce fatigue by tuning thresholds, routing, and ownership.

· Use observability improvements to measurably reduce MTTD (mean time to detection) and improve MTTR (mean time to recovery).

Engineering Effectiveness & Delivery Foundations

Improve delivery confidence and predictability by instrumenting effectiveness metrics and strengthening the delivery pipeline. We want teams shipping more frequently with lower risk.

· Instrument and operationalize DORA metrics (deployment frequency, lead time for changes, change failure rate, time to restore service/MTTR) and use the data to target bottlenecks.

· Evolve CI/CD patterns, rollout safeguards and rollback strategies to increase deployment frequency while lowering change failure rate.

· Raise engineering confidence through stronger automation discipline (including test automation and release guardrails) as the system matures.

Platform Engineering & Cloud Architecture

Advance platform foundations so product teams can build safely and consistently with less cognitive load. This includes cloud architecture governance, Infrastructure as Code, and containerization/orchestration practices appropriate to system scale.

· Advance Infrastructure as Code standards (Terraform) and AWS architecture patterns that support scale, performance and cost visibility (with potential future Azure expansion).

· Strengthen containerization and orchestration practices using technologies such as Docker and Kubernetes where appropriate.

· Establish paved-road platform patterns that make the secure, reliable path the easiest path for product teams.

Our core stack includes AWS (future expansion into Azure), MS SQL Server on AWS RDS, Terraform, GitHub, Jira, Confluence, and development primarily in Visual Studio / Visual Studio Code environments.

The Profile We’re Looking For

You bring strong technical depth and pragmatic leadership. You are comfortable being hands-on early, and you can scale standards, systems and teams over time.

· 8–12+ years in software engineering, DevOps, SRE or platform roles within a B2B SaaS environment; demonstrated success improving reliability and delivery predictability in growth-stage companies.

· Deep familiarity with AWS-based production systems, Infrastructure as Code, and containerized environments (Docker, Kubernetes).

· Experience implementing SLOs/error budgets, operating incident management and RCA processes, and driving systemic reliability improvements.

· Experience using operational and delivery metrics (DORA and related KPIs) to guide prioritization and measurable improvement.

· Experience supporting SOC 2, PCI DSS or comparable compliance initiatives is valued.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: CONTEMP
Position Id: 8902102