Principal Site Reliability Engineer - AI

New York, NY, US • Posted 30+ days ago • Updated 11 hours ago

Full Time

On-site

$200,000 - $250,000/yr

Motion Recruitment Partners, LLC

Job Details

Skills

Real-time
Recruiting
Scalability
Product Development
Leadership
IaaS
Incident Management
Root Cause Analysis
Data Engineering
Continuous Integration
Continuous Delivery
Roadmaps
Mentorship
Capacity Management
Performance Testing
Hardening
Collaboration
Regulatory Compliance
HIPAA
SOC 2
Privacy
Operational Excellence
DevOps
Kubernetes
Orchestration
Microservices
Amazon Web Services
Google Cloud
Google Cloud Platform
Microsoft Azure
Terraform
Scripting
Python
Bash
Reliability Engineering
Observability
Grafana
Health Care
SaaS
Machine Learning (ML)
Lifecycle Management
Communication
Artificial Intelligence
Cloud Computing

Summary

About Our Client
Our client is an AI-driven health-tech start-up on a mission to transform patient care through intelligent, secure, and highly reliable clinical automation tools. Their platform powers real-time insights for clinicians, improving patient outcomes and enabling healthcare systems to operate with unprecedented efficiency. They are entering a high-growth phase and are seeking a Principal Site Reliability Engineer to help scale their infrastructure and ensure world-class reliability.
Role Overview
Our client is hiring a Principal Site Reliability Engineer to serve as the technical authority for the reliability, scalability, and performance of their cloud-native infrastructure. This individual will design and implement systems that support rapid product development while meeting the resilience requirements of clinical-grade AI applications. The role blends hands-on engineering with architectural leadership and cross-functional collaboration across product, ML, infrastructure, and security teams.
What You'll Do

Architect, build, and optimize scalable, secure, and highly available cloud infrastructure (AWS/Google Cloud Platform/Azure).
Lead incident response, root-cause analysis, and production reliability improvements across the platform.
Implement observability frameworks (metrics, tracing, logging) that provide deep visibility into system performance.
Partner with ML and data engineering teams to operationalize AI/ML pipelines, ensuring reliability from data ingestion through model deployment.
Develop automated CI/CD pipelines, infrastructure-as-code, and guardrails for safer, faster deployments.
Define SLOs/SLIs and establish long-term reliability roadmaps aligned with clinical-grade requirements.
Mentor SREs and software engineers, promoting DevOps and reliability best practices across engineering.
Lead capacity planning, performance testing, and system hardening initiatives.
Collaborate with security teams to ensure compliance with HIPAA, SOC 2, and relevant privacy and security standards.
Evaluate new technologies and drive adoption of tools that improve operational excellence.

What They're Looking For

8+ years in SRE, DevOps, Infrastructure Engineering, or related fields.
Deep expertise with Kubernetes, container orchestration, and microservices architecture.
Strong experience with cloud platforms (AWS/Google Cloud Platform/Azure) and infrastructure-as-code tools such as Terraform, Pulumi, or CloudFormation.
Advanced proficiency in automation/scripting languages such as Python, Go, or Bash.
Strong knowledge of distributed systems, reliability engineering patterns, and modern observability stacks (Prometheus, Grafana, OpenTelemetry, Datadog, etc.).
Experience supporting highly regulated or mission-critical environments (healthcare, fintech, SaaS).
Hands-on experience with ML infrastructure, model lifecycle management, or data pipelines is a plus.
Excellent communication skills and a proactive, ownership-oriented mindset.

Why Candidates Love This Role

Mission-driven work that directly influences patient care and health outcomes.
Ownership of foundational infrastructure in a rapidly scaling AI start-up.
Competitive compensation, equity, and benefits.
A modern, cloud-native tech stack with the ability to shape future architecture.
A collaborative and innovative engineering culture.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 10105282
Position Id: 801149

Company Info

About Motion Recruitment Partners, LLC

Motion Recruitment delivers IT Talent Solutions for Contract, Direct Hire, Managed Solutions and Statement of Work to all of North America from our 21 delivery centers. Our high-touch, specialized, team-based recruitment model’s success is proven through our exemplary track record in filling the most challenging IT positions for startup and enterprise clients alike. Our hyper-specialized tech focus results in a truly consultative approach for both our clients and candidates, within our recruiting areas of expertise: Software, Mobile, Data, Infrastructure, Cybersecurity, Product + UX and Functional.

Motion also delivers IT Consulting Solutions through the Motion Consulting Group (MCG) that create true digital transformation for IT projects in Agile Development & Coaching, DevOps & DevSecOps Solutions, and Managed Services for IT Operations.

We’re also the proud creators of Tech in Motion and the Timmy Awards, our North American community platform, events series and award program that connects over 250,000 tech enthusiasts to meet, learn, and innovate.
