Director of Site Reliability Engineering

Overview

On Site

USD 220,000.00 - 260,000.00 per year

Full Time

Skills

Intellectual Property

Supply Chain Management

Research

Real-time

Microsoft

IBM

Reliability Engineering

FOCUS

Reporting

Scalability

Finance

Health Care

Roadmaps

Mentorship

Operational Excellence

Cloud Computing

Failover

Disaster Recovery

Apache Velocity

Continuous Integration

Continuous Delivery

SAFE

Productivity

Design Patterns

Security Controls

Management

Auditing

Customer Facing

Capacity Management

Performance Engineering

Testing

Workflow

IaaS

Amazon Web Services

Microsoft Azure

Google Cloud

Google Cloud Platform

Kubernetes

Docker

Terraform

Grafana

New Relic

Python

Database

Message Queues

Caching

Budget

Incident Management

Conflict Resolution

Problem Solving

Decision-making

Cyber Security

Regulatory Compliance

FedRAMP

System On A Chip

ISO/IEC 27001:2005

Adobe AIR

CHAOS

Open Source

Leadership

Machine Learning (ML)

Innovation

Collaboration

Communication

Military

SAP BASIS

Authorization

Law

LOS

Recruiting

Legal

Artificial Intelligence

Privacy

Job Details

This Jobot Job is hosted by: Merwan Zattam
Are you a fit? Easy Apply now by clicking the "Apply Now" button and sending us your resume.
Salary: $220,000 - $260,000 per year

A bit about us:

We are a mission-driven organization dedicated to making AI adoption safe and secure for enterprises worldwide. As the leading provider of Security for AI, our platform protects agentic, generative, and predictive AI applications across the entire lifecycle, safeguarding intellectual property, ensuring compliance, and enabling organizations to innovate with confidence.

Our team was founded by cybersecurity and machine learning veterans who experienced a real adversarial AI attack firsthand. That moment led to the creation of a new category focused entirely on protecting machine learning systems from threats such as prompt injection, adversarial manipulation, model theft, and supply chain compromise.

Backed by strategic investors including Microsoft's Venture Fund (M12), Moore Strategic Ventures, Booz Allen Ventures, IBM Ventures, and Capital One Ventures, we combine patented technology with industry-leading research to defend the world's most critical AI systems.

Recognized by Gartner as a "Cool Vendor for AI Security" and trusted by Fortune 500 organizations, government agencies, and enterprises across highly regulated industries, we are shaping the future of AI security in real time. With strong product-market fit and rapid growth, this is an opportunity to join a generational company at a true inflection point, where the mission is bold, the bar is high, and the room for impact and growth is unmatched.

Why join us?

Top Benefits of Working Here

Be part of a new, fast-growing category

Work at the forefront of AI security, an emerging space with massive demand and almost no competition.

High-impact mission

Your work protects mission-critical AI systems for Fortune 500 companies, government agencies, and regulated industries.

Cutting-edge engineering

Tackle challenges in AI/ML security, adversarial defense, model protection, and large-scale distributed systems.

Backed by top-tier investors

Strong funding and stability from groups like Microsoft's venture fund, IBM Ventures, and others.

Build from the ground up

Shape the SRE, platform, and reliability culture; this is not a legacy environment.

High autonomy & ownership

Influence roadmap, architecture, tooling, and direction. Your work is visible and meaningful.

Fully remote, U.S.-based

Flexibility, work-life balance, and a high-performance culture.

Competitive pay + real equity upside

Top-tier compensation with equity at a company in a hyper-growth phase.

Elite team & steep career growth

Collaborate with seasoned leaders in cybersecurity, ML, and enterprise infrastructure, and grow as the company grows.

Job Details

Director of Site Reliability Engineering

Remote - United States

We are seeking a Director of Site Reliability Engineering to lead the broader Platform Engineering organization with a strategic focus on building a world-class SRE function. Reporting to the VP of Engineering, you will be responsible for the reliability, scalability, and operational excellence of the mission-critical AI security platform used by enterprises and government organizations worldwide.

In this senior leadership role, you will define the SRE strategy, mentor and scale a high-performing team, and implement the systems, practices, and culture required to support rapid growth. You will work at the intersection of cutting-edge AI security technology and enterprise-grade infrastructure, ensuring the platform delivers the always-on performance our customers depend on.

Your work will directly strengthen the security posture of organizations protecting their most valuable AI assets, from financial institutions and healthcare providers to government and Fortune 500 enterprises.

What You'll Do
Build and Lead the SRE Function
Define and execute the SRE strategy and roadmap, positioning reliability as a core product feature
Build, mentor, and scale a high-performing SRE and Platform Engineering team
Establish SRE principles, culture, and best practices across engineering
Create clear career development paths and raise the bar for hiring and excellence

Drive Platform Reliability & Operational Excellence
Own reliability, availability, latency, and performance across multi-cloud, multi-region deployments (AWS, Azure, Google Cloud Platform)
Set and achieve SLOs/SLIs aligned with business objectives
Architect multi-region resiliency: automated failover, graceful degradation, and disaster recovery
Build robust observability: distributed tracing, metrics, logging, and actionable alerting
Lead incident management: on-call processes, incident command, blameless post-mortems, and systematic remediation

Enable Developer Velocity & Platform Excellence
Own CI/CD pipelines and deployment infrastructure for safe, fast, reliable delivery
Build internal developer platforms and tooling that reduce toil and improve productivity
Implement progressive delivery (canaries, feature flags, automated rollbacks)
Partner with engineering teams to embed reliability requirements and design patterns early in development

Security, Compliance & Enterprise Requirements
Ensure alignment with standards such as FedRAMP, SOC 2, ISO 27001, and other regulatory requirements
Build and support air-gapped and on-premises deployment capabilities
Implement infrastructure security controls, secrets management, and audit logging
Support customer-facing SLAs and maintain trust with enterprise and government clients

Scale & Optimize the Platform
Lead capacity planning and performance engineering for platform growth
Drive chaos engineering and resilience testing to validate system behavior under failure
Optimize cost while maintaining reliability and performance
Automate operational workflows to eliminate toil and improve efficiency

What You Bring
Leadership & Experience
8+ years in infrastructure, platform engineering, or SRE roles
4+ years in engineering leadership
Experience supporting mission-critical, always-on systems at enterprise scale
Strong people leadership and a track record of building high-performing teams

Technical Expertise
Deep knowledge of cloud infrastructure (AWS, Azure, Google Cloud Platform) and multi-region systems
Strong experience with Kubernetes, Docker, and infrastructure-as-code (Terraform, Pulumi, CloudFormation)
Proven ability to build and operate large-scale distributed systems
Expertise in observability tooling (Prometheus, Grafana, Datadog, New Relic, ELK/EFK, distributed tracing)
Proficiency in Python, Go, or similar languages
Understanding of databases, data pipelines, message queues, and caching systems

Strategic & Operational Skills
Experience driving SRE strategy, SLOs/SLIs, error budgets, and incident management
Ability to partner across engineering, product, security, and customer success
Strong communication skills across technical and non-technical audiences
Pragmatic problem-solving and sound decision-making

Bonus Experience
Background in cybersecurity or AI/ML infrastructure
Familiarity with compliance frameworks (FedRAMP, SOC 2, ISO 27001, NIST)
Experience supporting air-gapped or on-premises deployments
Hands-on experience with chaos engineering and game day exercises
Open-source contributions or SRE community leadership

Why This Opportunity Stands Out
Impact: Define reliability strategy for a category-leading AI security platform
Growth: Build and scale the SRE function from the ground up in a fast-growing, well-funded environment
Mission: Work on technology that is shaping the future of secure AI adoption
Team: Join a world-class engineering organization with deep roots in security, ML, and distributed systems
Innovation: Solve novel problems at the intersection of AI, security, and infrastructure
Flexibility: Fully remote role with competitive compensation, equity, and benefits

Location & Work Environment
This is a fully remote position within the United States. We value flexibility, ownership, collaboration, and excellence. The team operates across time zones with a blend of async communication, regular syncs, and purposeful in-person gatherings.

Equal Opportunity

We are an equal opportunity employer and do not discriminate based on race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, veteran status, or any legally protected status. We are committed to fostering an inclusive environment where all team members can thrive.
If you need accommodations during the application or interview process, please let us know.

Interested in hearing more? Easy Apply now by clicking the "Apply Now" button.

Jobot is an Equal Opportunity Employer. We provide an inclusive work environment that celebrates diversity and all qualified candidates receive consideration for employment without regard to race, color, sex, sexual orientation, gender identity, religion, national origin, age (40 and over), disability, military status, genetic information or any other basis protected by applicable federal, state, or local laws. Jobot also prohibits harassment of applicants or employees based on any of these protected categories. It is Jobot's policy to comply with all applicable federal, state and local laws respecting consideration of unemployment status in making hiring decisions.

Sometimes Jobot is required to perform background checks with your authorization. Jobot will consider qualified candidates with criminal histories in a manner consistent with any applicable federal, state, or local law regarding criminal backgrounds, including but not limited to the Los Angeles Fair Chance Initiative for Hiring and the San Francisco Fair Chance Ordinance.

Information collected and processed as part of your Jobot candidate profile, and any job applications, resumes, or other information you choose to submit is subject to Jobot's Privacy Policy, as well as the Jobot California Worker Privacy Notice and Jobot Notice Regarding Automated Employment Decision Tools which are available at jobot.com/legal.

By applying for this job, you agree to receive calls, AI-generated calls, text messages, or emails from Jobot, and/or its agents and contracted partners. Frequency varies for text messages. Message and data rates may apply. Carriers are not liable for delayed or undelivered messages. You can reply STOP to cancel and HELP for help. You can access our privacy policy here: jobot.com/privacy-policy

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
