Sr. SRE (Site Reliability Engineer)

Overview

On Site
$155,000 - $180,000
Full Time
No Travel Required
Unable to Provide Sponsorship

Skills

Workflow
Engineering Design
Analytics
DNS
DevOps
SQL Azure
Network
Database Administration
Azure Cosmos DB
Chaos Engineering
Agile
Testing
YAML
Microsoft Azure
Modeling

Job Details

Sr. SRE (Site Reliability Engineer)
Salary Range: $155K - $180K

ONEflight International is the fastest-growing market leader in developing and implementing technological solutions for non-commercial air travel through its proprietary online Book a Jet platform, with nearly 700 world-class aircraft charter operator partnerships and a network of 7,000 private jets worldwide.

Demand for private air travel has surged in recent years, and forecasts indicate that it is here to stay. ONEflight has emerged as a leader in this space through innovation, technology, and business solutions.

ONEflight offers the opportunity to work in a fast-paced, agile environment. We are seeking a highly skilled and experienced Sr. SRE with expertise in Microsoft Azure to join our team. As a Sr. SRE specializing in Azure, you will play a crucial role in designing, implementing, and maintaining our Azure-based infrastructure and in managing our continuous integration and delivery pipelines to ensure scalability, reliability, and security.

Qualifications

Core Requirements

  • 5+ years in SRE, DevOps, or Production Engineering, with a heavy focus on Azure-based distributed systems.
  • Proven ability to design, implement, and operate scalable, fault-tolerant architectures in collaboration with engineering stakeholders.
  • Strong experience with Azure services including (but not limited to):
    Azure App Service, Azure Functions, AKS, Azure SQL Database, Azure Storage.
  • Deep expertise with containerized workloads, especially Docker and Kubernetes (AKS).
  • Hands-on experience building and maintaining CI/CD systems, ideally using Azure DevOps Pipelines.
  • Strong understanding of SLIs, SLOs, error budgets, and reliability-driven engineering.
  • Experience supporting and modernizing legacy applications while improving reliability and operability.
  • Solid understanding of Azure networking, identity, and security services: Azure Virtual Network, Azure Active Directory (now Microsoft Entra ID), Azure Key Vault, etc.
  • Experience with Cloudflare for DNS, CDN, DDoS protection, WAF policies, and edge networking.
  • Experience with Microsoft Fabric / Lakehouse architectures for analytics and data workflows.
  • Familiarity with Redis for caching and stream processing, including tuning, scaling, and monitoring.
  • Database administration experience across PostgreSQL and Azure Cosmos DB, including backup/restore, performance optimization, and resilience strategies.
  • Deep knowledge of Azure monitoring/observability tools:
    Azure Monitor, Log Analytics, Application Insights, distributed tracing fundamentals.
  • Strong knowledge of Azure governance and compliance frameworks and how they intersect with reliability and automation.
  • Excellent communication skills and the ability to work cross-functionally with engineering and product teams.
  • Highly self-directed, curious, and able to execute in a fast-moving startup environment.

Nice to Have

  • Certifications such as Azure DevOps Engineer Expert, Azure Solutions Architect, or similar.
  • Experience building Pipelines-as-Code using YAML in Azure DevOps.
  • Exposure to Azure AI services (Azure OpenAI, ML, Cognitive Services).
  • Hands-on experience with IaC best practices using Azure Repos, Bicep, Terraform, or Pulumi.
  • Experience building or extending Azure DevOps extensions.
  • Familiarity with chaos engineering tools, automated fault injection, or resilience testing frameworks.

Responsibilities

Reliability & Production Operations

  • Define, implement, and maintain SLIs/SLOs, error budgets, and performance dashboards across our critical services.
  • Lead and mature our incident response program, including on-call rotations, postmortems, and mitigation strategies.
  • Build and operate highly available, scalable Azure infrastructure, ensuring resilience across regions and failure domains.
  • Conduct production readiness reviews and partner with developers to design services that are reliable by default.

Automation & Infrastructure Engineering

  • Design and maintain CI/CD pipelines that emphasize safety, repeatability, and speed.
  • Automate infrastructure provisioning using IaC principles, enforcing GitOps-style workflows where possible.
  • Eliminate manual toil through automation, tooling, runbooks, and self-service platforms.

Observability, Monitoring & Performance

  • Implement and evolve a modern observability stack using Azure Monitor, Log Analytics, and Application Insights.
  • Lead efforts in performance tuning, distributed tracing, capacity modeling, and cost-aware engineering.
  • Surface actionable insights and proactively identify reliability risks before they impact customers.

Security & Compliance

  • Implement and enforce security best practices across infrastructure and application layers using tools like Azure Security Center (now Microsoft Defender for Cloud) and Azure Policy.
  • Collaborate with teams to ensure reliability and security are treated as shared responsibilities.

Collaboration & Leadership

  • Partner closely with engineers to integrate SRE principles into the SDLC.
  • Provide mentorship, share best practices, and help shape our engineering culture with a focus on operational excellence.
  • Champion experimentation and innovation, including the adoption of new technologies, resilience patterns, and automation strategies.

 
