Job Details
Title: Senior DevOps/SRE Engineer
Full-time with CriticalRiver
About the Role
We’re looking for an experienced Senior DevOps / Site Reliability Engineer to design and build the cloud and reliability foundation for a new multi-tenant SaaS platform while supporting our existing SaaS products. This is a foundational, high-impact early hire: you’ll define the AWS architecture, establish DevOps and SRE best practices, and ensure 99.9%+ uptime as the platform scales.
You’ll work closely with Platform, Backend, Frontend, and AI teams to enable fast, secure deployments and production-grade reliability.
What You’ll Do
Architect and manage AWS infrastructure (EKS, RDS, VPC, IAM, S3)
Build and maintain Terraform-based Infrastructure as Code
Own Kubernetes/EKS clusters, scaling, upgrades, and deployments
Design and optimize CI/CD pipelines (GitHub Actions/Jenkins, GitOps)
Implement monitoring, alerting, and observability (Datadog, CloudWatch)
Lead incident response, on-call processes, and postmortems
Define and track SLOs/SLIs and error budgets
Implement security and compliance controls (SOC 2, IAM, encryption)
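To give a feel for the SLO and error-budget work listed above, here is a minimal sketch of the arithmetic behind a 99.9% availability target. The function names, the 30-day window, and the request counts are illustrative assumptions, not a description of our actual tooling.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Downtime (in minutes) an availability SLO allows over a window.

    The 30-day default window is an assumption for illustration.
    """
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent).

    Uses a request-based SLI: good_events / total_events.
    """
    allowed_bad = (1.0 - slo) * total_events
    if allowed_bad <= 0:
        raise ValueError("an SLO of 100% (or no traffic) leaves no error budget")
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # A 99.9% SLO over 30 days allows roughly 43 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # Hypothetical traffic: 500 failed requests out of 1,000,000.
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))
```

In practice the SLI inputs would come from a monitoring system such as Datadog or CloudWatch rather than hard-coded counts.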
Required Qualifications
7–10+ years of DevOps / SRE experience in production environments
Deep expertise in AWS and Kubernetes (EKS)
Strong experience with Terraform or CloudFormation
Proven ownership of CI/CD, monitoring, and incident management
Experience supporting multi-tenant B2B SaaS platforms
Strong scripting skills (Python or Bash)
Security-first mindset with hands-on compliance exposure