Job Details
Required Skills & Experience
5+ years in an SRE, DevOps, or similar reliability engineering role.
Strong experience with observability platforms (Datadog, New Relic, Prometheus, Grafana, OpenTelemetry, etc.).
Hands-on experience with incident management tools (PagerDuty, Opsgenie, VictorOps).
Knowledge of cloud infrastructure (AWS preferred) and modern CI/CD pipelines.
Deep understanding of distributed systems reliability, scaling, and performance optimization.
Proven ability to design and implement SLIs, SLOs, and error budgets.
Experience with automation and Infrastructure as Code (Terraform, Ansible, CloudFormation).
Nice to Have
Background in healthcare or regulated environments (HIPAA compliance experience).
Experience with chaos engineering and reliability testing tools (Gremlin, Litmus).
AWS certification (Solutions Architect, DevOps Engineer).