The SRE Engineer will own reliability, observability, and operational hygiene for the combined telecom program (RRT + STIR/SHAKEN). The role ensures platform stability across SBC telemetry and hybrid contact center touchpoints (Genesys + Amazon Connect migration), applying disciplined SRE and DevOps practices to incident response and change governance.
Why this role exists
To ensure the combined program remains reliable, observable, and operationally disciplined. This is particularly important in BFS environments, where low-latency voice paths, SBC telemetry, and a hybrid contact center stack (Genesys + Amazon Connect migration) must meet stringent uptime, incident response, and change governance requirements.
Key Responsibilities
Define and monitor SLIs/SLOs for latency, availability, error rates, and signing success/failure signals
Build observability integrations (metrics/logs/alerts) aligned to enterprise monitoring standards and approved platforms
Drive proactive alerting, incident response, root-cause analysis, and post-incident reliability improvements
Implement operational automation, including monitoring/alerting workflows, runbook automation, and repeatable diagnostics
Enforce production readiness gates and change management discipline; support after-hours change windows as needed
Required Qualifications
7-10+ years in SRE, DevOps, or infrastructure engineering
Strong experience with monitoring, alerting, incident response, and post-incident improvements
Familiarity with highly available, low-latency systems in regulated environments
Experience operating in regulated BFS environments
Preferred Qualifications
SRE experience supporting telecom/voice platforms or real-time systems
Familiarity with hybrid on-prem + cloud observability patterns (Genesys + Amazon Connect coexistence)
Experience with automation, resiliency testing, and reliability engineering playbooks
Background in compliance-driven operational environments
Experience working in the BFS sector