Job title: Senior Site Reliability Engineer
Location: Austin, TX (Hybrid)
The primary work location(s) will be at 4601 W. Guadalupe Street, Austin, TX 78701.
Duration: 4+ Months
Job number: 529601671
Due date: 04/09/2026
Interview: In person only (no exceptions)
The position is 3 days remote, with 2 days (Mondays and Thursdays) required onsite at the location listed above. The program will accept LOCAL candidates only for this position.
Job Description: The Site Reliability Engineer will be responsible for ensuring the reliability, availability, performance, and scalability of production systems by applying software engineering practices to infrastructure and operations. The engineer partners with development teams to build resilient, observable, and automated platforms that meet defined service level objectives (SLOs).
Required Skills:
8 years of experience in systems engineering, DevOps, or site reliability engineering roles
8 years of strong experience with Linux/Unix systems and system internals
8 years of proficiency in one or more programming/scripting languages (Python, Go, Java, Bash)
8 years of experience designing and operating highly available, distributed systems
8 years of strong knowledge of cloud platforms (AWS or Google Cloud Platform) and cloud-native services
8 years of experience with containerization and orchestration (Docker, Kubernetes)
8 years of strong understanding of monitoring, alerting, and logging concepts
8 years of experience defining and managing SLIs, SLOs, and error budgets
8 years of familiarity with incident management, root cause analysis (RCA), and postmortems
8 years of experience integrating security and compliance into operational workflows
Preferred Skills:
4 years of familiarity with observability tools (Prometheus, Grafana, Application Insights, Datadog, Splunk)
4 years of experience operating 24x7 production environments with on-call rotations
4 years of experience with chaos engineering and resiliency testing
4 years of experience with feature flags, canary deployments, and progressive delivery
4 years of strong documentation skills for runbooks, dashboards, and operational standards