Job Title: Site Reliability Engineer (SRE) III / Lead
Location: Arlington, TX (Hybrid: 2 Days Onsite / 3 Days Remote)
Position: Contract-to-Hire (Hybrid)
Job Overview
We are seeking a Lead Site Reliability Engineer (SRE) to join a highly innovative technology organization focused on modernizing large-scale financial platforms. This role is ideal for candidates with a strong software development background who are interested in applying engineering excellence to reliability, automation, and release engineering.
This is not a traditional on-call role. Instead, the position emphasizes proactive engineering, automation, and strategic influence over how software is built, released, and operated at scale.
The Lead SRE will work closely with development, architecture, and platform teams to ensure systems are highly available, scalable, secure, and performant.
Key Responsibilities
- Define, implement, and evolve Site Reliability Engineering and Release Engineering strategies
- Establish and manage Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
- Design and build automated CI/CD pipelines with integrated testing and security controls
- Drive adoption of AI-assisted development tools (e.g., GitHub Copilot)
- Contribute hands-on to software development, automation, and platform engineering
- Perform Root Cause Analysis (RCA) and lead problem management initiatives
- Collaborate across engineering, architecture, and operations teams to improve system reliability
- Champion best practices in coding, testing, observability, and automation
Technical Environment
- Cloud: Microsoft Azure
- Containers: AKS, Kubernetes, Docker
- CI/CD: Automated pipelines and release engineering practices
- Databases: SQL Server, Oracle, NoSQL
- Architecture: Microservices, cloud-native systems
- Monitoring & Observability: Enterprise monitoring and incident management tools
- Development Tools: AI-driven development and automation tooling
Required Qualifications
- Strong software development background with hands-on coding experience
- Proficiency in one or more of the following:
- Deep understanding of:
  - Software design patterns
  - Development architecture
  - Microservices-based systems
- Experience with DevOps, Release Engineering, and pipeline automation
- Hands-on experience with Azure cloud infrastructure
- Strong troubleshooting skills across complex, distributed systems
- Experience working in Agile / Scrum environments