Overview
On Site
USD 136,900.00 - 270,000.00 per year
Full Time
Skills
Profit And Loss
Creative Problem Solving
Finance
Performance Engineering
Apache Velocity
Mentorship
Continuous Improvement
Systems Design
Operational Efficiency
Root Cause Analysis
Capacity Management
Performance Tuning
Cost Management
Version Control
Code Review
Collaboration
Leadership
Management
Software Engineering
Programming Languages
Python
Java
Cloud Computing
Amazon Web Services
Microsoft Azure
Google Cloud
Google Cloud Platform
Systems Architecture
Continuous Integration
Continuous Delivery
Configuration Management
Budget
Incident Management
Regulatory Compliance
Risk Management
Communication
Articulate
Reliability Engineering
Return On Investment
Investments
Artificial Intelligence
Machine Learning (ML)
CHAOS
Testing
Kubernetes
Terraform
Grafana
IT Operations
Job Details
Your Opportunity
We believe in the importance of in-office collaboration and fully intend for the selected candidate for this role to work on site in the specified location(s).
At Schwab, you're empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us "challenge the status quo" and transform the finance industry together.
We are seeking an experienced SRE Director to lead and scale our Site Reliability Engineering organization. This role requires a proven technology leader who can drive the adoption of advanced tools and methodologies, foster a culture of continuous improvement, and ensure our systems are resilient, secure, and scalable. You will be instrumental in guiding teams through complex AIOps transformations while empowering them to embrace new technologies and build a high-performance engineering culture.
This is not a traditional operations role. We're looking for a leader who embraces the SRE philosophy: treating operations as a software engineering problem, eliminating toil through automation, and using data-driven approaches to balance reliability with velocity. You'll lead the transformation from reactive operations to proactive engineering, where reliability is designed in, not bolted on.
Key Responsibilities
- Lead, mentor, and scale a high-performing team of SRE engineers and managers.
- Define and execute the strategic vision for site reliability, availability, and performance across the organization.
- Drive the adoption of advanced SRE practices, automation frameworks, and AI-powered operational tools.
- Foster a culture of continuous improvement and blameless learning through postmortems, turning failures into opportunities for growth.
- Partner with Engineering, Product, and Security teams to align SRE initiatives with business objectives.
- Transform the traditional operations mindset into an SRE culture: shift from reactive firefighting to proactive system design, and from manual processes to software-driven automation.
- Ensure systems are resilient, secure, and scalable to meet current and future business demands.
- Lead transformation initiatives leveraging AIOps and intelligent automation to enhance operational efficiency.
- Establish and maintain SLIs, SLOs, and error budgets to drive reliability commitments and enable data-driven discussions about acceptable risk.
- Lead automation initiatives to eliminate toil and scale operational efficiency, prioritizing code-driven solutions over manual processes.
- Drive incident management excellence including root cause analysis, postmortem culture, and continuous learning.
- Oversee capacity planning, performance optimization, and infrastructure cost management.
- Apply software engineering principles to operations: version control, code review, testing, and CI/CD for all infrastructure and tooling.
- Foster collaboration between development and operations teams through SRE principles, breaking down silos and embedding reliability into the development process.
What you have
Required Qualifications
- 10+ years of experience in software engineering, infrastructure, or site reliability roles.
- 5+ years of people leadership experience managing engineering teams and managers.
- Strong software engineering background with proficiency in programming languages (Python, Go, Java, etc.); this is not an operations-only role.
- Deep expertise in cloud platforms (AWS, Azure, Google Cloud Platform) and distributed systems architecture.
- Strong background in automation, CI/CD, infrastructure as code, and configuration management.
- Proven track record of driving large-scale technical and operational transformations, including AIOps adoption.
- Experience implementing SLO/SLI frameworks and error budget policies.
- Experience with observability tools, monitoring platforms, and incident management systems.
- Strong understanding of security best practices, compliance requirements, and risk management.
- Excellent communication skills, with the ability to influence stakeholders at all levels.
- Ability to articulate the business value of reliability engineering and the ROI of automation investments.
Preferred Qualifications
- Experience with AI/ML operations, AIOps platforms, and intelligent automation.
- Background in chaos engineering, game days, and resilience testing.
- Knowledge of modern SRE tools and practices (Kubernetes, Terraform, Datadog, Grafana, etc.).
- Experience leading the cultural transformation from traditional IT operations to SRE.