Director of AI SRE & DevOps, AI.x

  • San Francisco, CA
  • Posted 3 days ago | Updated moments ago

Overview

On Site
USD 280,000.00 - 420,000.00 per year
Full Time

Skills

Creative Problem Solving
Finance
Product Engineering
Generative Artificial Intelligence (AI)
Customer Experience
Scalability
FOCUS
Operational Excellence
Roadmaps
Collaboration
Business Strategy
Management
Service Level
Budget
Real-time
Artificial Intelligence
Provisioning
Apache Velocity
Software Engineering
DevOps
Startups
Scratch
Continuous Integration
Continuous Delivery
High Availability
Cloud Computing
People Management
Recruiting
Mentorship
Performance Management
Open Source
Communication
IT Management
Incident Management
Reliability Engineering
Terraform
Google Cloud Platform
Google Cloud
Computer Science

Job Details

Your Opportunity

At Schwab, you will build a rewarding career while making a difference in the lives of our millions of clients. Here, innovative thinking meets creative problem solving as we work together to challenge the status quo. We believe in the power of collaboration and value being together in the office, which is why this role is based on-site in our San Francisco office. Joining Schwab means joining a company committed to transforming the financial industry and putting clients at the center of everything we do.

Schwab's AI Strategy & Transformation team, known as AI.x, is the central hub for Artificial Intelligence at Schwab. We are an integrated product, engineering, strategy, and risk team, all based in San Francisco. We help set the enterprise vision for AI, invest in the most promising opportunities, and accelerate delivery across the company. We also build the core platform that powers AI at scale and explore next-generation GenAI efforts that will redefine how we serve our clients. As the Director of AI SRE & DevOps on AI.x, you will play a key role in bringing these priorities to life by designing and delivering innovative AI solutions.

This role is an opportunity to join a high-profile team shaping Schwab's future with AI, to build solutions that matter to millions of clients, and to grow your career in one of the most exciting areas of technology today.

As the Director of AI SRE & DevOps, you will lead infrastructure and reliability efforts for cutting-edge GenAI applications that enhance the client experience and create business value. You will work closely with architects, engineers, and business leaders to ensure the scalability, reliability, and security of solutions that build towards Schwab's enterprise strategy. You will focus on production availability and also lead the strategy for building reliability into LLM-based applications early in the development process. You will ensure that the systems we build are robust, reliable, and well-monitored, implementing best practices for observability and operational excellence to maintain high performance and uptime for mission-critical AI applications. Above all, you will bring curiosity, creativity, and technical depth to help shape the next generation of AI at Schwab.

Responsibilities
  • Define and execute the strategic roadmap for reliability, observability, and automation across AI platforms.
  • Collaborate with AI Engineering, Product, and Security teams to ensure seamless integration of DevOps and SRE practices into the development lifecycle.
  • Champion reliability, monitoring, observability, and operational best practices for AI systems and data pipelines.
  • Collaborate with cross-functional teams to align solutions with enterprise strategy and technical standards.
  • Lead and mentor a high-performing SRE team, fostering strong practices and continuous learning.
  • Implement and maintain monitoring, alerting, and incident response frameworks to ensure system health and reliability.
  • Establish and manage Service Level Objectives, error budgets, and incident response runbooks.
  • Implement observability frameworks for real-time monitoring of AI services, including metrics, logs, and traces.
  • Champion automation across provisioning, deployment, and monitoring to reduce manual intervention and improve developer velocity.
  • Coach development teams on CI/CD and Infrastructure as Code best practices to ensure that all systems are robust, scalable, and reliable.
  • Own the process for releasing software to production, working with internal stakeholders to champion frequent and low-risk changes to maintain high availability and quality.

What you have

Required Qualifications
  • 10+ years of software engineering experience, with 5+ years as a hands-on DevOps or SRE leader in startups and/or large organizations.
  • Bachelor's degree in Computer Science or related field, or equivalent experience.
  • 7+ years building complex products from scratch, running them in production, and ensuring operational reliability.
  • 5+ years working with containers and cloud-native applications, operationalizing them in the public cloud with infrastructure as code and CI/CD pipelines.
  • 3+ years of experience leading SRE teams in high-availability hybrid-cloud environments.
  • Strong people management skills, including hiring, mentoring, performance management, and career development.

Preferred Qualifications
  • Strong computer science fundamentals and experience across the tech stack.
  • Experience with proprietary or open-source LLMs (e.g., Gemini, Claude, OpenAI), deploying LLM-powered applications to production and maintaining availability.
  • Strong written and verbal communication skills to clearly convey ideas and feedback.
  • Technical leadership and supporting teams' technical growth through code reviews and guidance.
  • Strong understanding of observability, incident management and reliability engineering principles.
  • Mindset of continuous learning and improvement, adept at both giving and receiving feedback.
  • Ability to troubleshoot complex problems with ambiguous or incomplete data in distributed systems.
  • Curiosity about new technologies and processes, proactively sharing knowledge and seeking improvement.
  • Experience with Terraform and Google Cloud Platform.
  • Master's or advanced degree in Computer Science or related field.

In addition to the salary range, this role is eligible for bonus or incentive opportunities.