Sr. Specialist - Site Reliability Engineer

Southlake, TX, US • Posted 22 hours ago • Updated 9 hours ago
Full Time
On-site

Job Details

Skills

  • Creative Problem Solving
  • Management
  • Financial Planning
  • Finance
  • Operational Excellence
  • Root Cause Analysis
  • Recovery
  • Dashboard
  • Reporting
  • Continuous Improvement
  • Documentation
  • Embedded Systems
  • Real-time
  • High Availability
  • Budget
  • Incident Management
  • Splunk
  • Grafana
  • Log Analysis
  • Scripting
  • Process Improvement
  • Collaboration
  • Communication
  • Product Engineering
  • Computer Science
  • Information Systems
  • Production Support
  • Reliability Engineering
  • Systems Engineering

Summary

Your Opportunity

At Schwab, you're empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us "challenge the status quo" and transform the finance industry together.

Schwab Technology Services enables the future of how clients manage their money by providing innovative and reliable technology products and services as part of our ongoing commitment to democratize access to investing and financial planning.

Workplace Services Engineering (WSE) is an organization within Schwab Technology Services that is embarking on a major transformation. We support Workplace Services, and we're shaping the future of how people experience financial wellbeing at work. We partner with leading employers to deliver innovative retirement, equity, and workplace financial solutions that help millions of participants build stronger financial futures.

The Production Support SRE Engineer is responsible for ensuring the reliability, stability, and operational excellence of Workplace Services applications and platforms. This role blends hands-on production support with proactive reliability engineering, partnering with product, delivery, SRE, and infrastructure teams to reduce toil, improve observability, and strengthen service health. Each engineer will own one or more service areas, acting as the primary SRE contact and driving operational readiness, incident response, and continuous improvement initiatives. Success includes measurable improvements in availability, alert quality, automation adoption, and adherence to Schwab's SRE standards.

What you'll do

Production Support & Incident Management

Serve as the primary production support engineer for assigned Workplace Services applications, ensuring high availability, rapid incident response, and effective participation in both market-hours and after-hours on-call rotations. Lead root-cause analysis, support SLO breach investigations, and partner with product and delivery teams to restore and maintain service health.

Reliability Engineering & SRE Standards

Champion Schwab's SRE principles by improving observability, structured ELI logging, meaningful alerting, automation, and standardized dashboard/reporting patterns. Ensure new features, releases, and operational changes meet reliability, monitoring, and readiness expectations.

Operational Readiness & Continuous Improvement

Develop and maintain runbooks, operational guides, incident playbooks, and service documentation. Identify sources of operational toil, drive automation efforts, rationalize alerts, and deliver data-driven insights and trends to product and engineering teams for proactive reliability improvements.

Collaboration, Enablement & Culture

Act as the embedded SRE partner for your service area: attend key ceremonies, advise teams on operational risks, and promote best practices in reliability engineering. Foster a culture of blameless postmortems, continuous learning, and cross-team enablement.

Required Qualifications

Technical & Operational
  • 2+ years of experience in production support, incident management, and real-time troubleshooting for high-availability systems.
  • Solid understanding of SRE principles, including SLIs, SLOs, error budgets, and incident response frameworks.
  • Hands-on experience with observability and monitoring tools such as Splunk, Grafana, Moogsoft, or xMatters.
  • Proficiency with structured logging, log analysis, and alert tuning.
  • Ability to create and maintain runbooks, operational guides, and incident playbooks.
  • Familiarity with automation concepts and the ability to identify and reduce operational toil through scripts, tooling, or process improvements.

Collaboration & Communication

  • Strong communication skills, with the ability to translate complex technical issues into clear, business-friendly language.
  • Ability to partner with product, engineering, and delivery teams to embed reliability into the development lifecycle.
  • Experience participating in on-call rotations, including market-hours support and after-hours escalations.

Education & Experience

  • Bachelor's degree in Computer Science, Engineering, Information Systems, or equivalent practical experience.
  • Prior background in production support, site reliability engineering, systems engineering, or operations.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 90989465
  • Position Id: 91b68ff913c17d8ee085d364b5a34f7e

Similar Jobs

  • Southlake, Texas • Today • Full-time • USD 115,000.00 - 139,000.00 per year
  • Irving, Texas • Today • Full-time • USD 88,000.00 per year
  • Hybrid in Coppell, Texas • Today • Full-time
  • Hybrid in Fort Worth, Texas • Today • Easy Apply • Full-time • Depends on Experience
