Manager, Software Development & Engineering

Southlake, TX, US • Posted 4 hours ago • Updated 4 hours ago

Full Time

On-site

Job Details

Skills

Creative Problem Solving
Finance
Financial Planning
Risk Management
Stakeholder Communications
Dashboard
Batch File
Documentation
Version Control
Collaboration
Mentorship
Reliability Engineering
Production Support
Root Cause Analysis
Operational Risk
SLA
Splunk
Computer Networking
Database
Scripting
Python
Shell
Bash
Windows PowerShell
GitHub
Software Configuration
Software Release Life Cycle
Continuous Integration
Continuous Delivery
Failover
RPO
Testing
SQL
Data Validation
JIRA
Scrum
BMC Remedy
Authorization
Communication
Docker
Linux
Microsoft Windows
High Availability
AppDynamics
Grafana
ExtraHop
Performance Monitoring
Enterprise Storage
Access Control
Encryption
Storage
Disaster Recovery
Recovery
Oracle
Workflow
Offshoring
Financial Services
Software Maintenance
IT Service Management
Incident Management
Software Design
Agile
Software Development
Management
Regulatory Compliance
Strategic Thinking

Summary

Your Opportunity

At Schwab, you're empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us "challenge the status quo" and transform the finance industry together.

Schwab Technology Services enables the future of how clients manage their money by providing innovative and reliable technology products and services as part of our ongoing commitment to democratize access to investing and financial planning.

This is a senior technical role focused on Site Reliability Engineering for critical enterprise applications and platforms. The role combines hands-on production support, observability, incident prevention, release reliability, automation, operational resilience, and support for compliance and regulatory expectations in enterprise environments.
The position supports high-impact incident response, improves operational standards, mentors onshore and offshore engineers, and communicates clearly with both technical and business stakeholders. It is a strong fit for someone who wants to improve reliability, reduce operational risk, and scale support through automation and better engineering practices.

Key Responsibilities

Lead production support, operational readiness, and reliability risk management for critical services and dependencies. Manage major incident triage, escalation, recovery, stakeholder communications, and closure activities, including coordination through Remedy or similar enterprise ticketing and incident management tools, with execution aligned to SLAs.
Work closely with Development and Business Product Owner teams to align reliability priorities, release readiness, and incident communication; identify SLIs, determine SLOs, and plan remediations aligned to business outcomes.
Improve observability through dashboards, alerting, event correlation, and trend-based early warning. Support release reliability through deployment validation, rollback preparedness, readiness checks, and post-release verification.
Build and maintain automation using Python, Bash, Windows Batch scripting, and PowerShell to standardize support processes, improve recovery actions, create reusable solutions, and reduce toil.
Develop automation for monitoring, deployment validation, routine operational tasks, recovery procedures, incident response workflows, and process efficiency improvements.
Support disaster recovery planning, zonal isolation planning and execution, recovery testing, certificate-related operational needs, and secure production readiness.
Support compliance and regulatory requirements through disciplined operational controls, documentation, and reliable execution.
Use GitHub and other software configuration management tools for source control, collaboration, workflow support, and governance.
Apply security knowledge and access grouping concepts to support secure operations, platform access controls, and operational readiness.
Mentor engineers on troubleshooting, automation, SRE and observability disciplines, and cross-time-zone handoffs, and contribute to architecture reviews to improve operability, resilience, and maintainability.

What you have

Required Qualifications

Strong experience in Site Reliability Engineering, observability, production support, and enterprise platform operations.
Proven experience managing major incidents, root cause analysis, service account or password restoration, and operational risk reduction in complex production environments with strong SLA-driven execution.
Strong hands-on experience with Splunk or similar monitoring and observability platforms.
Strong troubleshooting skills across applications, infrastructure, platforms, networking, databases, storage, and integrated service dependencies.
Strong scripting and automation skills using Python, Shell/Bash, Windows Batch, and PowerShell to improve operational support, monitoring, deployment validation, recovery procedures, and repetitive task reduction.

One year of Schwab technology domain experience gained as a current or recent contractor or employee.
Experience building reusable automation solutions that improve consistency and reduce manual effort and toil.

Experience with GitHub and other software configuration management tools. Experience in build and release management, CI/CD practices, deployment controls, and release reliability processes.
Experience supporting applications on PCF and operating in distributed production environments.
Working knowledge of resiliency and recovery, including HA patterns, zonal isolation, failover/failback, RTO/RPO, recovery testing, and post-recovery validation.
Ability to provide operational support, including reading and writing SQL queries for troubleshooting and data validation.
Familiarity with Jira and Scrum concepts, along with experience using Remedy or similar enterprise incident and ticket management platforms.
Understanding of security concepts and grouping models, including access controls, security groups, role-based access, or similar enterprise authorization practices.
Strong written and verbal communication skills, including the ability to explain technical issues, risks, and remediation plans to technical and business audiences.

Preferred Qualifications

Familiarity with Docker, Linux and Windows production environments, and high-availability distributed systems.
Experience with AppDynamics, Grafana, ThousandEyes, ExtraHop, or similar observability and performance monitoring tools.
Experience designing automation for alert correlation, deployment validation, recovery actions, and operational handoff workflows.
Familiarity with enterprise storage models, including NAS, covering access control and permissions, encryption, storage quotas, retention/lifecycle controls, and operational troubleshooting.
Experience supporting disaster recovery exercises, zonal resilience strategies, and post-recovery validation.
Familiarity with MSSQL or Oracle, certificate lifecycle processes, secure transport, and enterprise operational controls.
Understanding of business workflow concepts, including upstream/downstream dependencies, client-request SLAs, and failure impact, and the ability to map reliability issues to end-to-end business outcomes.
Experience working across onshore and offshore support teams and in financial services or other highly regulated environments.

Job Sub-Family Specific Competencies

Application Maintenance and Support - Delivering effective management and technical services to address technical issues and minimize disruption to application users
Incident Response - Resolving reported incidents through streamlined processes, minimizing disruptions, and promptly restoring services
Software Design and Specifications - Developing software solutions that meet requirements using established design principles and standards, employing predictive or adaptive design techniques, including plan-driven or iterative/agile approaches
Software Development - Implementing standards, processes, and methods to create, test, and verify software components, ensuring reliability and resolving operational problems and bugs
Software Release and Deployment - Managing the deployment of software updates while ensuring compliance with safety, security, and quality standards
Strategic Thinking - Analyzing an organization's competitive position and developing a clear and compelling vision of what the organization needs for success in the future

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 90989465
Position Id: c005a17b3c100adab3563747ee8817eb