Senior Manager, SRE Platform Support

  • Austin, TX
  • Posted 18 hours ago | Updated 6 hours ago

Overview

On Site
Full Time

Skills

Accountability
Scalability
Problem Management
Service Level
Reporting
KPI
Customer Satisfaction
Continuous Improvement
Operational Excellence
Training And Development
Software Development
Software Development Methodology
People Management
Resource Allocation
Coaching
Decision-making
Roadmaps
Regulatory Compliance
Information Security
Auditing
Disaster Recovery
Vendor Relationships
IaaS
PaaS
SQL Azure
API Management
Analytics
Dashboard
Scripting
Windows PowerShell
Python
Bash
GitHub
Performance Monitoring
Software Performance Management
Incident Management
Root Cause Analysis
Team Building
Mentorship
Conflict Resolution
Problem Solving
Analytical Skill
ROOT
Organizational Skills
Kanban
C#
.NET
TypeScript
React.js
Database
Microsoft SQL Server
Cosmos-Db
Data Engineering
Service Management
System Integration
SSO
OAuth
Computer Science
Information Technology
IT Operations
Software Engineering
High Availability
Customer Facing
SaaS
Management
Application Support
Git
Version Control
Continuous Integration
Continuous Delivery
Financial Services
Budget
Vendor Management
Capacity Management
Cloud Computing
DevOps
Microsoft Azure
ITIL
Reliability Engineering
Agile
Scrum
Project Management
PMP
Typing
Writing
Cabling
UI
Communication
Active Listening
Bridging
Presentations
System Integration Testing
Laptop
Servers
Internet
SAFE
SAP BASIS
Law
IT Service Management
Innovation
Collaboration
Recruiting
Insurance
Finance
Professional Development
Training
Leadership
CompTIA
Customer Service
Career Counseling
Oracle Application Express
Apex

Job Details

Job#: 2075398

Job Description:

Senior Manager, SRE Platform Support

Location: Austin, Texas (South)

Schedule: 4 days onsite

Type: Direct hire

ESSENTIAL DUTIES AND RESPONSIBILITIES: To perform this job successfully, this individual must be able to perform each essential duty satisfactorily:

Lead, mentor, and develop a high-performing, combined team of Application Support engineers and Site Reliability Engineers, fostering a culture of collaboration, continuous learning, and accountability.

Drive the strategy and execution for platform reliability, scalability, and performance, implementing and championing SRE best practices including incident response, blameless post-mortem analysis, error budgets, and service reliability monitoring.

Oversee day-to-day operations of the Advisor Platform Support teams, ensuring timely resolution of incidents, proactive problem management, adherence to service level objectives (SLOs), and exceptional user support.

Establish, track, and report on key performance indicators (KPIs) and SLOs for platform reliability, availability, performance, and customer satisfaction, driving continuous improvement initiatives.

Champion operational excellence through automation of toil, robust monitoring strategies, proactive problem resolution, and the development of comprehensive runbooks.

Collaborate closely with Business Support Teams, Advisors, Home Office Staff, development, product, and infrastructure teams to understand support needs, ensure new features are supportable, and align on strategic objectives.

Participate in on-call rotation leadership and ensure proper escalation procedures are in place for 24x7 incident management; drive efficient incident management processes and ensure timely resolution of platform issues.

Partner with development teams to embed operability, reliability, and supportability best practices into the software development lifecycle (SDLC).

Manage team capacity, resource allocation, and hiring plans to support business growth and strategic initiatives.

Provide regular, constructive feedback, coaching, and career development opportunities to team members, addressing complex team challenges and fostering professional growth.

Synthesize functional insights from support operations and reliability engineering to guide team decision-making and contribute to departmental strategy and technology roadmaps.

Advocate for core values, cultivating an environment of trust, mutual respect, psychological safety, open communication, and empathetic listening.

Ensure compliance with information security, audit, and disaster recovery requirements relevant to a regulated financial services environment.

Manage vendor relationships, support contracts, and SRE tooling budgets as needed.

Champion Agile development practices (Scrum/Kanban) and promote effective collaboration across teams, ensuring adherence to CI/CD practices using Azure DevOps and GitHub.

KNOWLEDGE, SKILLS, AND/OR ABILITIES: To perform this job successfully, individuals should have the following skills and abilities:

Advanced understanding of Microsoft Azure cloud services, including IaaS, PaaS, and SaaS offerings (e.g., Azure App Service, Function Apps, Container Apps, Azure SQL, Cosmos DB, Azure Service Bus, VNets, Azure Monitor/Log Analytics, API Management, Front Door, Application Gateway, CDN).

Strong expertise in monitoring, observability, and telemetry tools and practices (e.g., Azure Application Insights, Log Analytics, KQL, alerting dashboards).

Proficiency in scripting and automation languages (e.g., PowerShell, Python, Bash).

Deep knowledge of Site Reliability Engineering (SRE) principles and practices (e.g., SLIs/SLOs, error budgets, blameless post-mortems, automation, proactive monitoring).

Experience with CI/CD pipelines, Git-based source control (Azure DevOps, GitHub), infrastructure-as-code (IaC) tools, and automated deployment strategies.

Strong understanding of application performance monitoring (APM) and troubleshooting techniques.

Proven experience with incident management, root cause analysis, and post-mortem processes.

Excellent leadership and team development capabilities, with a passion for mentoring and growing technical talent.

Exceptional problem-solving, analytical, and troubleshooting skills, with the ability to guide teams in analyzing root causes and implementing effective solutions.

Strong project management, prioritization, and organizational skills, with the ability to manage multiple tasks autonomously in a dynamic environment.

Excellent communication (written and verbal) and interpersonal skills, with the ability to translate technical concepts for non-technical stakeholders and adapt communication style effectively.

Experience with Agile development methodologies (Scrum/Kanban) and working collaboratively in an Agile environment.

Knowledge of backend technologies such as C# and .NET (including .NET Framework 4.x and .NET 8+).

Understanding of frontend technologies like TypeScript and React. (Desired)

Experience with database technologies (e.g., MS SQL Server, Cosmos DB) and data engineering concepts. (Desired)

Familiarity with ITIL or similar service management frameworks, particularly incident, problem, and change management. (Desired)

Experience with third-party software integration via APIs, SSO, and File Transfer. (Desired)

Working knowledge of security best practices and SSO/OAuth flows. (Desired)

EDUCATION AND/OR EXPERIENCE:

Bachelor's degree in Computer Science, Engineering, Information Technology, or a related technical field. (Master's degree is a plus)

7-10 years of progressive experience in IT operations, platform support, software engineering, or site reliability engineering.

Minimum 3-5 years of management experience leading and developing multi-disciplinary technical teams (e.g., Application Support, SRE, Operations).

Proven track record of operating and scaling high-availability, customer-facing SaaS platforms, preferably on Microsoft Azure.

Demonstrated success implementing SRE principles, Agile/DevOps practices, and improving operational processes within technical teams.

Experience managing both application support and infrastructure/SRE functions is highly desirable.

Experience with Azure DevOps, Git version control, and CI/CD pipelines.

Background in financial services or other highly regulated industries. (Desired)

Experience with budgeting, vendor management, and capacity planning for cloud operations. (Desired)

CERTIFICATIONS, LICENSES, REGISTRATIONS:

Microsoft Azure certifications (e.g., Azure Administrator Associate, Azure DevOps Engineer Expert, Azure Solutions Architect Expert). (Desired)

ITIL Foundation or higher certification. (Desired)

Site Reliability Engineering (SRE) related certifications. (Desired)

Agile/Scrum certifications (e.g., Certified ScrumMaster (CSM), SAFe Agilist). (Desired)

Project Management certification (e.g., PMP). (Desired)

PHYSICAL DEMAND: The physical demands described here are representative of those that must be met by an employee to successfully perform the essential functions of this job. Reasonable accommodation may be made to enable individuals with disabilities to perform the essential functions.

Ability to sit or stand at a computer workstation for extended periods while using a keyboard, mouse, and multiple monitors.

Frequent, repetitive hand-finger motions for typing, writing, and handling small peripherals or cables.

Near-vision sufficient to read electronic documents, review code, and distinguish basic on-screen colors (e.g., for UI verification).

Clear spoken communication and active listening for in-person and virtual meetings, incident bridges, and phone calls.

Ability to walk short distances, navigate a standard office environment, climb one flight of stairs, and stand during white-boarding or presentations. Sit-stand desks and other ergonomic furniture are available upon request.

Ability to lift and move equipment or boxed materials weighing up to 20 lbs (e.g., laptops, small servers, office supplies).

Hybrid roles: Primary work performed on-site at the Encino Trace campus (Southwest Austin, TX) Monday-Thursday; optional remote work on Fridays.

Fully remote roles: Primary work performed from the employee's home office within approved locations; reliable high-speed internet and an ergonomically safe workspace are required.

Participation in overnight or weekend on-call rotations and critical production releases may require work outside standard business hours.

OTHER DUTIES: Please note this job description is not designed to cover or contain a comprehensive listing of the activities, duties, or responsibilities that are required of the employee for this job. Duties, responsibilities, and activities may change at any time with or without notice.

EEO Employer

Apex Systems is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law. Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law. If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation in using our website for a search or application, please contact our Employee Services Department.

Apex Systems is a world-class IT services company that serves thousands of clients across the globe. When you join Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing in Talent Satisfaction in the United States and Great Place to Work in the United Kingdom and Mexico.

Apex Benefits Overview: Apex offers a range of supplemental benefits, including medical, dental, vision, life, disability, and other insurance plans that offer an optional layer of financial protection. We offer an ESPP (employee stock purchase program) and a 401(k) program which allows you to contribute typically within 30 days of starting, with a company match after 12 months of tenure. Apex also offers an HSA (Health Savings Account on the HDHP plan), a SupportLinc Employee Assistance Program (EAP) with up to 8 free counseling sessions, a corporate discount savings program and other discounts. In terms of professional development, Apex hosts an on-demand training program, provides access to certification prep and a library of technical and leadership courses/books/seminars once you have 6+ months of tenure, and certification discounts and other perks to associations that include CompTIA and IIBA. Apex has a dedicated customer service team for our Consultants that can address questions around benefits and other resources, as well as a certified Career Coach. You can access a full list of our benefits, programs, support teams and resources within our 'Welcome Packet' as well, which an Apex team member can provide.

About Apex Systems