Engineering - SRE Platforms - Software Engineer - Vice President - Dallas

Overview

On Site
Full Time

Skills

Incident Management
Capacity Management
Collaboration
Service Level
Computer Science
Reliability Engineering
Leadership
Management
Cloud Computing
Amazon Web Services
Microsoft Azure
Terraform
Docker
Kubernetes
Conflict Resolution
Problem Solving

Job Details

Job Description

Goldman Sachs is seeking a talented and motivated Site Reliability Engineering Manager to join our team. As a leader within the firm's Technology division, you will oversee the Site Reliability Engineering (SRE) function and ensure the stability and reliability of critical applications and infrastructure. You will manage a team of site reliability engineers who work closely with developers, infrastructure engineers, and operations teams to build and maintain highly available systems.

Key Responsibilities:

Manage a team of Site Reliability Engineers responsible for ensuring the reliability, availability, and performance of critical applications and infrastructure

Develop and implement best practices for Site Reliability Engineering, including incident management, monitoring, automation, and capacity planning

Collaborate with development teams to design and build highly available and scalable systems

Work with infrastructure teams to ensure that critical infrastructure components operate optimally and can support the needs of the business

Develop and maintain Service Level Agreements (SLAs) and Service Level Objectives (SLOs) to ensure that critical systems meet the needs of the business

Manage and prioritize the SRE team's workload, ensuring that it is aligned with business priorities

Develop and maintain relationships with key stakeholders across the organization to ensure that the SRE function is aligned with business goals

Qualifications:

Bachelor's degree in Computer Science, Engineering, or related field

8+ years of experience in Site Reliability Engineering, with at least 3 years in a management role

Strong leadership skills with the ability to manage a team of engineers

Experience with cloud computing platforms such as AWS or Azure

Experience with infrastructure as code (IaC) tools such as Terraform or CloudFormation

Experience with containerization technologies such as Docker and Kubernetes

Strong problem-solving skills with the ability to troubleshoot complex issues

Employers have access to artificial intelligence language tools ("AI") that help generate and enhance job descriptions, and AI may have been used to create this description. The position description has been reviewed for accuracy, and Dice believes it correctly reflects the job opportunity.