Overview
On Site
Full Time
Skills
Scheduling
Banking
Wealth Management
High Availability
Job Scheduling
Streaming
Log Shipping
Data Warehouse
IT Operations
Real-time
Management
Migration
Business Continuity Planning
Conflict Resolution
Problem Solving
Linux
System Administration
Computer Networking
TCP/IP
Routing
Firewall
Distributed Computing
Cloud Computing
Python
Erlang
Financial Services
Job Details
Job Description
Procmon Platform delivers a highly scalable and reliable ecosystem for scheduling business-critical jobs across Goldman Sachs.
Our platform is responsible for scheduling tens of millions of daily jobs for Global Banking & Markets, Asset & Wealth Management, Risk and other business and engineering functions.
The ecosystem includes a number of high-availability, very large-scale systems, including:
- Job scheduling
- Event streaming
- Log shipping
- Data warehouses
- Security infrastructure
RESPONSIBILITIES
- Own technical operations for systems that manage hundreds of thousands of compute cores
- Build observability for new deployments to ensure robustness from day one, and for mature deployments to identify and implement improvements
- Troubleshoot and resolve issues with block devices, file descriptors, and packet loss
- Lead real-time outage investigations and present postmortems to senior management
- Define SLIs and SLOs and partner with development teams to ensure systems are sufficiently well designed and instrumented
- Partner with our development team throughout development and operations
- Plan and manage deployments and migrations (including end-of-life programs)
- Plan and implement robust business continuity and security programs
- Provide regional coverage for the Procmon platform and participate in on-call support
REQUIREMENTS
- Excellent problem-solving and automation skills
- Strong Linux fundamentals and system administration skills
- Good networking fundamentals (familiarity with TCP/IP, IP routing, firewalls, secure tunneling protocols)
- Experience working with distributed computing systems and Cloud computing environments
- Proficiency in at least one programming language; the team uses a mix of Go, Python and Erlang
- Able to operate effectively in a mission-critical, highly regulated financial services environment