Senior Software Engineer - HPC Scheduler

Overview

Full Time

Skills

Asset Management
High Performance Computing
Distribution
Reliability Engineering
Distributed Computing
Management
Workflow
Job Scheduling
Algorithms
HPC
Software Development
Programming Languages
Systems Design
Computer Science
Python
Kubernetes
Cloud Computing
Amazon Web Services
Microsoft Azure
Google Cloud Platform
Orchestration
Workflow Management
Finance
Trading

Job Details

ROLE OVERVIEW:

Balyasny Asset Management is seeking an experienced Software Engineer to join our Infrastructure team, focusing on the development and maintenance of BAM's High-Performance Computing (HPC) infrastructure. This role involves building and supporting our proprietary distributed task orchestration platform that manages large-scale job distribution across cloud and on-premises infrastructure. The successful candidate will develop distributed computing libraries, ensure system reliability, and support critical production workloads.

RESPONSIBILITIES:
Design and maintain distributed computing libraries and APIs
Build robust task orchestration systems managing thousands of concurrent workflows
Implement job scheduling algorithms for efficient resource utilization
Ensure reliability of critical HPC infrastructure through flexible support coverage
Develop monitoring and observability solutions for distributed systems
Drive automation initiatives to reduce operational overhead
Participate in an on-call rotation to meet 24/7 support SLAs

QUALIFICATIONS & REQUIREMENTS:
5+ years of professional software development experience
Strong proficiency in at least two programming languages
Experience with distributed systems design and implementation
Understanding of containerization and orchestration platforms
Strong ownership mentality and commitment to production reliability
Degree in Computer Science, Engineering, or a related field
Willingness to participate in a support rotation

NICE TO HAVE:
Experience with Python and parallel compute libraries (Dask, Ray, Joblib)
Knowledge of Kubernetes and cloud platforms (AWS, Azure, Google Cloud Platform)
Experience with task orchestration systems and workflow management
Understanding of financial markets and trading systems
Experience with monitoring tools and distributed tracing