Machine Learning Systems Engineer: Distributed Training

Overview

On Site
Full Time

Skills

Scalability
Optimization
Collaboration
Computer Hardware
Training
Software Engineering
Python
Distributed Computing
Benchmarking
Trading
Innovation
Machine Learning (ML)
Artificial Intelligence
Research
Recruiting

Job Details

Overview

We're looking for a Machine Learning Systems Engineer to strengthen the performance and scalability of our distributed training infrastructure. In this role, you'll work closely with researchers to streamline the development and execution of large-scale training runs, helping them make the most of our compute resources. You'll contribute to building tools that make distributed training more efficient and accessible, while continuously refining system performance through careful analysis and optimization. This position is a great fit for someone who enjoys working at the intersection of distributed systems and machine learning, values high-performance code, and has an interest in supporting innovative machine learning efforts.

What You'll Do
  • Collaborate with researchers to help them develop systems-efficient models and architectures
  • Apply the latest techniques to our internal training runs to achieve high hardware efficiency
  • Create tooling to help researchers distribute their training jobs more effectively
  • Profile and optimize our training runs

What We're Looking For
  • Experience with large-scale ML training pipelines and distributed training frameworks
  • Strong software engineering skills in Python
  • Passion for diving deep into systems implementations and understanding fundamentals to improve their performance and maintainability
  • Experience improving resource efficiency across distributed computing environments through profiling, benchmarking, and system-level optimization


Why Join Us?

Susquehanna is a global quantitative trading firm that combines deep research, cutting-edge technology, and a collaborative culture. We build most of our systems from the ground up, and innovation is at the core of everything we do. As a Machine Learning Systems Engineer, you'll play a critical role in shaping the future of AI at Susquehanna - enabling research at scale, accelerating experimentation, and helping unlock new opportunities across the firm.

If you're a recruiting agency and want to partner with us, please reach out to . Any resume or referral submitted in the absence of a signed agreement will not be eligible for an agency fee.