High Performance Computing Specialist

• Posted 2 days ago • Updated 2 days ago
Full Time
On-site

Job Details

Skills

  • Trading
  • HPC
  • Performance Tuning
  • Scheduling
  • Artificial Intelligence
  • Use Cases
  • Firewall
  • Computer Hardware
  • Scalability
  • Reliability Engineering
  • Capacity Management
  • Incident Management
  • Provisioning
  • Lifecycle Management
  • Scripting
  • Computer Science
  • High Performance Computing
  • Machine Learning (ML)
  • Training
  • Management
  • Linux
  • Storage
  • GPU
  • Kubernetes
  • Terraform
  • Ansible
  • Computer Networking
  • TCP/IP
  • HTTP
  • Load Balancing
  • Python
  • Shell Scripting
  • Grafana

Summary

An elite Montreal-based trading firm is seeking an HPC Systems Specialist to join a team responsible for designing and operating high-performance GPU platforms that support advanced AI and machine learning workloads. This role sits at the intersection of infrastructure engineering, distributed systems, and performance tuning, with ownership spanning from physical hardware through large-scale model serving. You will work closely with ML practitioners and infrastructure peers to build reliable, scalable, and highly optimized compute environments.

What You'll Do

  • Build, operate, and continuously improve GPU-based compute platforms supporting large-scale inference and ML workloads
  • Design and deploy distributed model serving architectures across multi-node, multi-GPU environments
  • Operate and evolve Kubernetes clusters with GPU scheduling for AI and ML use cases
  • Configure and tune networking components such as load balancers, firewall rules, and high-throughput interconnects for GPU clusters
  • Develop and optimize storage solutions for model artifacts, checkpoints, and inference caches
  • Diagnose and resolve performance and stability issues across hardware, drivers, networking, and application layers
  • Partner with ML engineers to benchmark models, analyze performance characteristics, and apply inference acceleration strategies
  • Evaluate new GPU hardware, serving frameworks, and infrastructure patterns to improve efficiency and scalability
  • Improve system reliability through observability, alerting, capacity planning, and on-call/incident response processes
  • Automate provisioning and lifecycle management using infrastructure-as-code and scripting


What You Bring

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related discipline
  • 5+ years of experience managing high-performance computing environments
  • Hands-on experience operating GPU compute environments for ML inference or training
  • Familiarity with modern model serving frameworks (e.g., vLLM, SGLang, or similar) and GPU driver/runtime management
  • Strong Linux systems expertise, including networking, storage, and kernel-level performance considerations
  • Practical experience running GPU workloads on Kubernetes at scale
  • Experience with infrastructure automation tools such as Terraform, Ansible, or equivalent
  • Solid understanding of distributed systems concepts, networking fundamentals (TCP/IP, HTTP/2), and load-balancing strategies
  • Proficiency in Python and shell scripting for tooling and automation
  • Experience with monitoring and observability platforms such as Prometheus, Grafana, or comparable tools

This is a hybrid role in the firm's Montreal office, requiring 3 days per week onsite and 2 days remote.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 90922487
  • Position Id: 24054010