HPC Cluster Engineer

Overview

Remote
Pay: Depends on Experience
Contract - W2
Contract - 12 Months
No Travel Required
Unable to Provide Sponsorship

Skills

HPC
GPU
Data center

Job Details

Job Title: HPC Cluster Engineer
Location: Remote

Role Summary

We are seeking an HPC Cluster Engineer with strong hands-on experience in hardware, data center operations, networking, and GPU systems to support and maintain high-performance computing environments.

Key Responsibilities

  • Install, configure, and maintain HPC compute clusters
  • Manage server hardware, racks, cabling, power, and cooling
  • Install, maintain, and troubleshoot NVIDIA GPUs
  • Support high-speed networking (InfiniBand/Ethernet)
  • Monitor cluster health and resolve hardware/network issues
  • Work closely with data center and infrastructure teams

Required Skills

  • Experience with HPC / compute clusters
  • Strong Linux administration (RHEL / Ubuntu / CentOS)
  • Hands-on experience with server hardware and data center operations
  • Networking knowledge (InfiniBand, Mellanox, RDMA, Ethernet)
  • Experience with NVIDIA GPU systems (A100, H100, or V100 preferred)

Nice to Have

  • Job schedulers: Slurm, PBS, LSF
  • HPC storage: Lustre, GPFS, BeeGFS
  • Exposure to cloud or hybrid HPC environments

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions, and AI may have been used to create this description. The position description has been reviewed for accuracy, and Dice believes it correctly reflects the job opportunity.