HPC Cluster Engineer

Overview

Remote

Depends on Experience

Contract - W2

Contract - 12 Months

No Travel Required

Unable to Provide Sponsorship

Skills

HPC

GPU

Data center

Job Details

Job Title: HPC Cluster Engineer
Location: Remote

Role Summary

We are seeking an HPC Cluster Engineer to support and maintain high-performance computing environments, with strong hands-on experience in hardware, data center operations, networking, and GPU systems.

Key Responsibilities

Install, configure, and maintain HPC compute clusters
Manage server hardware, racks, cabling, power, and cooling
Perform GPU installation, maintenance, and troubleshooting (NVIDIA)
Support high-speed networking (InfiniBand/Ethernet)
Monitor cluster health and resolve hardware/network issues
Work closely with data center and infrastructure teams

Required Skills

Experience with HPC / compute clusters
Strong Linux administration skills (RHEL / Ubuntu / CentOS)
Hands-on experience with server hardware and data center operations
Networking knowledge (InfiniBand, Mellanox, RDMA, Ethernet)
Experience with NVIDIA GPU systems (A100/H100/V100 preferred)

Nice to Have

Job schedulers: Slurm, PBS, LSF
HPC storage: Lustre, GPFS, BeeGFS
Cloud or hybrid HPC exposure
