HPC System Administrator

Overview

Hybrid
$100,000 - $125,000
Full Time

Skills

SLURM
High Performance Computing
Computer engineering
Computer hardware
Research
Science
Performance monitoring

Job Details

As an HPC Systems Administrator, you will be a key member of a team that provides high-end research computing resources to researchers at a world-class university and research organization.
The team is dedicated to enabling research by providing access to centrally managed High Performance Computing (HPC), storage, and visualization resources. These resources include hardware, software, high-level scientific and technical user support, and the education and training required to help researchers make full use of modern HPC technology and local and national supercomputing resources.
You'll oversee day-to-day operations of the systems, including systems administration, monitoring, and storage performance, up to and including network components. You'll also manage the system's network switch, parallel file system, and HPC software stack and tools.

To be successful, you should have prior experience in the following areas:

  • Supporting HPC compilers and libraries.
  • Installing, configuring, and maintaining job management tools (such as SLURM, Moab, TORQUE, or PBS).
  • Configuring, installing and troubleshooting MPI and OpenMP.
  • Hands-on experience with at least one distributed file system (Spectrum Scale/GPFS, Lustre, BeeGFS, Gluster, IBRIX, PVFS, etc.).
  • Operating system deployment tools (e.g. xCAT, Rocks).
  • Configuring, administering, and supporting network storage subsystems (e.g. IBM, NetApp, DataDirect Networks, LSI, etc.).
  • Direct experience working with InfiniBand (must at least be able to demonstrate a working knowledge of InfiniBand concepts, OFED layers, and subnet managers).
  • Configuring, installing, tuning and maintaining scientific application software on large-scale systems.
  • Experience with systems automation tools such as Ansible or Puppet.
  • Configuring, installing, maintaining and/or using performance monitoring and optimization tools.
  • A degree in Computer Engineering/Science or a related field is required.