Overview
On Site
$100,000 - $130,000
Full Time
Skills
HPC
Linux
SLURM
Job Details
HPC Systems Administrator
*This role is fully on-site in Dallas, TX.*
About the Role
We are seeking a Senior High-Performance Computing (HPC) Systems Administrator to support a world-class research computing environment. In this role, you will design, deploy, maintain, and optimize HPC systems in a Linux environment. You'll work closely with researchers and technical teams to deliver reliable and scalable compute resources that power groundbreaking research initiatives.
This is a hands-on, in-person position that requires advanced expertise in Linux systems administration, HPC clusters, and modern tools for storage, networking, and automation.
Key Responsibilities
- Design, implement, and manage HPC cluster environments including node provisioning, performance tuning, and software deployment.
- Maintain and troubleshoot parallel file systems (e.g., Lustre) and high-speed interconnects (e.g., InfiniBand).
- Administer job schedulers and workload managers such as SLURM.
- Automate system administration tasks using shell scripts or languages like Python or Perl.
- Work directly with researchers to compile, install, and optimize open-source and commercial applications.
- Coordinate with hardware and software vendors to resolve issues and keep systems up to date.
- Maintain clear, detailed documentation of configurations, processes, and procedures.
- Participate in on-call rotations and maintenance windows as part of a small infrastructure team.
Required Qualifications
- Bachelor's degree in a related technical field.
- 5+ years of Linux system administration experience in a research or large-scale computing environment.
- Proficiency with SLURM or similar job schedulers.
- Experience supporting high-performance networking technologies like InfiniBand.
- Hands-on experience with Lustre or other parallel file systems.
- Scripting proficiency with Bash, Python, or Perl.
- Strong troubleshooting and problem-solving skills.
- Excellent communication skills to support users with varied technical backgrounds.
Preferred Qualifications
- Experience with NVIDIA DGX systems.
- Familiarity with Bright Cluster Manager.