Linux Administrator (HPC and NVIDIA GPU) | 100% Remote

Overview

Remote
Depends on Experience
Accepts corp to corp applications
Contract - Independent
Contract - W2

Skills

GPU
HPC
Linux
Kubernetes
slurm

Job Details

System Administrator, High-Performance Computing (HPC)

Location: Remote

DURATION: 6 to 12 Months

Key Responsibilities

System Support & Troubleshooting

  • Provide operational support and problem resolution for the NVIDIA NVL72 GPU system.
  • Monitor system health and performance, proactively identifying and resolving issues to maintain high uptime and availability.

Cluster & Workload Management

  • Administer and optimize the Slurm workload manager for efficient job scheduling and resource allocation.
  • Manage container orchestration using Kubernetes (K8s) within the HPC environment.

Software & Patch Management

  • Maintain and update NVIDIA software stacks, ensuring proper patch management, version control, and security compliance.
  • Use NVIDIA Base Command Manager for system orchestration, monitoring, and optimization.

Documentation & Knowledge Transfer

  • Author and maintain detailed technical documentation, including system architecture, configurations, and operational procedures.
  • Create clear, user-friendly how-to guides to support onboarding and self-service among researchers and staff.
  • Conduct on-the-job training sessions for new team members and end users to facilitate knowledge transfer and share best practices.

Qualifications

Required

  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience.
  • 3 to 5 years of experience in system administration, preferably in HPC or GPU-accelerated environments.
  • Proficiency in Linux, Slurm, Kubernetes, and NVIDIA GPU technologies.
  • Demonstrated experience writing technical documentation and user