HPC Infrastructure/Network Engineer

Overview

On Site
$60 - $70
Contract - W2
Contract - 8 Month(s)

Skills

Network Engineer
HPC (High-Performance Computing)
switches
routers
performance tuning
Python
Shell
InfiniBand
network automation
network security

Job Details

Job Title: HPC Infrastructure/Network Engineer

Location: Ashburn, VA - Onsite

Duration: 6 to 8 months

Role Overview:

The HPC (High-Performance Computing) role focuses on planning, implementing, and managing InfiniBand network configurations for high-performance computing in data centers. The role emphasizes network and physical network troubleshooting (e.g., NIC testing, Ixia-enabled testing), with a skill distribution of 60% network, 30% Linux + CI/CD, and 10% HPC. Responsibilities include configuring switches, routers, and adapters; implementing security protocols; monitoring performance; troubleshooting; collaborating with vendors; and developing automation scripts.

Key Responsibilities:

  • Configure and manage InfiniBand networks, including switches, routers, adapters, and performance tuning (e.g., MTU, buffer sizes, PFC/DCB for congestion management).
  • Conduct physical network troubleshooting (e.g., NIC testing, Ixia-enabled testing for performance validation).
  • Develop automation scripts (Python, Shell) for network tasks, leveraging libraries like Netmiko, NAPALM, Jinja; Ansible a plus.
  • Monitor performance using tools like EPM/IPM; implement security protocols (MACsec, IPsec, access controls).
  • Collaborate with vendors for compatibility, POCs, and BOMs; support lab/pre-field testing.
  • Document configurations and processes via MOP/SOP.

Qualifications:

  • Bachelor's degree in Computer Science, IT, or related field.
  • 5+ years of InfiniBand experience in enterprise/lab environments.
  • Expertise in InfiniBand architecture and protocols; RoCE a plus.
  • Proficient in Python and Shell scripting (junior developer level, 1-2 years) for network automation; Git experience preferred.
  • Strong network security (MACsec/IPsec), troubleshooting, and performance tuning skills.
  • Familiarity with RDMA applications and parallel computing frameworks (e.g., MPI, OpenMP).
  • Certifications (e.g., IBTA, CCNP) a plus; Linux/UNIX proficiency and CI/CD mindset required.

Skill Distribution (60/30/10):

  • 60% Network: Emphasis on InfiniBand troubleshooting, NIC testing, Ixia-enabled testing, and performance tuning (e.g., PFC/DCB, MTU).
  • 30% Linux + CI/CD: Linux/UNIX administration, Python/Shell scripting for automation, CI/CD familiarity (Git/Jenkins).
  • 10% HPC: Basic HPC cluster knowledge, RDMA applications, parallel computing (MPI/OpenMP).