HPC Systems Engineer

Charlottesville, VA, US • Posted 30+ days ago • Updated 10 hours ago

Full Time

On-site

SAIC

Job Details

Skills

High Performance Computing
Computational Science
Performance Analysis
Storage
Network
Provisioning
xCAT
Scheduling
GRID
Workflow
Linux Administration
Red Hat Enterprise Linux
Parallel Computing
MPI
OpenMP
Docker
Remote Direct Memory Access
InfiniBand
Computer Networking
Communication
Writing
System Administration
Science
Security Clearance
Linux
Command-line Interface
Interfaces
Scripting
Bash
Python
Management
HPC
Distributed File System
IBM GPFS
GPU
CUDA
Configuration Management
Ansible
Puppet
Research
DoD
Internal Communications
IC
Integrated Circuit
Information Technology
Systems Engineering
FOCUS

Summary

Job ID: 2610670

Location: Charlottesville, VA, US

Date Posted: 2026-03-26

Category: Engineering and Sciences

Subcategory: Systems Engineer

Schedule: Full-Time

Shift: Day Job

Travel: No

Minimum Clearance Required: Top Secret

Clearance Level Must Be Able to Obtain: TS/SCI

Potential for Remote Work: On-site

Description

SAIC is looking for a highly qualified HPC Systems Engineer to support the Army's Golden Dome initiative. The engineer will support the deployment and sustainment of Linux-based High Performance Computing (HPC) cluster environments used for distributed compute workloads, simulation environments, and GPU-enabled processing.

The environment will include:

multi-node Linux compute clusters
workload scheduling platforms such as Slurm or PBS
cluster provisioning frameworks (e.g., xCAT, Warewulf)
high-performance networking technologies including RDMA / InfiniBand
distributed parallel compute workloads utilizing MPI or OpenMP
GPU-enabled compute resources supporting CUDA-based processing

The system will be used to support scientific computing, simulation workloads, and other distributed compute operations within a secure research environment.

Candidates should be comfortable working within cluster-scale computing environments where performance, scheduler configuration, and distributed workload execution are critical operational factors.

The HPC Systems Engineer will support the build-out, configuration, and sustainment of HPC cluster platforms.

The role focuses on:

cluster platform configuration
scheduler administration
distributed compute troubleshooting
performance analysis across compute, storage, and network layers
GPU compute workload support
automation and operational tooling

Candidates should have experience working with multi-node Linux cluster environments and distributed compute workloads.

Core Technical Capabilities

Candidates should demonstrate capability in most of the following areas.

HPC Cluster Platforms

Experience supporting multi-node Linux compute clusters, including node integration, configuration, and operational sustainment.

Experience with cluster provisioning tools such as xCAT, Warewulf, or similar node deployment systems is beneficial.

Workload Scheduling Platforms

Experience supporting distributed compute workloads using schedulers such as:

Slurm
PBS / PBS Pro
Torque
Grid Engine

Candidates should understand queue configuration, job submission workflows, and scheduler troubleshooting.

Candidates should understand how workload schedulers interact with distributed compute workloads and containerized execution environments.

Linux Systems Administration

Strong Linux administration experience including:

command-line system administration
server and compute node configuration
system troubleshooting in distributed compute environments

Experience with RHEL-based environments is preferred.

Distributed and Containerized Workloads

Experience supporting distributed compute workloads utilizing parallel computing frameworks such as:

MPI
OpenMP
GPU compute frameworks

Familiarity with container technologies commonly used in HPC environments such as:

Docker
Podman
Singularity / Apptainer

Candidates should understand how containerized workloads interact with schedulers, GPU resources, and distributed compute environments.

Experience supporting containerized HPC workloads or integrating container platforms with cluster infrastructure is desirable.

HPC Networking

Familiarity with high-performance networking technologies including:

RDMA networking
InfiniBand
high-throughput cluster networking architectures

Candidates should be comfortable assisting with troubleshooting cluster communication or performance issues.

GPU Compute Environments

Experience supporting GPU-enabled compute environments and workloads utilizing CUDA frameworks is desirable.

Automation and Operational Tooling

Experience writing scripts or operational tooling using languages such as:

Bash
Python

Automation experience supporting system administration or cluster operations is beneficial.

Qualifications

Candidates must meet the following requirements:

Bachelor's degree in a science or technology field; 10 additional years of experience may be substituted for the degree
8+ years of experience is required
Minimum 6 years of experience administering Linux systems in enterprise, research computing, or distributed compute environments
An active Top Secret clearance is required; an active TS/SCI clearance must be obtained prior to beginning work
100% onsite support in Charlottesville, VA
Experience supporting distributed compute environments or HPC cluster platforms
Experience working with workload schedulers such as Slurm, PBS, Torque, or similar systems
Experience administering Linux systems through command-line interfaces
Experience with scripting or automation tools (Bash, Python, or similar)
Ability to obtain required DoD 8140 (8570) IAT Level II certification
Candidates must have direct experience with HPC or distributed compute environments.

Candidates with the following experience are strongly preferred:

Administration of multi-node HPC cluster environments
Experience with parallel or distributed file systems such as Lustre, BeeGFS, or GPFS
Experience supporting GPU-enabled compute environments and CUDA workloads
Experience with configuration management tools such as Ansible or Puppet
Experience supporting research, laboratory, or mission computing environments
Experience supporting systems within DoD/DoW or IC environments

Dice Id: 10111346
Position Id: 2610670

Company Info

About SAIC

SAIC® is a premier Fortune 500 mission integrator focused on advancing the power of technology and innovation to serve and protect our world. Our robust portfolio of offerings across the defense, space, civilian and intelligence markets includes secure high-end solutions in mission IT, enterprise IT, engineering services and professional services. We integrate emerging technology, rapidly and securely, into mission-critical operations that modernize and enable critical national imperatives.

We are approximately 24,000 strong; driven by mission, united by purpose, and inspired by opportunities. Headquartered in Reston, Virginia, SAIC has annual revenues of approximately $7.5 billion. For more information, visit saic.com. For ongoing news, please visit our newsroom.
