Overview
On Site
Full Time
Skills
Information Technology
Evaluation
Training
High Performance Computing
Red Hat Enterprise Linux
Servers
CPU
GPU
Ethernet
Optical Fiber
InfiniBand
Leadership
Network Monitoring
System Administration
Regulatory Compliance
Network
Storage
Needs Analysis
Scheduling
Computer Hardware
System Requirements
GitLab
Lua
Tcl
Technical Support
Presentations
Documentation
DoD
Security Clearance
Computer Engineering
Computer Science
Systems Design
Parallel Computing
Linux
Unix
Computer Networking
Backup Administration
Data Archiving
System Security
Middleware
HPC
Distributed Computing
File Systems
Virtualization
VMware
Python
Bash
Perl
Scripting
Ansible
Docker
Kubernetes
Programming Languages
C
C++
Fortran
Job Details
Overview
AMERICAN SYSTEMS is an employee-owned federal government contractor supporting national priority programs through our strategic solutions in the areas of Information Technology, Test & Evaluation, Program Mission Support, Engineering & Analysis, and Training.
Responsibilities
THIS POSITION COMES WITH A 10K SIGNING BONUS! As an HPC Engineer with AMERICAN SYSTEMS, you will have the opportunity to do the following:
- Apply comprehensive knowledge of High Performance Computing (HPC) systems comprising high-speed, multi-petabyte Lustre file systems, Red Hat Enterprise Linux (RHEL) servers, CPU/GPU compute nodes, and high-performance storage arrays connected via Ethernet, fiber, Omni-Path, and InfiniBand interconnects.
- Provide functional and technical expertise in support of user-developed software, and offer technical advice and leadership to other technical staff.
- Utilize a wide variety of skills in system and network monitoring; large-scale systems administration; scripting and automation; security compliance; network distributed services; storage and backups; and hardware and software problem diagnosis and resolution.
- Diagnose and troubleshoot technical problems, often of a complex nature, associated with computer hardware and software interrelationships and dependencies.
- Conduct needs analysis, and plan and schedule the installation of a wide variety of new or modified hardware and software.
- Develop functional and technical IT system requirements and specifications. Configure and optimize system tools and applications, including job schedulers (Slurm and PBS Pro) and system resources (GitLab, Lua/Tcl modules, and system support applications).
- Create and deliver technical presentations to technical and non-technical stakeholders. Maintain detailed documentation of system configurations, procedures, and troubleshooting guides. Develop user-facing documentation.
Qualifications
- DoD Top Secret (TS) clearance with SCI eligibility
- Bachelor's degree in Computer Engineering, Computer Science, or a related field and ten or more years of job-related experience.
- Thorough knowledge of complex concepts, practices, and troubleshooting associated with HPC cluster systems design, installation, and maintenance.
- Advanced knowledge of distributed computing theory, parallel processing, applications, and associated infrastructure is required.
- Extensive experience with Linux/Unix systems including installation, configuration, networking, backups, updates and patching, data archiving, and system security.
- Functional knowledge of HPC middleware and platform managers such as Bright Cluster Manager, of job schedulers such as PBS, Slurm, and Torque, and of job queue optimization.
- Experience with HPC or large-scale distributed computing environments and technologies such as high-speed, low-latency interconnects (e.g., InfiniBand), parallel file systems (e.g., Lustre), and virtualization environments and tools (e.g., VMware).
- Experience developing Python, Bash, and Perl scripts and employing automation frameworks such as Ansible.
- General knowledge of Docker containers and the Kubernetes ecosystem.
- Working knowledge of one or more programming languages (e.g., C/C++, Fortran).
EEO Statement
EEO Race/Sex/Disability Status/Veteran Status