Overview
On Site
Full Time
Skills
Computer Networking
Engineering Design
Operating Systems
Performance Tuning
Computer Hardware
Debugging
Performance Analysis
Scripting
Bash
Python
Ansible
Puppet
Documentation
Technical Writing
Regulatory Compliance
Computer Science
Electrical Engineering
Systems Engineering
Management
SAN
Linux
Red Hat Enterprise Linux
Ubuntu
Oracle
Conflict Resolution
Problem Solving
Collaboration
DoD
GSEC
SSCP
Customer Engagement
CISA
CISSP
GCIH
Cisco Certifications
Security Clearance
Kubernetes
Computer Cluster Management
Artificial Intelligence
Machine Learning (ML)
Workflow
Orchestration
GPU
Virtualization
Cloud Computing
Scheduling
LSF
Job Details
Job Description
Base-2 Solutions is seeking a highly skilled Systems Engineer with deep expertise in operating systems, hardware, GPUs, and high-speed networking. In this role, you will design, develop, and optimize GPU clusters that power enterprise AI for mission customers.
Primary Responsibilities
- GPU Cluster Engineering: Design, configure, and maintain GPU clusters. Collaborate with a multidisciplinary team to define and optimize cluster architectures, ensuring they meet performance, power efficiency, and feature requirements.
- Operating System Integration: Work closely with AI/ML engineers to ensure smooth GPU integration with Linux-based systems. Optimize GPU drivers for compatibility, reliability, and performance. Provide regular maintenance and updates.
- Performance Optimization: Analyze GPU performance, identify bottlenecks, and develop strategies to improve efficiency across hardware and software layers.
- Tooling and Automation: Build and maintain debugging tools, profiling utilities, and performance analysis software for Linux environments. Leverage scripting and configuration tools such as Bash, Python, Ansible, Puppet, and Salt.
- Compliance & Documentation: Maintain technical documentation, architectural specifications, and Linux best practices. Support ATO (Authority to Operate) and ensure compliance with federal security standards.
Qualifications
- Bachelor's or higher degree in Computer Science, Electrical Engineering, or a related field.
- Additional years of experience may be considered in lieu of a degree.
- 10 years of relevant systems engineering experience.
- Experience managing NVIDIA GPU data center platforms (DGX, HGX, H200, H100, L4s).
- Knowledge of enterprise server components (storage/network controllers, HBA, SSDs).
- Strong expertise with Linux distributions (RHEL, Ubuntu, Oracle, and Rocky).
- Excellent problem-solving skills and the ability to collaborate within a team.
- Candidate must, at a minimum, meet DoD 8570.01-M IAT Level II certification requirements (currently Security+ CE, CCNA Security, GICSP, GSEC, or SSCP, along with an appropriate computing environment (CE) certification). An IAT Level III certification is also acceptable (CASP+ CE, CCNP Security, CISA, CISSP, GCED, GCIH, or CCSP).
Clearance
- TS/SCI clearance with polygraph required, or TS/SCI clearance and willingness to obtain a polygraph prior to starting.
Preferred Qualifications
- Experience with Kubernetes cluster management and AI/ML workflow orchestration (Argo, Airflow, and Kubeflow).
- Familiarity with GPU virtualization and cloud computing.
- Experience with Prometheus and Grafana for monitoring.
- Knowledge of distributed resource scheduling systems (Slurm preferred, LSF, etc.).