Overview
Remote
Depends on Experience
Accepts corp to corp applications
Contract - W2
Skills
Amazon EC2
Amazon Web Services
Bash
DevOps
HPC
High Performance Computing
Linux
Python
Red Hat Enterprise Linux
System Administration
Job Details
Job Title: HPC DevOps Engineer / System Administrator IV
Location: 100% Remote.
Duration: Long Term (Extendable)
Job Description:
Support and enhance our High-Performance Computing (HPC) environment in collaboration with IT and R&D stakeholders. This role involves managing and optimizing supercomputing infrastructure, ensuring reliable system administration, and driving automation and cloud integration efforts. Key responsibilities include:
- Administering and maintaining HPC systems, including accelerated nodes and node imaging, with a strong focus on RHEL 8 and Rocky Linux 8 environments.
- Supporting HPC job submission workflows and workload managers, preferably PBS, to ensure efficient resource utilization.
- Installing and configuring scientific and engineering software applications, and managing environment modules.
- Designing and implementing networking and enterprise storage solutions to support high-throughput computing workloads.
- Utilizing configuration and automation tools such as SaltStack, Packer, and GitHub Actions to streamline system provisioning and CI/CD workflows.
- Managing cloud-based HPC resources using AWS services, including EC2, FSx, EFS, CloudFormation, and Route 53, along with AWS DevOps tools such as CodeBuild and CodeDeploy.
- Developing and maintaining scripts in Bash and Python to automate tasks and improve system reliability.
- Applying best practices in version control and code deployment using GitHub.
Skills
High-Performance Computing (HPC) & System Administration
- Strong background in supercomputing and HPC system architecture
- Experience with accelerated nodes and node imaging
- Proficient in system administration for RHEL 8 and Rocky Linux 8
- Skilled in HPC job submission workflows and workload managers (preferably PBS)
- Hands-on experience with software application installation and environment configuration
- In-depth knowledge of networking and enterprise storage solutions
Configuration & Automation Tools
- SaltStack for configuration management
- Packer for image creation and automation
- GitHub Actions for CI/CD workflows
Cloud Computing (AWS)
- Proficient with core AWS services: EC2, CloudFormation, FSx, EFS, Route 53
- Experience with AWS DevOps tools: CodeBuild and CodeDeploy
- Solid understanding of AWS networking and infrastructure best practices
Scripting & Version Control
- Strong scripting skills with Bash and Python
- Experience managing GitHub repositories with best practices for code deployment and version control
Education
Bachelor's degree from an accredited university or equivalent professional experience in a related field.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.