Job Description
RedLine Performance Solutions (RedLine) has been in the HPC solutions engineering services business for over 24 years and consistently holds new hires to a high "bar of excellence." This standard enables RedLine to accomplish what other firms cannot and promotes a high level of staff retention. RedLine provides IT infrastructure management and technical support services to some of the world's largest supercomputing sites.
The Linux Systems Administrator will work on a small team of HPC Systems Administrators responsible for the installation and operational support of an HPC cluster located in Phoenix, Arizona. Operations run 24x7, so the position includes a rotating on-call requirement. The Linux HPC Systems Administrator will actively participate in the evolution and maintenance of the technical infrastructure in addition to supporting the on-site HPC environment.
Job Responsibilities:
- Work with systems staff to enhance configuration management infrastructure
- Evaluate performance impacts of planned operating system changes
- Update and expand existing systems monitoring capabilities
- Develop automation tools for cluster administration
- Participate in optimizing resource usage, job scheduling software, and scheduling policies
- Provide technical support to researchers using HPC resources, troubleshoot problems, and develop appropriate computational strategies
- Consult and collaborate with scientist coworkers to determine the best system configurations for their applications
Required Skills:
- Minimum of 5 years of Red Hat or CentOS Linux system administration experience
- Demonstrated ability to configure, deploy and manage a major system area such as batch system, network, data storage, backup system, database system, or distributed computing
- Ability to work both independently and as part of a team; flexibility in handling assignments and in working on several projects simultaneously
- Ability to communicate effectively with people of diverse backgrounds and levels of computer knowledge
Preferred Skills/Experience:
- Experience in a Linux cluster environment
- Experience with batch systems such as SLURM or PBS
- Experience managing parallel and cluster file systems such as GPFS or Lustre
- Network management experience, including HPC interconnects (e.g., InfiniBand, Omni-Path)
- Ability to provide leadership and technical expertise to improve HPC cluster performance and resiliency
- Prior experience with configuration management tools, such as Ansible and/or Puppet
- Experience integrating applications with cloud provider software stacks
- Experience presenting and/or teaching
COVID-19 Vaccination Requirement Statement
The COVID-19 vaccination requirement in Executive Order 14042 and FAR 52.223-99 is not currently in effect. However, if those or other related requirements take effect, successful candidates for these positions will be required to obtain and show proof of COVID-19 vaccination.