Manages and supports high-performance computing (HPC) clusters, computational computing infrastructures, and specialized research, science, engineering, and mathematics applications used in computational computing environments. Supports operating systems, parallel file systems, automated deployment of compute nodes, management and tuning of specialized storage environments, login nodes, and HPC scheduling systems.
• Administers and supports server operating systems and research application environments, including compiling, installing, and testing patches, performance monitoring, and tuning.
• Assists the IT Associate Director in implementing computational technology projects.
• Manages infrastructure such as, but not limited to, cabling, storage systems, SAN switches, InfiniBand switches, network switches, server racks, and rack power infrastructure, as needed.
• Provides support for end-user service requests.
• Interprets and implements policies and procedures as they apply to an assigned area.
• May recommend changes to policies and procedures as necessary.
• Performs essential duties during emergencies such as hurricanes, public health emergencies, and/or any other university emergency closing. The employee is expected to be available to report to work as needed during a university emergency closing, with appropriate notification of a department administrator.
• Troubleshoots and resolves complex system or application issues.
• Works closely with the IT Associate Director in managing the HPC job scheduler.
• Manages computational computing storage systems, including parallel file systems.
• Uses tools, scripts, or programming to automate routine tasks in order to reduce manual workload, improve response time, and ensure system reliability.
• Creates and maintains documentation for all servers, applications, tools, scripts, and procedures.
• Researches new technologies and makes technology recommendations to the Lead HPC Systems Administrator or appropriate director.
• Creates HPC system and application environment designs that account for performance, fault tolerance, failover, and disaster recovery. Makes recommendations to the Lead HPC Systems Administrator or appropriate director.