Linux & HPC Systems Engineer

Overview

On Site
$85,000
Full Time

Job Details

Vaco is hiring a Linux & HPC Systems Engineer for a direct hire opportunity in Cincinnati, OH.

This role is open only to candidates who are locally based and authorized to work in the U.S. No visa sponsorship or transfer is available.

We are seeking a Linux & HPC Systems Engineer to join our Cincinnati-based client’s team. This role focuses on building and managing advanced research computing systems that empower scientific discovery and innovation. The Linux & HPC Systems Engineer will support the design, integration, and operation of computing infrastructure that enables high-performance computing, large-scale data analysis, and collaboration across diverse domains.

As our client continues to modernize internal platforms, including a recently launched AI/GPU workload management platform, this position offers early-career systems engineers hands-on experience in a dynamic environment. You will join a collaborative team supporting advanced computing initiatives, contributing to both the implementation and the ongoing evolution of the client's infrastructure and services.

This work includes supporting an expanding high-performance computing (HPC) cluster, a large language model (LLM) platform, and a range of resource-intensive applications, from genomic analysis to deep learning–based image processing. The team also plays a key role in designing and maintaining scalable architecture, and it supports interactive tools that let users work directly with HPC resources.

The person hired for this position will support HPC clusters, including (but not limited to) troubleshooting jobs, managing resources, and compiling software modules. Excellent written and oral communication skills are necessary, as you will interface directly with users on projects via email and incident management systems. While not a core responsibility, this role will also contribute to broader initiatives related to advanced computing, AI integration, and platform evolution.
Key Responsibilities:

    System Analysis & Design

    • Analyze, design, implement, and maintain moderately complex systems.
    • Participate in system testing and documentation.
    • Design and develop technical solutions to address operational challenges.
    • Prepare comprehensive user and technical documentation.
    • Follow development lifecycle processes to ensure successful delivery.

    Technical Support

    • Provide technical support and problem resolution for production issues.
    • Interpret error messages and troubleshoot the underlying issues.
    • Utilize change control processes to implement solutions.
    • Serve as a resource for other departments.
    • Monitor system performance and participate in on-call rotations.

    End-User Support

    • Deliver exceptional end-user support and monitor service-level agreements (SLAs).
    • Collaborate with other teams for customer-centric incident management.
    • Promote adherence to change management policies.
    • Model excellent customer service.

    Project Coordination

    • Collaborate in the design, development, and implementation of requests for new and enhanced applications.
    • Identify and allocate appropriate resources for small to mid-sized projects.
    • Act as a liaison between internal and external teams.
    • Participate in creating and managing detailed project plans.
    • Assess project scope and complexity and contribute to planning and resource allocation.

Job Qualifications:

    • Bachelor’s degree in a related field, or equivalent combination of education and experience.
    • 2+ years of professional experience in a related discipline, with a willingness to learn and grow.

    Preferred Qualifications

    • HPC cluster experience: creating accounts, understanding batch job systems, troubleshooting jobs, and building software modules.
    • Experience with job schedulers such as LSF or Slurm.
    • Experience imaging and building RHEL-based servers.
    • Experience with Puppet, Red Hat Satellite, Ansible, or similar configuration management tools.
    • Experience configuring LDAP, DNS, Apache, networking, storage, services, and logging on RHEL-based systems.
    • Basic knowledge of server and data center hardware.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions, and AI may have been used to create this description. The position description has been reviewed for accuracy, and Dice believes it to correctly reflect the job opportunity.

About Vaco by Highspring