PowerEdge XE Server

Spartanburg, SC, US • Posted 2 days ago • Updated 2 days ago
Full Time
On-site
$70 - $80/hr

Job Details

Skills

  • PowerEdge
  • HPC

Summary

Job Title: PowerEdge XE Server

  • PowerEdge rack/tower experience, RHEL and Ubuntu OS, and experience working in a larger server environment.
  • PowerEdge XE server experience: PowerEdge XE-series GPU servers in accelerated/HPC environments.
  • Strong experience with HPC operations in production environments (scheduling, monitoring, troubleshooting, capacity management).
  • Hands-on expertise with NVIDIA GPU infrastructure (preferably GB200/GB300 class racks, H100/B100 or similar generations) including firmware/driver stack, monitoring, and lifecycle management.
  • Familiarity with liquid-cooled data center infrastructure: working safely around cold plate / rear door heat exchanger systems, understanding facility interfaces (CDU, manifolds, leak detection, etc.).
  • Solid understanding of Linux administration (RHEL/CentOS/Ubuntu) in HPC/AI clusters: OS provisioning, patching, performance tuning, and troubleshooting.
  • Strong Day 2 operations mindset: incident response, root-cause analysis, change management, and operational runbook creation.
  • Excellent customer-facing communication skills and the ability to work on site, embedded with the customer's team and their operations staff.

Day 2 operations & stability

  • Own day-to-day operational support for the initial 48 fully loaded GB300 racks, scaling toward 144 racks by year end.
  • Monitor system health, performance, and capacity; proactively identify and remediate issues impacting uptime or SLAs.
  • Perform incident triage, troubleshooting, and coordination with Dell and NVIDIA support as needed.
  • Support day-to-day data center activities.
  • Proactively walk the data center throughout the day, watching for and alerting the customer to amber lights, hot doors, etc.
  • Escort dispatched field engineers (when applicable)
  • LOIS parts management (will be trained once onsite)
  • Maintain documentation of the environment: serial numbers, elevations, runbooks, etc.
  • Coordinate with customer for any maintenance activities: power maintenance, cooling maintenance, containment wall maintenance, etc.
  • Perform BIOS lifecycle management, user additions, and patching.
  • Triage with Dell support to schedule FSE break/fix activities.
  • Work at the direction of the customer team to update versions as needed.
  • Resource must be onsite 5 days a week; there are no exceptions.
  • Training is important: the selected candidate will need to complete XE Server training before starting the residency.

Infrastructure management

  • Support rack-level and node-level lifecycle tasks: firmware/driver updates, BIOS tuning, OS patching, and configuration consistency.
  • Assist with liquid cooling operations: safe work practices, coordination with facilities for maintenance/change activity, and monitoring of cooling performance and alarms.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 10336460
  • Position Id: 8938925