Data Center GPU Commissioning Engineer

San Jose, CA, US • Posted 4 hours ago • Updated 4 hours ago
Contract Corp To Corp
Contract Independent
Contract W2
On-site
Depends on Experience

Job Details

Skills

  • Data Center GPU Commissioning Engineer

Summary

Role: Data Center GPU Commissioning Engineer

Location: San Jose, CA (100% on-site)

Duration: 12 months

Job Description

The Data Center GPU Commissioning Engineer is responsible for commissioning, validating, and stabilizing GPU-based infrastructure in data center environments. This role ensures GPU servers, interconnects, drivers, firmware, and platform software are correctly installed, configured, tested, and production-ready to support AI, ML, and HPC workloads.

The engineer works closely with Deployment, Network, Platform, and Operations teams to deliver reliable, high-performance GPU clusters and ensure a smooth handover to run operations.

Key Responsibilities

  • Perform end-to-end commissioning of GPU servers and clusters in data centers.
  • Validate hardware installation, power, cooling, and cabling readiness for GPU systems.
  • Install and configure GPU drivers, firmware, BIOS settings, and system software.
  • Verify GPU health, performance, and stability using standard validation and burn-in tests.
  • Validate high-speed interconnects and networking used for GPU workloads.
  • Execute cluster-level testing for AI / HPC readiness and baseline performance.
  • Identify, troubleshoot, and resolve hardware, driver, or configuration issues during commissioning.
  • Work with OEMs and vendors for issue resolution and firmware recommendations.
  • Ensure systems comply with security, hardening, and operational standards.
  • Document commissioning procedures, results, and as-built configurations.
  • Support handover to operations teams and assist during early-life stabilization.

Required Skills & Experience
Technical Skills

  • Hands-on experience with GPU-based servers in data center environments
  • Strong understanding of:
    • Linux system administration
    • GPU drivers, firmware, and system tuning
    • Server BIOS, firmware upgrades, and hardware diagnostics
  • Familiarity with data center networking concepts and high-performance interconnects
  • Exposure to AI / ML / HPC environments is strongly preferred

Operational Skills

  • Strong troubleshooting and root cause analysis skills
  • Experience working in structured deployment and commissioning processes
  • Ability to follow and improve runbooks and SOPs

Certifications (Preferred)

  • OEM server certifications (HPE / Dell / Lenovo or equivalent)
  • Linux administration certifications
  • GPU / AI platform certifications (nice to have)

  • Dice Id: 10110007
  • Position Id: GPUCOMM