Overview
On Site
Contract - W2
Contract - 24 day(s)
Skills
Configuration management
Nvidia
Linux Administration
GPGPU/GPU
Hardware Troubleshooting
Infrastructure & Operations
Infrastructure Automation and Orchestration
GPU platforms
Job Details
Role: Senior GPU Platform Engineer - AI Infrastructure Operations
Location: Redmond, WA (4 days a week onsite is a must)
Job Type: W2 Contract
Contract length: 1 year
On-site position requiring regular hands-on access to hardware in lab or data center environments.
MUST HAVE SKILLS:
- Configuration Management
- GPGPU/GPU
- Hardware Troubleshooting
- Infrastructure & Operations
- Infrastructure Automation and Orchestration
- Linux Administration
Description:
- Join our team to operate and support cutting-edge GPU infrastructure powering AI and high-performance computing workloads for a leading global hyperscale cloud provider.
- In this hands-on role, you'll manage the full lifecycle of NVIDIA GPU platforms from bring-up to break/fix while ensuring optimal performance for advanced AI applications.
Responsibilities:
- Operate and maintain production GPU and bare-metal compute platforms with hands-on hardware management
- Perform physical infrastructure tasks including rack/stack, cabling, power validation, and system bring-up
- Diagnose hardware faults, replace failed components, and coordinate vendor support for complex issues
- Install and configure Linux operating systems with GPU-specific drivers and software stacks
- Execute platform validation using diagnostic tools to ensure GPU health, stability, and performance
- Provision bare-metal systems through automated workflows while troubleshooting configuration issues
- Apply firmware, BIOS, and platform configuration changes following standardized change processes
Requirements:
- 5+ years of professional experience supporting production server infrastructure in data center environments
- Strong Linux administration skills with the ability to independently troubleshoot system-level issues
- Hands-on experience with physical server hardware including diagnostics and component replacement
- Familiarity with GPU platforms, preferably NVIDIA, and associated drivers and software stacks
- Experience working in structured, change-controlled production environments
- Knowledge of infrastructure monitoring tools and alert response procedures