GPU Platform Infrastructure Engineer

Warren, MI, US • Posted 8 days ago • Updated 4 hours ago
Full Time
On-site
Fitment

Job Details

Skills

  • Research and Development
  • DevOps
  • Systems Engineering
  • Computer Science
  • Computer Engineering
  • Management
  • Provisioning
  • Lifecycle Management
  • Resource Allocation
  • Dashboard
  • Onboarding
  • SIM
  • Scalability
  • Workflow
  • Collaboration
  • Linux
  • Kubernetes
  • Docker
  • Scheduling
  • Resource Management
  • Artificial Intelligence
  • Machine Learning (ML)
  • Reporting
  • Scripting
  • Python
  • Bash
  • GPU
  • Continuous Integration
  • Continuous Delivery
  • Cloud Computing
  • Amazon Web Services
  • Microsoft Azure
  • Google Cloud Platform
  • Google Cloud
  • Graphical User Interface
  • Documentation

Summary

Job Title: GPU Platform Infrastructure Engineer

Job Summary

Support the GM ARC RTD team by building and maintaining the foundational GPU cluster platform infrastructure for shared AI/ML, simulation, and validation workloads. This role focuses on GPU access governance, resource allocation, scheduling policies, observability, and operational support for multi-tenant GPU environments, including RTX 6000, A100, B200, and future systems.

Required Experience
3+ years of experience in Platform Engineering, Infrastructure Engineering, DevOps, or a related field
Bachelor's or Master's degree in Systems Engineering, Computer Science, Computer Engineering, or a related discipline

Responsibilities
Manage GPU cluster access provisioning, onboarding, permissions, and lifecycle management
Design and maintain GPU resource allocation policies, quotas, namespace isolation, and scheduling configurations
Develop GPU utilization dashboards, reporting, monitoring, and capacity tracking solutions
Create reusable job submission templates and onboarding documentation for ML, Isaac Sim simulation, and validation workloads
Support platform governance, operational continuity, infrastructure scalability, and CI/CD integration
Design and develop GUI-based tools for streamlined Docker development workflows
Collaborate with infrastructure, AI/ML, and engineering teams to support shared GPU operations

Required Skills
Experience with Linux, Kubernetes, Docker, and GPU infrastructure environments
Knowledge of workload scheduling, resource management, and multi-tenant platform operations
Experience supporting AI/ML, simulation, or GPU-intensive engineering workloads
Experience with monitoring, observability, and reporting tools
Strong scripting and automation skills using Python, Bash, or similar languages
Familiarity with NVIDIA GPU platforms, containerized compute environments, and infrastructure automation tools
Experience with CI/CD pipelines and cloud platforms such as AWS, Azure, or Google Cloud Platform is a plus
Experience with GUI development frameworks is a plus
Strong troubleshooting, documentation, and operational support skills

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions, and AI may have been used to create this description. The position description has been reviewed for accuracy, and Dice believes it correctly reflects the job opportunity.
  • Dice Id: 10382565
  • Position Id: 3f0511a6a39be5c5a3612b388403c628