GPU Platform Infrastructure Engineer

Warren, MI, US • Posted 8 days ago • Updated 4 hours ago

Full Time

On-site

Job Details

Skills

Research and Development
DevOps
Systems Engineering
Computer Science
Computer Engineering
Management
Provisioning
Lifecycle Management
Resource Allocation
Dashboard
Onboarding
SIM
Scalability
Workflow
Collaboration
Linux
Kubernetes
Docker
Scheduling
Resource Management
Artificial Intelligence
Machine Learning (ML)
Reporting
Scripting
Python
Bash
GPU
Continuous Integration
Continuous Delivery
Cloud Computing
Amazon Web Services
Microsoft Azure
Google Cloud Platform
Google Cloud
Graphical User Interface
Documentation

Summary

Job Title: GPU Platform Infrastructure Engineer

Job Summary

Support the GM ARC RTD team by building and maintaining the foundational GPU cluster platform infrastructure for shared AI/ML, simulation, and validation workloads. This role focuses on GPU access governance, resource allocation, scheduling policies, observability, and operational support for multi-tenant GPU environments, including RTX 6000, A100, B200, and future systems.

Required Experience
3+ years of experience in Platform Engineering, Infrastructure Engineering, DevOps, or related field
Bachelor's or Master's degree in Systems Engineering, Computer Science, Computer Engineering, or related discipline

Responsibilities
Manage GPU cluster access provisioning, onboarding, permissions, and lifecycle management
Design and maintain GPU resource allocation policies, quotas, namespace isolation, and scheduling configurations
Develop GPU utilization dashboards, reporting, monitoring, and capacity tracking solutions
Create reusable job submission templates and onboarding documentation for ML, Isaac Sim simulation, and validation workloads
Support platform governance, operational continuity, infrastructure scalability, and CI/CD integration
Design and develop GUI-based tools for streamlined Docker development workflows
Collaborate with infrastructure, AI/ML, and engineering teams to support shared GPU operations

Required Skills
Experience with Linux, Kubernetes, Docker, and GPU infrastructure environments
Knowledge of workload scheduling, resource management, and multi-tenant platform operations
Experience supporting AI/ML, simulation, or GPU-intensive engineering workloads
Experience with monitoring, observability, and reporting tools
Strong scripting and automation skills using Python, Bash, or similar languages
Familiarity with NVIDIA GPU platforms, containerized compute environments, and infrastructure automation tools
Experience with CI/CD pipelines and cloud platforms such as AWS, Azure, or Google Cloud Platform is a plus
Experience with GUI development frameworks is a plus
Strong troubleshooting, documentation, and operational support skills

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 10382565
Position Id: 3f0511a6a39be5c5a3612b388403c628

Similar Jobs

Warren, Michigan • Today

Job Title: ML Platform Engineer - GPU Infrastructure

Job Summary

Support team by designing, implementing, and maintaining the automation and ML workload enablement layer of the GPU cluster platform. This role focuses on optimizing GPU compute environments for AI/ML training and Isaac Sim simulation workloads, integrating GPU jobs into CI/CD pipelines, standardizing runtime environments, and supporting reliable storage and artifact management.

Required Experience
3+ years of experience in ML Pl

Full-time

Platform Engineer

Remote or Illinois • Today

POSITION OVERVIEW

We are seeking a highly skilled Platform Engineer to join our Engineering team. In this role, you will design, build, and maintain the internal developer platform that enables our engineering organization to ship software reliably and at scale. You will own the full platform lifecycle, from infrastructure provisioning and GitOps delivery pipelines to observability and developer experience, working across our multi-cloud environments.

KEY RESPONSIBILITIES

Infrastructure & C

Full-time

USD 116,300.00 - 178,400.00 per year

GPU Software Engineer

Remote • 2d ago

Position: GPU Software Engineer
Remote, Full time

Role Summary

We are seeking expert-level GPU Software Engineers to support a high-visibility platform initiative within the Maya program, focused on building software tooling on top of a custom compiler and SDK. The role involves developing, optimizing, and porting GPU kernels and AI workloads to a specialized hardware platform. This is a critical and time-sensitive engagement with immediate onboarding expectations and long-term roadmap alignment (~18 mont

Full-time

130,000 - 140,000

Platform Engineer

California • Today

QuantumScape is on a mission to transform energy storage with solid-state lithium-metal battery technology. The company's next-generation batteries are designed to enable greater energy density, faster charging and enhanced safety to support the transition away from legacy energy sources toward a lower carbon future.

About the team: We're a small Platform Engineering team responsible for the core foundation that powers product development. We support our in-house developers with reliable toolin

Full-time

USD 125,200.00 - 181,600.00 per year