ML Platform Engineer - GPU Infrastructure

Warren, MI, US • Posted 8 days ago • Updated 11 hours ago
Full Time
On-site

Job Details

Skills

  • DevOps
  • Systems Engineering
  • Computer Science
  • Computer Engineering
  • Training
  • Management
  • Storage
  • Scalability
  • Collaboration
  • Linux
  • Kubernetes
  • Docker
  • Continuous Integration
  • Continuous Delivery
  • Scripting
  • Python
  • Bash
  • Artificial Intelligence
  • Machine Learning (ML)
  • GPU
  • Performance Tuning
  • SIM
  • Cloud Computing
  • Amazon Web Services
  • Microsoft Azure
  • Google Cloud
  • Google Cloud Platform
  • Grafana

Summary

Job Title: ML Platform Engineer - GPU Infrastructure

Job Summary
Support the team by designing, implementing, and maintaining the automation and ML workload enablement layer of the GPU cluster platform. This role focuses on optimizing GPU compute environments for AI/ML training and Isaac Sim simulation workloads, integrating GPU jobs into CI/CD pipelines, standardizing runtime environments, and supporting reliable storage and artifact management.

Required Experience
3+ years of experience in ML Platform Engineering, DevOps, Infrastructure Engineering, or a related field
Bachelor's or Master's degree in Systems Engineering, Computer Science, Computer Engineering, or a related discipline

Responsibilities
Support GPU cluster platforms for AI/ML and simulation workloads
Optimize GPU compute environments for ML training and Isaac Sim execution
Integrate GPU workload execution into CI/CD pipelines
Standardize runtime environments using containers and automation tools
Manage storage, artifacts, and workload outputs
Troubleshoot and improve platform reliability, scalability, and performance
Collaborate with ML, infrastructure, and engineering teams

Required Skills
Experience with Linux, Kubernetes, Docker, and GPU infrastructure
Knowledge of CI/CD tools and automation scripting (Python/Bash)
Experience supporting AI/ML workloads and distributed systems
Familiarity with NVIDIA GPU technologies and containerized environments
Strong troubleshooting and performance optimization skills

Preferred Skills
Experience with Isaac Sim or simulation workloads
Exposure to cloud platforms (AWS, Azure, or Google Cloud Platform)
Knowledge of monitoring and observability tools such as Grafana or Prometheus
  • Dice Id: 10382565
  • Position Id: d423996a318b9b1b15763cd201fadb4c