Senior AI Performance Engineer (CUDA / GPU / NVIDIA Stack)

Remote • Posted 4 hours ago • Updated 4 hours ago
Contract W2
Contract Independent
No Travel Required
Remote
$95 - $100/hr

Job Details

Skills

  • NVIDIA
  • CUDA
  • GPU
  • Performance

Summary

We are hiring a Senior AI Performance Engineer to work on large-scale GPU-accelerated AI systems powering real-time Vision AI platforms.

This role focuses on hands-on performance optimization across distributed multi-GPU environments: improving latency, throughput, and GPU utilization for production AI workloads.

⚠️ This is NOT a generic ML / DevOps role.
We are looking for candidates with deep GPU + NVIDIA ecosystem experience.


💡 Key Responsibilities

  • Analyze and optimize AI/ML workloads across multi-GPU, multi-node systems
  • Identify bottlenecks across compute, memory, and GPU communication layers
  • Optimize CUDA-based workloads (memory, compute efficiency, utilization)
  • Improve inference performance using Triton, TensorRT, or similar frameworks
  • Tune distributed systems using NCCL, Ray, or similar technologies
  • Monitor and improve system metrics: latency, throughput, GPU utilization
  • Build benchmarking and profiling workflows for performance analysis

Required Skills

  • Strong hands-on experience with CUDA and GPU performance optimization
  • Experience with NVIDIA ecosystem tools (Triton, TensorRT, NeMo, Nsight, etc.)
  • Deep understanding of GPU architecture and memory hierarchy
  • Experience with distributed AI systems (multi-GPU, NCCL, Ray, Kubernetes)
  • Experience profiling and tuning AI workloads
  • Hands-on experience with AI models (e.g., YOLO, GPT, LLaMA, Transformers)

Preferred

  • Experience working at NVIDIA or similar AI infrastructure companies
  • Experience with real-time inference or Vision AI systems
  • Experience with large-scale production AI deployments

🧪 Interview Process (Important)

  • Candidates will receive a technical scenario one day before the interview
  • 90-minute deep-dive session (practical, not theoretical)
  • Must demonstrate:
    • Bottleneck identification
    • GPU optimization approach
    • Real-world performance improvements

📩 Application Instructions

Apply ONLY if you have hands-on GPU/NVIDIA experience.

Please include:

  • Updated resume
  • Current location
  • Work authorization
  • Availability

  • Dice Id: 91134346
  • Position Id: 8955632
