Senior AI Performance Engineer (CUDA / GPU / NVIDIA Stack)

Remote • Posted 30+ days ago • Updated 25 days ago
Contract Independent
Contract W2
12 Months
No Travel Required
Remote
$95 - $100/hr

Job Details

Skills

  • NVIDIA
  • CUDA
  • GPU
  • Performance
  • Artificial Intelligence
  • Benchmarking
  • Communication
  • DevOps
  • Kubernetes
  • Machine Learning (ML)
  • Optimization
  • Performance Analysis
  • Performance Tuning
  • Real-time
  • Workflow

Summary

We are hiring a Senior AI Performance Engineer to work on large-scale GPU-accelerated AI systems powering real-time Vision AI platforms.

This role focuses on hands-on performance optimization across distributed multi-GPU environments: improving latency, throughput, and GPU utilization for production AI workloads.

This is NOT a generic ML / DevOps role.
We are looking for candidates with deep GPU and NVIDIA ecosystem experience.


Key Responsibilities

  • Analyze and optimize AI/ML workloads across multi-GPU, multi-node systems
  • Identify bottlenecks across compute, memory, and GPU communication layers
  • Optimize CUDA-based workloads (memory, compute efficiency, utilization)
  • Improve inference performance using Triton, TensorRT, or similar frameworks
  • Tune distributed systems using NCCL, Ray, or similar technologies
  • Monitor and improve system metrics: latency, throughput, GPU utilization
  • Build benchmarking and profiling workflows for performance analysis

Required Skills

  • Strong hands-on experience with CUDA and GPU performance optimization
  • Experience with NVIDIA ecosystem tools (Triton, TensorRT, NeMo, Nsight, etc.)
  • Deep understanding of GPU architecture and memory hierarchy
  • Experience with distributed AI systems (multi-GPU, NCCL, Ray, Kubernetes)
  • Experience profiling and tuning AI workloads
  • Hands-on experience with AI models (e.g., YOLO, GPT, LLaMA, Transformers)

Preferred

  • Experience working at NVIDIA or similar AI infrastructure companies
  • Experience with real-time inference or Vision AI systems
  • Experience with large-scale production AI deployments

Interview Process (Important)

  • Candidates will receive a technical scenario one day before the interview
  • 90-minute deep-dive session (not theoretical)
  • Must demonstrate:
    • Bottleneck identification
    • GPU optimization approach
    • Real-world performance improvements

Application Instructions

Apply ONLY if you have hands-on GPU/NVIDIA experience.

Please include:

  • Updated resume
  • Current location
  • Work authorization
  • Availability

  • Dice Id: 91134346
  • Position Id: 8955632
Contact the job poster

Kajal Singh

Recruiter @ Brillfy Technology