Senior AI/ML Performance Engineer

• Posted 2 days ago • Updated 9 minutes ago
Contract Corp To Corp
Fitment

Job Details

Skills

  • AI/ML
  • Tensorflow
  • Nvidia
  • CUDA

Summary

Key Responsibilities

  • Profile and optimize large-scale AI training and inference workloads (transformers, multimodal, diffusion, recommender systems) across multi-node, multi-GPU clusters.
  • Build tools and frameworks to detect and identify bottlenecks in compute, memory, interconnects, and communication libraries, and deliver optimizations that maximize scaling efficiency.
  • Develop, maintain and recommend benchmarks for AI training and inference workloads.
  • Partner with framework teams (PyTorch, TensorFlow) to upstream performance improvements and enable better scaling APIs.
  • Collaborate across engineering organizations to improve the efficiency of our hardware, software, and infrastructure usage.
  • Proactively monitor fleet-wide utilization patterns, analyze existing inefficiency patterns or discover new ones, and deliver scalable solutions to address them.

Required Qualifications

  • 5+ years in AI/ML performance engineering, HPC, or large-scale inference systems
  • BS or similar background in Computer Science or related area (or equivalent experience)
  • Strong understanding of, and hands-on experience with, modern ML techniques and tools
  • Strong hands-on CUDA programming and optimization experience
  • Deep understanding of GPU architecture and memory hierarchy
  • Experience optimizing PyTorch and/or TensorFlow inference
  • Hands-on experience with NVIDIA Triton, Ray, and Kubernetes GPU scheduling
  • Experience with RAPIDS and GPU-accelerated data pipelines
  • Experience with benchmarking methodologies, performance analysis and profiling (e.g. Nsight), and performance monitoring tools
  • Strong track record of optimizing large-scale AI systems

Nice to Have

  • Neural network architecture optimization experience
  • Deep TensorRT optimization expertise
  • Video analytics or real-time inference systems experience
  • Experience operating large-scale GPU clusters
  • Experience with WebAssembly (WASM) for performance-critical frontend computation
  • Advanced Linux OS, container (e.g. Docker) and GitHub skills

Thanks and Regards,

Shiney K

Sr. US IT Recruiter

Prointegrate Inc.

New York | London | India

  • Dice Id: 91097474
  • Position Id: 2025-134/13954