Overview
Accepts corp-to-corp applications
Contract - W2
Contract - Independent
Contract - 6+ months
50% Travel
Skills
Machine Learning Performance Engineer
PTX
SASS
CUDA
Job Details
Title: Machine Learning Performance Engineer
Location: San Ramon, CA (preferred); 50% travel
Duration: 6+ Months contract
Skills Required: ML, CUDA, Python
Description
Must be willing to travel to customer sites. Customers are running into CUDA installation, configuration, and tuning issues that are slowing down their adoption of the technology. Responsibilities include helping us diagnose and fix these issues on site.
Requirements:
- An understanding of modern ML techniques and toolsets
- The experience and systems knowledge required to debug a training run's performance end to end
- Low-level GPU knowledge of PTX, SASS, warps, cooperative groups, Tensor Cores, and the memory hierarchy
- Debugging and optimization experience using tools such as CUDA-GDB, Nsight Systems, and Nsight Compute
- Library knowledge of Triton, CUTLASS, CUB, Thrust, cuDNN, and cuBLAS
- Intuition about the latency and throughput characteristics of CUDA graph launch, Tensor Core arithmetic, warp-level synchronization, and asynchronous memory loads
- Background in InfiniBand, RoCE, GPUDirect, PXN, rail optimization, and NVLink, and in using these networking technologies to interconnect GPU clusters
- An understanding of the collective algorithms supporting distributed GPU training in NCCL or MPI
- An inventive approach and the willingness to ask hard questions about whether we're taking the right approaches and using the right tools.
Thanks & Regards
Vijaya Lakshmi
Lead Recruiter
Consulting | Staffing | Mobile & Web Solutions
Phone:
21800 Haggerty Road, Suite 204 Northville, Michigan 48167
MBE Certified | E-Verify