Philadelphia, Pennsylvania
Overview

We are looking for a GPU Performance Engineer to build highly optimized CUDA kernels for low-latency inference. This role is focused on workloads where off-the-shelf runtimes and vendor libraries do not fully exploit the structure of the model, and where custom kernels, memory layouts, and execution strategies can deliver meaningful gains. You will work closely with quantitative researchers and engineers to understand model structure, identify computational bottlenecks, and turn mathem
Full-time