JOB DESCRIPTION
Senior Performance Engineer
Vision AI Platform (Public Sector)
Location | Global (remote); US business hours overlap required
Reporting to | Assurance Lead / Assurance Director
Team | Globally distributed engineering team
Industry | Artificial Intelligence; Edge Computing; Public Sector
Employment | Contract
About the Role
Our Vision AI platform gives US public sector clients (federal agencies, smart-city operators, defense contractors, and critical infrastructure teams) a real-time window into their physical world: live sensor dashboards, geospatial overlays, AI inference result streams, and operational command interfaces used by people who cannot afford a slow or confusing UI.
We are seeking a specialized AI Performance Engineer (Consultant) to drive GPU acceleration, CUDA optimization, and distributed AI workload performance for the Vision AI platform.
This is a hands-on performance engineering role focused on optimizing deep learning inference, GPU/CPU utilization, distributed orchestration, and capacity planning across city-scale AI deployments.
The consultant will work closely with AI, DevOps, and Infrastructure teams to improve latency, throughput, and overall system efficiency for production AI workloads.
Key Responsibilities
- Profile and optimize large-scale AI training and inference workloads (transformers, multimodal, diffusion, recommender systems) across multi-node, multi-GPU clusters.
- Build tools and frameworks to detect bottlenecks in compute, memory, interconnects, and communication libraries, and deliver optimizations that maximize scaling efficiency.
- Develop, maintain, and recommend benchmarks for AI training and inference workloads.
- Partner with framework teams (PyTorch, TensorFlow) to upstream performance improvements and enable better scaling APIs.
- Collaborate across engineering organizations to improve the efficiency of our hardware, software, and infrastructure usage.
- Proactively monitor fleet-wide utilization, analyze known inefficiency patterns or discover new ones, and deliver scalable solutions to address them.
Required Qualifications
- 5+ years in AI/ML performance engineering, HPC, or large-scale inference systems
- BS in Computer Science or a related field, or equivalent experience
- Strong understanding of, and hands-on experience with, modern ML techniques and tools
- Strong hands-on CUDA programming and optimization experience
- Deep understanding of GPU architecture and memory hierarchy
- Experience optimizing PyTorch and/or TensorFlow inference
- Hands-on experience with NVIDIA Triton, Ray, and Kubernetes GPU scheduling
- Experience with RAPIDS and GPU-accelerated data pipelines
- Experience with benchmarking methodologies, performance analysis and profiling (e.g., NVIDIA Nsight), and performance monitoring tools
- Strong track record of optimizing large-scale AI systems
Nice to Have
- Neural network architecture optimization experience
- Deep TensorRT optimization expertise
- Video analytics or real-time inference systems experience
- Experience operating large-scale GPU clusters
- Experience with WebAssembly (WASM) for performance-critical frontend computation
- Advanced Linux OS, container (e.g. Docker) and GitHub skills