Senior Performance Engineer
Vision AI Platform · Public Sector
Location: Global (remote) — US business hours overlap required
Reporting to: Assurance Lead / Assurance Director
Team: Globally distributed engineering team
Industry: Artificial Intelligence · Edge Computing · Public Sector
Employment: Contract
About the Role
Our Vision AI platform gives US public sector clients — federal agencies, smart-city operators, defense contractors, and critical infrastructure teams — a real-time window into their physical world. Think live sensor dashboards, geospatial overlays, AI inference result streams, and operational command interfaces used by people who cannot afford a slow or confusing UI.
We are seeking a specialized AI Performance Engineer (consultant) to drive GPU acceleration, CUDA optimization, and distributed AI workload performance for the Vision AI platform.
This is a hands-on performance engineering role focused on optimizing deep learning inference, GPU/CPU utilization, distributed orchestration, and capacity planning across city-scale AI deployments.
The consultant will work closely with AI, DevOps, and Infrastructure teams to improve latency, throughput, and overall system efficiency for production AI workloads.
Key Responsibilities
· Profile and optimize large-scale AI training and inference workloads (transformers, multimodal, diffusion, recommender systems) across multi-node, multi-GPU clusters.
· Build tools and frameworks to detect and isolate bottlenecks in compute, memory, interconnects, and communication libraries, and deliver optimizations that maximize scaling efficiency.
· Develop, maintain, and recommend benchmarks for AI training and inference workloads.
· Partner with framework teams (PyTorch, TensorFlow) to upstream performance improvements and enable better scaling APIs.
· Collaborate across engineering organizations to improve the efficiency of our hardware, software, and infrastructure usage.
· Proactively monitor fleet-wide utilization, analyze known inefficiency patterns and discover new ones, and deliver scalable solutions to address them.
Required Qualifications
· 5+ years in AI/ML performance engineering, HPC, or large-scale inference systems
· BS in Computer Science or a related field (or equivalent experience)
· Strong understanding of, and hands-on experience with, modern ML techniques and tools
· Strong hands-on CUDA programming and optimization experience
· Deep understanding of GPU architecture and memory hierarchy
· Experience optimizing PyTorch and/or TensorFlow inference
· Hands-on experience with NVIDIA Triton, Apache Ray, and Kubernetes GPU scheduling
· Experience with RAPIDS and GPU-accelerated data pipelines
· Experience with benchmarking methodologies, performance analysis and profiling (e.g., NVIDIA Nsight), and performance monitoring tools
· Strong track record of optimizing large-scale AI systems
Nice to Have
· Neural network architecture optimization experience
· Deep TensorRT optimization expertise
· Video analytics or real-time inference systems experience
· Experience operating large-scale GPU clusters
· Experience with WebAssembly (WASM) for performance-critical frontend computation
· Advanced Linux OS, container (e.g., Docker), and GitHub skills