Palo Alto, California
•
Today
Business Unit
What the Role Entails
End-to-End Inference Optimization: Lead the optimization of the full inference pipeline for Large Models (LLM, Multimodal); focus on KV Cache storage strategies, Router architecture design, and collaborative operator optimization to maximize throughput and minimize latency.
Heterogeneous Computing Research: Conduct in-depth research into the underlying inference logic of various hardware accelerators; evaluate architectural suitability for real-time, batch, an
Full-time
USD 120,100.00 - 225,700.00 per year