Charlotte, North Carolina • Today
Role: On-prem Platform Engineer
Location: Charlotte, NC
Key Skills: Must-Have Skills (Mandatory Keywords)
LLM Inference & Optimization
  vLLM, TensorRT-LLM, Triton Inference Server, SGLang
  Inference optimization techniques: Continuous batching, Speculative decoding, KV cache / Prefix caching
  Model optimization: FP8, AWQ, GPTQ
Distributed & GPU Systems
  Tensor parallelism and large model scaling
  CUDA, NCCL, GPU architecture
  GPU partitioning & optimization (MIG)
Kubernetes & ML Serving
  Kubernetes-based ML serv
Third Party, Contract
65 - 75


