Charlotte, North Carolina • Yesterday
LLM Inference / AI Infrastructure Engineer
Location: Charlotte, NC
Duration: 9-12 months
JD: vLLM, TensorRT-LLM, Triton Inference Server, SGLang, Inference Optimization, Continuous Batching, Speculative Decoding, KV Cache / Prefix Caching, FP8 / AWQ / GPTQ, Tensor Parallelism, Kubernetes ML Serving, KServe, OpenShift AI, Helm / Operators, GPU Orchestration, Run:AI, Performance Benchmarking, CUDA / NCCL / MIG, Prometheus / Grafana, ML Observability
Skills sanity check: Have you worked on NVIDIA H200? If yes, chance
Contract
Depends on Experience












