Brevard, North Carolina
3d ago
Role: On-prem Platform Engineer
Location: Brevard, Charlotte, NC (Onsite)
Tech Skills Needed: vLLM, TensorRT-LLM, Triton Inference Server, SGLang, Inference Optimization, Continuous Batching, Speculative Decoding, KV Cache / Prefix Caching, FP8 / AWQ / GPTQ, Tensor Parallelism, Kubernetes ML Serving, KServe, OpenShift AI, Helm / Operators, GPU Orchestration, Run:AI, Performance Benchmarking, CUDA / NCCL / MIG, Prometheus / Grafana, ML Observability, GuideLLM, Locust
Responsibilities: Build, configure, and operate o…
Contract, Third Party
Depends on Experience
