Search Jobs | Dice.com

Charlotte, North Carolina • Today

Role: On-prem Platform Engineer
Location: Charlotte, NC
Key Skills: Must-Have Skills (Mandatory Keywords)
LLM Inference & Optimization: vLLM, TensorRT-LLM, Triton Inference Server, SGLang
Inference optimization techniques: continuous batching, speculative decoding, KV cache / prefix caching
Model optimization: FP8, AWQ, GPTQ
Distributed & GPU Systems: tensor parallelism and large model scaling; CUDA, NCCL, GPU architecture; GPU partitioning & optimization (MIG)
Kubernetes & ML Serving: Kubernetes-based ML serv…

Easy Apply

Third Party, Contract

$65 - $75

On-prem Platform Engineer

Hybrid in Charlotte, North Carolina • Yesterday

LLM Inference & Optimization: vLLM, TensorRT-LLM, Triton Inference Server, SGLang
Inference optimization techniques: continuous batching, speculative decoding, KV cache / prefix caching
Model optimization: FP8, AWQ, GPTQ
Distributed & GPU Systems: tensor parallelism and large model scaling; CUDA, NCCL, GPU architecture; GPU partitioning & optimization (MIG)
Kubernetes & ML Serving: Kubernetes-based ML serving platforms; KServe, OpenShift AI; Helm charts, Operators, platform automation
GPU Orchestration: Run:AI or similar GPU scheduling/orchestra…

Easy Apply

Full-time

Depends on Experience

Senior On-Prem GenAI Platform Engineer

Charlotte, North Carolina • Today

Senior On-Prem GenAI Platform Engineer
Location: Charlotte, NC
Job Summary: We are seeking an experienced On-Prem GenAI Platform Engineer to build, optimize, and manage enterprise AI/ML platforms supporting Large Language Models (LLMs) and Generative AI workloads. The ideal candidate will have expertise in Kubernetes/OpenShift AI, GPU infrastructure, distributed systems, and LLM inference optimization.
Key Responsibilities: Build and…

Easy Apply

Contract

$50 - $60
