LLM Inference / AI Infrastructure Engineer

Charlotte, NC, US • Posted 1 day ago • Updated 1 day ago

Contract W2

12 Months

On-site

Depends on Experience

Job Details

Skills

LLM Inference / AI Infrastructure Engineer

Summary

LLM Inference / AI Infrastructure Engineer
Location: Charlotte, NC
Duration: 9-12 Months

JD:
vLLM, TensorRT-LLM, Triton Inference Server, SGLang, Inference Optimization, Continuous Batching, Speculative Decoding, KV Cache / Prefix Caching, FP8 / AWQ / GPTQ, Tensor Parallelism, Kubernetes ML Serving, KServe, OpenShift AI, Helm / Operators, GPU Orchestration, Run:AI, Performance Benchmarking, CUDA / NCCL / MIG, Prometheus / Grafana, ML Observability

Skills sanity check: Have you worked on NVIDIA H200? If yes, chances are you will know all of the above skills.

Dice Id: 10121431
Position Id: 8979669
