NVIDIA H200 -- LLM Inference & GPU Systems Consultant

Hybrid in Charlotte, NC, US • Posted 9 hours ago • Updated 9 hours ago
Contract Independent
Contract W2
12 Months
No Travel Required
Hybrid
$70 - $80/hr

Job Details

Skills

  • Generative Artificial Intelligence (AI)
  • Kubernetes
  • Artificial Intelligence
  • GPU
  • Lifecycle Management
  • Computer Hardware
  • Onboarding

Summary

Role Overview:
We are seeking an AI Infrastructure Runtime Engineer to build and maintain large-scale on-prem LLM infrastructure. The environment is an enterprise private GenAI platform running on NVIDIA H200 GPU clusters with an OpenShift AI deployment ecosystem. You will manage production inference in-house, including self-hosting open-source LLMs such as Llama. The role focuses exclusively on inference; it involves no model training infrastructure or fine-tuning pipelines.

Key Responsibilities
NVIDIA GPU Runtime Optimization: Maximize runtime efficiency of the token generation pipeline, with particular attention to prefill/decode optimization and KV cache management.
Inference Serving: Deploy and manage inference engines including vLLM and TensorRT-LLM.
Hardware Utilization: Tune GPU throughput, batching strategies, and latency. Manage GPU workload orchestration using RunAI and Kubernetes.
Model Lifecycle Management: Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.
Platform Operations: Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.

Required Qualifications
8+ years of experience working as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.
8+ years of hands-on experience with NVIDIA H200 clusters and runtime optimization techniques (KV cache, prefill/decode).
Proficiency in OpenShift AI and GPU orchestration tools like RunAI.
Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.
Proven track record managing the Hugging Face deployment lifecycle.

  • Dice Id: 91008812
  • Position Id: 8976266
Contact the job poster
Chandra Gowda

Sr. Recruiter @ Kasmo Inc.