Hybrid || LLM Inference & GPU Systems Consultant || Charlotte, NC

Charlotte, NC, US • Posted 12 hours ago • Updated 1 hour ago

Contract Corp To Corp

Contract Independent

Contract W2

On-site

Job Details

Skills

AI
LLM
NVIDIA H200
RunAI
OpenShift AI

Summary

TECHNOGEN, Inc. has been a proven leader in IT services, software development, and solutions for 15 years.

TECHNOGEN is a Small & Woman Owned Minority Business with GSA Advantage Certification. We have offices in VA and MD and offshore development centers in India. We have successfully executed 100+ projects for clients ranging from small businesses and non-profits to Fortune 50 companies and federal, state, and local agencies.

Description:

Local candidates preferred.

Role Overview:
We are seeking an AI Infrastructure Runtime Engineer to build and maintain large-scale on-prem LLM infrastructure. This is an enterprise private GenAI environment running on NVIDIA H200 GPU clusters and an OpenShift AI deployment ecosystem. You will manage production inference internally, including self-hosting open-source LLMs like Llama. We are focused exclusively on inferencing; this role involves no model training infrastructure or fine-tuning pipelines.

Key Responsibilities
NVIDIA GPU Runtime Optimization: Drive runtime efficiency across the token generation pipeline, with a specific focus on prefill/decode optimization and KV cache management.
Inference Serving: Deploy and manage inference engines including vLLM and TensorRT-LLM.
Hardware Utilization: Tune GPU throughput, batching strategies, and latency. Orchestrate workloads using RunAI and Kubernetes GPU scheduling.
Model Lifecycle Management: Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.
Platform Operations: Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.

Required Qualifications
8+ years of experience working as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.
8+ years of hands-on experience with NVIDIA GPU clusters, including H200, and with runtime optimization techniques (KV cache, prefill/decode).
Proficiency in OpenShift AI and GPU orchestration tools like RunAI.
Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.
Proven track record managing the Hugging Face deployment lifecycle.
Must be onsite at the client in Charlotte, NC at least 3 days per week.

Dice Id: 10217412
Position Id: 2026-42788