Charlotte, North Carolina
•
Today
Role : LLM Inference & GPU Systems Consultant Location : Charlotte , NC ( Locals only) We are seeking an AI Infrastructure Runtime Engineer to build and maintain large-scale on-prem LLM infrastructure. This is an enterprise private GenAI environment running on NVIDIA H200 GPU clusters and an OpenShift AI deployment ecosystem. You will manage production inference internally, including self-hosting open-source LLMs like Llama. We are focused exclusively on inferencing; this role involves no model
Easy Apply
Third Party, Contract
Depends on Experience

