Infrastructure Engineer (Onsite NC)


Pacific Consultancy Services
Job Details
Skills
- GPU
- Infrastructure
Summary
Role: AI Infrastructure Engineer
Location: Charlotte, NC (Hybrid)
Role Overview:
We are seeking an AI Infrastructure Runtime Engineer to build and maintain large-scale on-prem LLM infrastructure. This is an enterprise private GenAI environment running on NVIDIA H200 GPU clusters and an OpenShift AI deployment ecosystem. You will manage production inference internally, including self-hosting open-source LLMs like Llama. We are focused exclusively on inferencing; this role involves no model training infrastructure or fine-tuning pipelines.
Key Responsibilities
NVIDIA GPU Runtime Optimization: Drive runtime efficiency across the token generation pipeline, with specific ownership of prefill/decode optimization and KV cache management.
Inference Serving: Deploy and manage inference engines including vLLM and TensorRT-LLM.
Hardware Utilization: Tune GPU throughput, batching strategies, and latency. Orchestrate workloads using RunAI and Kubernetes GPU scheduling.
Model Lifecycle Management: Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.
Platform Operations: Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.
Required Qualifications
8+ years of experience working as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.
8+ years of hands-on experience with NVIDIA H200 clusters and runtime optimization techniques (KV cache, prefill/decode).
Proficiency in OpenShift AI and GPU orchestration tools like RunAI.
Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.
Proven track record managing the Hugging Face deployment lifecycle.
- Dice Id: 91142718
- Position Id: 8983646
- Posted 3 hours ago
Company Info
Pacific Consultancy Services, founded with an ambitious vision in 2013, is a prominent IT Consulting and Service Delivery firm. The company is built upon the pillars of exceptional customer-centric solutions, streamlined processes, and impeccable technical expertise.
With nearly two decades of experience, Pacific Consultancy Services has been at the forefront of delivering intelligent solutions to clients worldwide, including in the United States. Its specialized offerings encompass Artificial Intelligence, Machine Learning, Blockchain, Cloud services, IoT, DevOps, IT Staff Augmentation, and Cognitive Analytics, all contributing to achievable and profitable business models. Ensuring quality across all service domains is a priority for Pacific Consultancy Services. Its services range from IT Staff Augmentation to Digital Transformation, IT Consulting, and Emerging Technologies.
The work ethos of Pacific Consultancy Services revolves around its core “Model of Delivery,” aimed at providing clients with the best possible solutions.

