Job Title: AI Operations Platform Consultant (LLM & Kubernetes)
Experience: 8+ Years
Location & Work Schedule
- Location: Charlotte, NC OR Jersey City, NJ (candidate may choose location)
- Work Model: Hybrid – 3 days per week onsite
- Schedule: Monday–Friday, standard business hours
Job Overview
We are seeking an experienced AI Operations Platform Consultant to support the deployment, operation, and optimization of Large Language Model (LLM) inference platforms in a mission-critical, enterprise environment. The ideal candidate will have strong hands-on expertise with Kubernetes (OpenShift) and LLM deployment frameworks such as TensorRT-LLM and Triton Inference Server, along with experience managing MLOps/LLMOps pipelines in production.
This role focuses on ensuring high availability, performance, scalability, and operational excellence for AI inference services.
Must-Have Skills
- Large Language Models (LLMs)
- Kubernetes / OpenShift
Key Responsibilities
AI Platform Deployment & Operations
- Deploy, operate, and troubleshoot containerized AI services at scale on Kubernetes (OpenShift) for mission-critical applications (see the readiness-check sketch after this list)
- Deploy, configure, tune, and optimize LLMs using TensorRT-LLM and Triton Inference Server
- Manage scalable infrastructure for deploying and operating LLM-based inference services
- Support production-grade AI inference platforms with high availability and performance requirements
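To give candidates a concrete sense of the day-to-day work, below is a minimal sketch of the kind of health check run when operating a Triton endpoint. It uses the tritonclient Python package; the endpoint URL and model name are hypothetical placeholders, not details of our environment.

```python
# Minimal Triton health check: server liveness/readiness plus model readiness.
# Requires: pip install tritonclient[http]
# The URL and model name below are hypothetical placeholders.
import sys

import tritonclient.http as httpclient

TRITON_URL = "triton.example.internal:8000"  # hypothetical endpoint
MODEL_NAME = "llama-chat"                    # hypothetical model name

def main() -> int:
    client = httpclient.InferenceServerClient(url=TRITON_URL)
    # Server-level health: the process is up and accepting requests.
    if not (client.is_server_live() and client.is_server_ready()):
        print("Triton server is not live/ready")
        return 1
    # Model-level health: the model loaded and can serve inference.
    if not client.is_model_ready(MODEL_NAME):
        print(f"model {MODEL_NAME!r} is not ready")
        return 1
    print("server and model ready")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

In an OpenShift deployment, a check like this typically backs a readiness probe against Triton's /v2/health/ready HTTP endpoint rather than running as a standalone script.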
MLOps / LLMOps
- Design, operate, and support MLOps / LLMOps pipelines for production inference workloads
- Deploy inference services using TensorRT-LLM and Triton Inference Server
- Monitor, maintain, and improve inference pipelines across environments
- Ensure reliable model lifecycle management, including updates and rollbacks (see the reload/rollback sketch below)
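As one illustration of lifecycle management, the sketch below drives Triton's model-control API to reload a model after a repository update, or to roll back by reloading once the repository points at the last known-good version. It assumes the server runs with --model-control-mode=explicit; the URL and model name are hypothetical.

```python
# Sketch of a model update/rollback step via Triton's model-control API.
# Assumes the server was started with --model-control-mode=explicit.
# The URL and model name are hypothetical placeholders.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="triton.example.internal:8000")

def reload_model(name: str) -> None:
    """(Re)load a model so Triton picks up new files from the model repository."""
    client.load_model(name)
    if not client.is_model_ready(name):
        raise RuntimeError(f"model {name!r} failed to become ready")

def roll_back(name: str) -> None:
    """After the repository has been reverted to the last known-good version,
    unload the bad deployment and reload the previous one."""
    client.unload_model(name)
    reload_model(name)

reload_model("llama-chat")  # hypothetical model name
```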
Monitoring, Performance & Reliability
- Set up and operate monitoring for AI inference services, covering performance, availability, latency, and throughput (see the metrics sketch after this list)
- Troubleshoot issues related to model performance, scalability, load balancing, and container orchestration
- Implement best practices for observability, alerting, and system health monitoring
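For monitoring, Triton exposes Prometheus-format metrics (on port 8002 by default). The sketch below is a minimal spot check that scrapes a few inference metrics; the hostname is a hypothetical placeholder, and in production these metrics would feed Prometheus/Grafana dashboards and alert rules rather than an ad hoc script.

```python
# Spot check of Triton's Prometheus metrics endpoint (port 8002 by default).
# The hostname is a hypothetical placeholder; metric names follow Triton's
# nv_inference_* / nv_gpu_* conventions.
import requests

METRICS_URL = "http://triton.example.internal:8002/metrics"

def scrape(prefixes: tuple[str, ...]) -> dict[str, str]:
    """Return raw Prometheus lines whose metric names match the prefixes."""
    body = requests.get(METRICS_URL, timeout=5).text
    samples = {}
    for line in body.splitlines():
        if line.startswith(prefixes):              # skip comments/other metrics
            name, _, value = line.rpartition(" ")  # "name{labels} value"
            samples[name] = value
    return samples

for name, value in scrape((
    "nv_inference_request_success",    # completed requests (counter)
    "nv_inference_queue_duration_us",  # cumulative queue time (counter)
    "nv_gpu_utilization",              # GPU utilization (gauge)
)).items():
    print(name, value)
```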
Model Optimization & Inference
- Apply model optimization techniques such as pruning, quantization, and knowledge distillation (see the quantization sketch after this list)
- Serve optimized models through Triton Inference Server with the TensorRT-LLM (TRT-LLM) backend
- Tune inference performance and ensure efficient GPU utilization
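As a small, self-contained example of one technique from this list, the PyTorch sketch below applies post-training dynamic quantization to a toy model. Production LLM quantization normally goes through TensorRT-LLM's own tooling; this is illustrative only.

```python
# Post-training dynamic quantization in PyTorch: weights of Linear layers
# are stored as int8, and activations are quantized on the fly at inference.
# The toy model is illustrative only.
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

model = nn.Sequential(        # stand-in for a real network
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Linear(4096, 4096),
).eval()

quantized = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.inference_mode():
    out = quantized(torch.randn(1, 4096))
print(out.shape)  # torch.Size([1, 4096])
```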
Enterprise Operations & Governance
- Follow standard enterprise operational processes, including Incident, Change, and Event Management
- Support operational readiness and production stability for AI platforms
- Collaborate with cross-functional teams including infrastructure, security, and AI/ML teams
Required Qualifications
- Strong hands-on experience with Kubernetes, preferably OpenShift
- Proven experience deploying and operating LLMs in production environments
- Expertise with TensorRT-LLM and with Triton Inference Server architecture, configuration, and deployment
- Experience with containerization, microservices, and API-based inference services
- Strong troubleshooting skills in distributed, containerized systems
- Experience managing scalable AI infrastructure
- Understanding of enterprise-grade operational best practices