Lead AI Engineer (FM Hosting, LLM Inference) – Remote

Remote • Posted 2 hours ago • Updated 2 hours ago
Contract: W2, Independent, or Corp-to-Corp
Duration: 1 year
Travel: None required
Compensation: Depends on experience

Job Details

Skills

  • Lead AI Engineer (FM Hosting, LLM Inference)

Summary

Job Title: Lead AI Engineer – Foundation Model Hosting & LLM Inference

Location: Remote

Job Summary

We are looking for an experienced Lead AI Engineer to design, deploy, and optimize large-scale Foundation Model (FM) hosting and LLM inference platforms. The ideal candidate will lead AI infrastructure initiatives, improve model serving performance, and build scalable, secure, and cost-efficient AI systems for enterprise applications.

Key Responsibilities

  • Design and manage scalable infrastructure for hosting foundation models and LLMs.
  • Develop and optimize high-performance inference pipelines for low latency and high throughput.
  • Deploy and manage models using containerized and distributed environments.
  • Work with GPU acceleration, model quantization, batching, caching, and inference optimization techniques.
  • Implement APIs and microservices for AI model serving.
  • Monitor system reliability, availability, scalability, and cost efficiency.
  • Collaborate with AI/ML teams to productionize machine learning and generative AI models.
  • Lead architecture decisions for model deployment, orchestration, and observability.
  • Ensure security, governance, and compliance for AI infrastructure.
  • Mentor engineering teams and drive AI platform best practices.
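Several of the responsibilities above (low-latency pipelines, batching, caching) center on request micro-batching. As an illustrative stdlib-only sketch — the function name and batch-size threshold are assumptions for this example, not part of any specific serving framework, and real stacks such as vLLM's continuous batching are far more sophisticated:

```python
from collections import deque

def drain_batches(requests, max_batch_size=4):
    """Group pending inference requests into fixed-size micro-batches.

    Illustrative only: production schedulers also weigh token budgets,
    sequence lengths, and arrival deadlines when forming each batch.
    """
    queue = deque(requests)
    batches = []
    while queue:
        batch = []
        while queue and len(batch) < max_batch_size:
            batch.append(queue.popleft())
        batches.append(batch)
    return batches

# Ten queued prompts become three micro-batches of at most four requests.
batches = drain_batches([f"prompt-{i}" for i in range(10)], max_batch_size=4)
```

Batching amortizes per-forward-pass overhead across requests, which is why it appears alongside quantization and caching as a core throughput lever.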

Required Skills

  • Strong expertise in Python and backend system development.
  • Hands-on experience with LLM serving frameworks such as vLLM, TensorRT-LLM, or Text Generation Inference.
  • Experience with distributed computing, GPU infrastructure, and Kubernetes.
  • Knowledge of transformer architectures, model optimization, and inference tuning.
  • Experience with cloud platforms such as Amazon Web Services, Microsoft Azure, or Google Cloud.
  • Familiarity with Docker, CI/CD pipelines, and infrastructure automation.
  • Understanding of vector databases, embeddings, and retrieval systems.
  • Strong debugging, performance tuning, and problem-solving skills.
  • Excellent leadership and stakeholder communication abilities.
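The model-optimization and inference-tuning skills above can be illustrated with the simplest quantization scheme, symmetric absmax int8. This stdlib sketch is a toy example only — production systems use calibrated methods such as GPTQ or AWQ, and operate on tensors, not Python lists:

```python
def quantize_int8(weights):
    """Symmetric absmax quantization: map floats to int8 via one scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize_int8(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]

weights = [0.42, -1.27, 0.0, 0.89]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Round-trip error is bounded by half a quantization step (scale / 2).
```

The int8 representation cuts weight memory to a quarter of float32, which is the trade-off behind the quantization work mentioned in the responsibilities.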

Preferred Qualifications

  • Bachelor’s or Master’s degree in Computer Science, AI, Machine Learning, or related field.
  • Experience deploying open-source or enterprise LLMs in production environments.
  • Knowledge of MLOps and observability tools.
  • Exposure to RAG architectures, fine-tuning, and AI agents is a plus.

Tools & Technologies

  • Python, FastAPI
  • vLLM / TensorRT-LLM
  • Kubernetes, Docker
  • PyTorch, CUDA
  • Ray, Triton Inference Server
  • Vector Databases (Pinecone, Milvus, FAISS)
  • Amazon Web Services / Microsoft Azure / Google Cloud
  • CI/CD & Monitoring Tools
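The vector-database items above (Pinecone, Milvus, FAISS) all reduce to nearest-neighbour search over embeddings. A stdlib-only sketch of brute-force cosine-similarity retrieval — the corpus and helper names are invented for illustration; the listed databases replace this linear scan with indexed approximate search:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def top_k(query, corpus, k=2):
    """Rank stored (doc_id, embedding) pairs by similarity to the query."""
    scored = sorted(corpus, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

corpus = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.7, 0.7])]
result = top_k([1.0, 0.1], corpus, k=2)  # "a" is closest, then "c"
```

In a RAG pipeline, the top-k documents retrieved this way are injected into the LLM prompt before inference, connecting the retrieval and serving halves of the stack.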
  • Dice Id: 10513292
  • Position Id: 72535-12895-