Job Details
Job Title: AI Solutions Architect (LLM & Generative AI)
Location: Santa Clara, CA (Hybrid)
Duration/Term: Long-Term Contract
Job Description:
We are seeking an experienced AI Architect to design and deploy scalable AI systems leveraging LLMs (GPT, LLaMA, Claude, etc.). The ideal candidate will have expertise in fine-tuning language models, optimizing Retrieval-Augmented Generation (RAG) pipelines, and developing agent-based architectures. Additionally, they will play a key role in integrating AI agents with enterprise tools and ensuring model performance optimization.
Key Responsibilities:
- LLM Development & Fine-Tuning: Build and fine-tune SLMs/LLMs using domain-specific data (e.g., ITSM, security, operations).
- RAG Optimization: Design and optimize Retrieval-Augmented Generation (RAG) pipelines with vector databases (FAISS, Chroma, Weaviate, Pinecone).
- Agent-Based Architectures: Develop agent-based AI architectures using LangGraph, AutoGen, CrewAI, or custom frameworks.
- Enterprise AI Integration: Integrate AI agents with enterprise tools (ServiceNow, Jira, SAP, Slack, etc.).
- Model Performance Optimization: Enhance AI model efficiency through quantization, distillation, batching, and caching.
- DevOps & MLOps Collaboration: Work closely with DevOps and MLOps teams to build CI/CD pipelines for models and agents.
- Technical Leadership: Conduct code and research reviews, mentor junior engineers, and contribute to technical strategy.
Qualifications:
Must Have:
- 5-8 years of hands-on experience in AI/ML/Deep Learning.
- Strong coding skills in Python (must); familiarity with JavaScript, Go, or Rust is a plus.
- Proficiency with PyTorch or TensorFlow.
- Deep understanding of transformer architectures, embeddings, and attention mechanisms.
- Experience with LangChain, Transformers (HuggingFace), or LlamaIndex.
- Working knowledge of LLM fine-tuning (LoRA, QLoRA, PEFT) and prompt engineering.
- Hands-on experience with vector databases (FAISS, Pinecone, Weaviate, Chroma).
- Cloud experience on Azure, AWS, or Google Cloud Platform (Azure preferred).
- Experience with Kubernetes, Docker, and scalable microservice deployments.
- Strong knowledge of REST APIs, webhooks, and enterprise system integrations (ServiceNow, SAP, etc.).
- Solid understanding of data pipelines, ETL, and structured/unstructured data ingestion.
Key Skills:
LLMs, AI Architecture, Python, PyTorch, TensorFlow, Transformer Models, LangChain, HuggingFace, LlamaIndex, RAG Optimization, Vector Databases, LoRA, QLoRA, PEFT, Azure, AWS, Google Cloud Platform, Kubernetes, Docker, Microservices, CI/CD, REST APIs, ETL, Enterprise Integrations
VDart Group, a global leader in technology, product, and talent management, empowers businesses with comprehensive solutions through our four distinct, industry-leading business units. With a diverse team of over 4,000 professionals across 13 countries, we deliver strong results across various industries, including Fortune 500 companies.
Committed to "People, Purpose, Planet," we prioritize social responsibility and sustainability, as evidenced by our EcoVadis Bronze Medal Certification and participation in the UN Global Compact.
Our dedication to delivering strong results has earned us recognition as a trusted advisor for businesses seeking to drive innovation and growth. Join our network! Partner with VDart Group to leverage our global network, industry expertise, and proven track record with a diverse clientele.