Senior AI Engineer (NVIDIA NIM & Triton)

Remote • Posted 2 days ago • Updated 1 day ago

Contract W2

Contract Corp To Corp

12 Months

No Travel Required

Remote

Depends on Experience

Job Details

Skills

Amazon Web Services
Artificial Intelligence
Banking
CUDA
Cloud Computing
Data Science
Docker
Energy
Financial Services
Generative Artificial Intelligence (AI)
Health Care
Insurance
Kubernetes
LangChain
Large Language Models (LLMs)
Machine Learning (ML)
Machine Learning Operations (ML Ops)
Management
Microservices
Microsoft Azure
NIM
Pharmaceutics
Public Sector
Python
Retail
Telecommunications

Summary

Job Title: Senior AI Engineer (NVIDIA NIM & Triton)

Location: Open Across USA (Remote)

Job Summary

We are seeking a Senior AI Engineer with strong experience in NVIDIA AI technologies, specifically NVIDIA NIM Microservices and Triton Inference Server. The ideal candidate will be responsible for designing, deploying, optimizing, and scaling Generative AI and LLM-based applications in enterprise environments.

Required Skills

Hands-on experience with NVIDIA NIM Microservices
Strong experience with NVIDIA Triton Inference Server
Experience deploying and serving Large Language Models (LLMs)
Knowledge of TensorRT-LLM and CUDA optimization
Experience with Kubernetes and Docker containerization
Strong Python programming skills
Experience building AI/ML applications in AWS, Azure, or Google Cloud Platform
Understanding of model inference, model serving, and performance tuning
Experience with REST APIs and microservices architecture

Preferred Skills

Experience with NVIDIA NeMo
Experience with RAG (Retrieval-Augmented Generation) architectures
Familiarity with LangChain or LlamaIndex
Exposure to MLOps/LLMOps practices
Experience with monitoring and observability tools

Responsibilities

Design and deploy AI applications using NVIDIA NIM Microservices
Build and optimize model serving infrastructure using Triton Inference Server
Deploy and manage LLM workloads in Kubernetes environments
Optimize inference performance using TensorRT-LLM and CUDA
Collaborate with Data Science, MLOps, and Platform Engineering teams
Implement scalable, secure, and production-ready AI solutions
Troubleshoot and improve AI application performance and reliability
Support cloud-based AI deployments across AWS, Azure, or Google Cloud Platform

About AgreeYa:
AgreeYa is a global systems integrator delivering a competitive advantage for its customers through software, solutions, and services. Established in 1999, AgreeYa is headquartered in Folsom, California, with a global footprint and a team of more than 1,800 professionals across offices. AgreeYa works with 550+ organizations ranging from Fortune 100 firms to small and large businesses across industries such as Telecom, Banking, Financial Services & Insurance, Healthcare, Utility & Energy, Technology, Public Sector, Pharma & Biotech, Retail, Client, and others. Please visit our website for more information.
Equal Opportunity:
AgreeYa is an equal opportunity employer. We evaluate qualified applicants without regard to race, color, religion, gender identity, sexual orientation, national origin, disability, veteran status, or other protected characteristics. Visit our website to learn about our Career & Culture.

Dice Id: swapps
Position Id: 8998393