Overview
Remote
Depends on Experience
Contract - W2
Contract - Independent
Contract - 12 Month(s)
Skills
Core AI/ML Expertise
Python (advanced)
C++ (intermediate for CUDA optimization)
PyTorch
TensorFlow
Hugging Face Transformers
LangChain
LLMs (GPT, BERT, T5)
Job Details
AI/ML Engineer
Primary Responsibilities
- Deploy and optimize AI models on both Systalyze and Baseten platforms
- Implement and benchmark RAG (Retrieval-Augmented Generation) pipelines
- Conduct comprehensive performance testing and optimization
- Analyze GPU utilization and optimize CUDA code
- Analyze costs and evaluate resource efficiency
- Benchmark model inference latency and throughput
Required Technical Skills
Core AI/ML Expertise:
- Programming Languages: Python (advanced), C++ (intermediate for CUDA optimization)
- ML Frameworks: PyTorch, TensorFlow, Hugging Face Transformers, LangChain
- Model Types: LLMs (GPT, BERT, T5), Computer Vision models, Embedding models
CUDA & GPU Expertise:
- CUDA Programming: CUDA C/C++
- GPU Optimization: Memory management, kernel optimization, multi-GPU scaling
- Performance Profiling: NVIDIA Nsight, nvprof, CUDA profiler
- GPU Architectures: Understanding of Ampere, Hopper, Ada Lovelace architectures
- Tensor Operations: TensorRT optimization, ONNX Runtime
- Memory Management: GPU memory optimization, batch processing strategies
Platform & Infrastructure:
- Containerization: Docker, NVIDIA Container Toolkit, GPU-enabled containers
- Orchestration: Kubernetes with GPU scheduling, NVIDIA GPU Operator
- Cloud Platforms: AWS (EC2 P/G instances), Azure (NC/ND series), Google Cloud Platform (A2/N1 instances)
- Model Serving: TorchServe, TensorFlow Serving, Triton Inference Server