Job Details
Role: AI/ML Engineer with Systalyze and CUDA Experience
Duration: Long term
Location: 100% remote
Primary Responsibilities
Deploy and optimize AI models on both Systalyze and Baseten platforms
Implement and benchmark RAG (Retrieval-Augmented Generation) pipelines
Conduct comprehensive performance testing and optimization
Analyze GPU utilization and optimize CUDA code
Evaluate cost and resource efficiency
Benchmark model inference latency and throughput
Required Technical Skills
Core AI/ML Expertise:
Programming Languages: Python (advanced), C++ (intermediate for CUDA optimization)
ML Frameworks: PyTorch, TensorFlow, Hugging Face Transformers, LangChain
Model Types: LLMs (GPT, BERT, T5), Computer Vision models, Embedding models
CUDA & GPU Expertise:
CUDA Programming: CUDA C/C++
GPU Optimization: Memory management, kernel optimization, multi-GPU scaling
Performance Profiling: NVIDIA Nsight, nvprof, CUDA profiler
GPU Architectures: Understanding of the Ampere, Hopper, and Ada Lovelace architectures
Tensor Operations: TensorRT optimization, ONNX Runtime
Memory Management: GPU memory optimization, batch processing strategies
Platform & Infrastructure:
Containerization: Docker, NVIDIA Container Toolkit, GPU-enabled containers
Orchestration: Kubernetes with GPU scheduling, NVIDIA GPU Operator
Cloud Platforms: AWS (EC2 P/G instances), Azure (NC/ND series), Google Cloud Platform (A2/N1 instances)
Model Serving: TorchServe, TensorFlow Serving, Triton Inference Server