Remote
•
Today
Position SummaryWe are looking for an AI Site Reliability Engineer to manage, optimize, and scale high-performance compute (HPC) and AI platforms including NVIDIA DGX and Cisco UCS. This role blends SRE principles, AI/ML operationalization, and infrastructure automation for mission-critical environments. ResponsibilitiesManage & scale HPC platforms (NVIDIA DGX / Cisco UCS) for AI workloads. Ensure availability, latency, scalability, and efficiency across systems. Drive capacity planning, perform
Easy Apply
Contract, Third Party
$40 - $50