Job Details
Position: MLOps (GPU Efficiency & Optimization)
Location: Remote
Duration: Contract (C2C)
Job Description:
Deep understanding of GPU efficiency and GPU optimization.
Operate, monitor, and triage all aspects of our production and non-production environments.
Automate deployment and orchestration of services in the cloud, as well as other routine processes.
Work across multiple cloud environments, such as AWS and Google Cloud Platform.
Actively participate in capacity planning, scale testing, and disaster recovery exercises.
Interact with and support partner teams, including Engineering, QA, and program management.
Troubleshoot customer issues with ML tuning and inference endpoints running on Ray.
Design and implement RESTful/RPC APIs and services in Golang or Python.
Implement SLO/SLI and error-budget reporting for various customers.
Thanks,
Nitesh