MLOps - Machine Learning

  • Posted 4 hours ago | Updated 4 hours ago

Overview

Remote
$50 - $50
Contract - W2
Contract - 12 Month(s)

Skills

MLOps
GPU
Machine Learning

Job Details

Remote - Machine Learning Engineer, MLOps & GPU Efficiency

Location: Remote, onsite

Job Summary:

We are seeking 2-3 experienced Machine Learning Engineers with strong expertise in MLOps, Ray distributed computing, and GPU efficiency optimization. The ideal candidates will have hands-on experience monitoring and improving GPU utilization using performance metrics, as well as building scalable ML infrastructure for training and inference workflows.

Key Responsibilities:

  • Design, build, and manage MLOps pipelines to support large-scale ML model development, training, and deployment.
  • Implement and scale distributed training using Ray across GPU clusters.
  • Monitor, analyze, and optimize GPU performance metrics (e.g., memory usage, FLOPs, utilization) to maximize efficiency.
  • Collaborate with data scientists and ML engineers to ensure production readiness of ML models.
  • Integrate observability tools for real-time monitoring of training and inference workloads.
  • Automate workflows for data ingestion, preprocessing, model versioning, and CI/CD of ML models.

Required Skills & Experience:

  • 3+ years of experience in Machine Learning Engineering, DevOps, or MLOps.
  • Proven experience designing and managing MLOps pipelines using tools like MLflow, Kubeflow, Airflow, or SageMaker Pipelines.
  • Hands-on experience with Ray (Tune, Train, Serve) for distributed ML workloads.
  • Deep understanding of GPU profiling and optimization techniques, and tools such as NVIDIA Nsight, nvidia-smi, TensorBoard, or PyTorch Profiler.
  • Experience working with cloud GPU environments (AWS, Google Cloud Platform, or Azure).
  • Strong programming skills in Python; familiarity with PyTorch or TensorFlow.
  • Familiarity with monitoring/logging tools such as Prometheus, Grafana, Datadog, or Weights & Biases.

Preferred Qualifications:

  • Experience working on large-scale ML systems in production environments.
  • Knowledge of containerization technologies like Docker and orchestration tools such as Kubernetes.
  • Familiarity with model compression or quantization techniques for GPU optimization.
  • Experience in high-throughput or latency-sensitive ML applications.
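To give a flavor of the GPU-utilization monitoring mentioned above: one common approach is to script around the CSV output of `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`. A minimal sketch is below; the helper names and the 30% threshold are illustrative assumptions, not requirements from this posting.

```python
# Sketch: parse nvidia-smi CSV output and flag underutilized GPUs.
# Assumes output produced by:
#   nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader,nounits

def parse_gpu_stats(csv_text):
    """Parse CSV lines of "<utilization %>, <memory used MiB>" into dicts."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append({"util_pct": int(util), "mem_used_mib": int(mem)})
    return stats

def underutilized(stats, util_threshold=30):
    """Return indices of GPUs whose utilization is below the threshold."""
    return [i for i, s in enumerate(stats) if s["util_pct"] < util_threshold]

# Example input: GPU 0 busy, GPU 1 nearly idle.
sample = "95, 30512\n12, 1024\n"
stats = parse_gpu_stats(sample)
print(underutilized(stats))  # -> [1]
```

In practice, a loop like this would feed the parsed metrics into a monitoring stack such as Prometheus/Grafana rather than printing them.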
