Job Details
Job Title: MLOps Engineer, Distributed Systems (Ray)
Location: Austin, TX
Duration: Long Term
Job Summary:
We are looking for a skilled and motivated MLOps Engineer with deep expertise in distributed machine learning and a strong understanding of Ray for building scalable, efficient ML infrastructure. You will be responsible for automating and maintaining ML pipelines, enabling reproducible training and inference, and working alongside Data Scientists and ML Engineers to productionize models at scale.
Key Responsibilities:
Design, build, and manage end-to-end MLOps pipelines using Ray for training, tuning, serving, and monitoring ML models
Leverage Ray Train for distributed model training across CPU/GPU clusters
Implement scalable hyperparameter optimization using Ray Tune
Deploy ML models into production using Ray Serve with FastAPI/Flask
Integrate data preprocessing pipelines with Ray Data and orchestrate workflows via Airflow or Kubernetes
Maintain and monitor deployed models, tracking model drift and data quality to ensure performance and accuracy over time
Collaborate with Data Science and DevOps teams to align on scalable ML architecture
Ensure reproducibility and versioning, and apply CI/CD practices, using MLflow, GitHub Actions, Jenkins, or similar tools
Develop cloud infrastructure (AWS, Google Cloud Platform, or Azure) with autoscaling and cost optimization in mind
Required Skills & Experience:
3+ years of experience in MLOps, Machine Learning Engineering, or related roles
Hands-on experience with Ray Core, Ray Tune, Ray Train, Ray Serve
Strong programming experience in Python and familiarity with libraries such as PyTorch, TensorFlow, and scikit-learn
Solid understanding of containerization and orchestration using Docker, Kubernetes, Helm
Experience building CI/CD pipelines for ML using tools such as Jenkins, GitHub Actions, or Azure DevOps
Familiarity with cloud platforms (AWS, Azure, or Google Cloud Platform) and distributed compute environments
Experience with model tracking and versioning tools such as MLflow, DVC, or Weights & Biases
Strong communication skills and ability to work in cross-functional teams