Overview
On Site
Full Time
Skills
AWS
Python
Kubernetes
Terraform
GCP
ML frameworks
Job Details
Role: MLOps Lead
Location: New York, NY (Hybrid)
Full-time role
Job description:
The MLOps Lead drives the design, deployment, and optimization of machine learning solutions, balancing hands-on engineering with strategic leadership to enable robust, scalable, and maintainable AI infrastructure.
Key Responsibilities
- Architect and maintain scalable ML infrastructure, self-service ML pipelines, and CI/CD workflows for model training and deployment.
- Lead and mentor an MLOps team, fostering technical excellence and continual improvement.
- Design high-scale distributed training and inference environments using cloud (AWS, Google Cloud Platform) and on-premises resources.
- Build and manage feature stores, data ingestion, preprocessing, and validation pipelines.
- Implement A/B testing, canary releases, monitoring, and rollback mechanisms for production ML models.
- Ensure compliance with data governance, privacy, and security standards; manage role-based access controls for ML infrastructure.
- Collaborate with data scientists, software engineers, DevOps, and product teams to bring models from experimentation to enterprise-grade production.
Required Skills and Experience
- Deep expertise in creating and managing machine learning infrastructure and orchestration frameworks (e.g., Kubeflow, MLflow, Airflow).
- Proficiency in cloud platforms (AWS, Google Cloud Platform), Kubernetes, Terraform, and distributed computing.
- Strong proficiency in Python, ML frameworks and serving tools (TensorFlow, TorchServe), CI/CD automation, and pipeline management.
- Strong analytical, problem-solving, and project management abilities.
- Demonstrated ability to build, scale, and lead technical teams.
- Solid understanding of data compliance, governance, and model monitoring.
Desired Qualifications
- Experience optimizing GPU/TPU utilization and large-scale storage solutions.
- Track record of designing robust monitoring systems for model drift, downtime, and performance.