MLOps Engineer

Overview

On Site
$50 - $60
Contract - W2
Contract - Independent
Contract - 12 Month(s)

Skills

Amazon SageMaker
Amazon Web Services
Apache Kafka
Apache Spark
A/B Testing
CircleCI
Artificial Intelligence
Cloud Computing
Conflict Resolution
Continuous Improvement
Data Governance
Continuous Integration
Continuous Delivery
Data Processing
Data Management
Data Quality
Data Science
Database
Debugging
DevOps
Docker
Documentation
Evaluation
FOCUS
GitHub
GitLab
Good Clinical Practice
Google Cloud Platform
Jenkins
Machine Learning Operations (ML Ops)
Terraform
PyTorch
Python
Vertex
TensorFlow
Version Control
Programming Languages
Grafana
Large Language Models (LLMs)
Machine Learning (ML)
NoSQL
Microsoft Azure

Job Details

Job Title: ML Ops Engineer
Location: Phoenix, AZ (Day one onsite)
Job Description:
Job Summary: We are seeking a highly skilled and motivated MLOps Engineer to join our growing data science and machine learning team. The MLOps Engineer will be responsible for designing, implementing, and maintaining robust and scalable machine learning infrastructure, pipelines, and workflows. This role requires a deep understanding of data management, software development best practices, cloud computing, and the machine learning lifecycle. The successful candidate will work closely with data scientists, data engineers, and software engineers to ensure that machine learning models are deployed, monitored, and updated efficiently and effectively in production.
Key Responsibilities:
ML Pipeline Development & Automation:
Design, build, and maintain end-to-end machine learning pipelines (data ingestion, data preparation, feature engineering, model training, evaluation, deployment).
Automate model training, testing, deployment, and retraining processes using CI/CD principles.
Implement robust version control for data, code, and models to ensure reproducibility and governance.
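To illustrate the versioning-for-reproducibility responsibility above, here is a minimal, hypothetical sketch (stdlib-only; `artifact_fingerprint` and the sample manifest are illustrative names, not part of this role's actual tooling) of tying a training run to the exact data and config that produced it:

```python
import hashlib
import json

def artifact_fingerprint(artifacts: dict) -> str:
    """Return a stable SHA-256 fingerprint over named artifacts.

    `artifacts` maps a name (e.g. "train_data", "model_config") to bytes.
    Identical inputs always yield the same fingerprint, so a training run
    can be traced back to the exact data, code, and config it used.
    """
    digest = hashlib.sha256()
    for name in sorted(artifacts):  # sorted so insertion order doesn't matter
        digest.update(name.encode())
        digest.update(hashlib.sha256(artifacts[name]).digest())
    return digest.hexdigest()

# Hypothetical run manifest: raw training data plus the training config.
run_manifest = {
    "train_data": b"feature,label\n1.0,0\n2.0,1\n",
    "model_config": json.dumps({"lr": 0.01, "epochs": 10}).encode(),
}
print(artifact_fingerprint(run_manifest))
```

In practice, dedicated tools (e.g., DVC or MLflow) handle this bookkeeping; the point is that any change to data or config changes the fingerprint, making runs auditable.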
Model Deployment & Serving:
Deploy machine learning models efficiently and reliably into production environments.
Containerize models using technologies like Docker and orchestrate deployments with Kubernetes.
Develop and manage APIs for model inference, ensuring high performance and low latency.
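As a rough sketch of the inference-API responsibility above (a toy, stdlib-only example; the fixed-weight `predict` function stands in for a real model, and the handler shape is hypothetical rather than any specific framework's API):

```python
import json
import time

def predict(features):
    """Stand-in model: a fixed linear scorer (placeholder for a real model)."""
    weights = [0.4, -0.2, 0.1]
    return sum(w * x for w, x in zip(weights, features))

def handle_request(body: str) -> str:
    """Minimal inference handler: validate JSON input, score, report latency."""
    start = time.perf_counter()
    try:
        payload = json.loads(body)
        features = payload["features"]
        if len(features) != 3:
            raise ValueError("expected 3 features")
    except (json.JSONDecodeError, KeyError, ValueError) as exc:
        return json.dumps({"error": str(exc)})
    score = predict(features)
    latency_ms = (time.perf_counter() - start) * 1000
    return json.dumps({"score": round(score, 6), "latency_ms": round(latency_ms, 3)})
```

A production version would sit behind a web framework and a containerized serving layer, but the same concerns appear in miniature here: input validation, structured error responses, and per-request latency tracking.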
Infrastructure & Scalability:
Build and maintain scalable, reliable, and efficient machine learning infrastructure on cloud platforms (AWS, Azure, Google Cloud Platform).
Optimize machine learning pipelines for performance, cost-effectiveness, and resource utilization.
Implement Infrastructure as Code (IaC) using tools like Terraform or CloudFormation.
Monitoring, Alerting & Troubleshooting:
Set up comprehensive monitoring and logging tools to track model performance, data drift, concept drift, and system health in real-time.
Establish alerts and notifications for anomalies or deviations from expected behavior.
Troubleshoot and resolve issues related to model deployment, performance, and data quality in production environments.
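One common way to quantify the data drift mentioned above is the Population Stability Index (PSI). A minimal pure-Python sketch (the binning scheme and 0.2 threshold are conventional rules of thumb, not requirements of this role):

```python
import math

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline sample and a live sample.

    Values near 0 mean the distributions match; > 0.2 is a common
    rule-of-thumb threshold for significant drift.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # small epsilon avoids log(0) for empty buckets
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]
shifted = [i / 100 + 0.5 for i in range(100)]
print(population_stability_index(baseline, baseline))  # no drift
print(population_stability_index(baseline, shifted))   # clear drift
```

Monitoring stacks such as Evidently AI or WhyLabs compute metrics like this continuously and feed them into the alerting described above.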
Collaboration & Best Practices:
Collaborate closely with data scientists to understand model requirements and optimize models for production deployment.
Work with data engineers to ensure data pipelines provide high-quality, production-ready data for ML models.
Partner with DevOps and software engineering teams to integrate ML systems seamlessly into existing software infrastructure.
Advocate for and implement best practices in MLOps, including observability, reproducibility, and security.
Documentation & Continuous Improvement:
Create and maintain clear technical documentation for ML infrastructure, pipelines, and workflows.
Stay up to date with the latest developments in machine learning, cloud computing, and MLOps tools and technologies.
Contribute to the continuous improvement of our MLOps architecture and strategy.
Required Qualifications:
Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science, or a related quantitative field.
8+ years of proven experience as an MLOps Engineer, Machine Learning Engineer, or in a similar role with a focus on production ML systems.
Strong proficiency in programming languages such as Python.
Hands-on experience with machine learning frameworks (e.g., TensorFlow, PyTorch, Scikit-learn).
Experience with cloud platforms (AWS, Azure, or Google Cloud Platform) and their relevant ML services (e.g., AWS SageMaker, Google Cloud Platform Vertex AI, Azure ML).
Solid understanding of DevOps principles and experience with CI/CD pipelines (e.g., GitLab CI, GitHub Actions, Jenkins, CircleCI).
Proficiency with containerization (Docker) and orchestration tools (Kubernetes).
Familiarity with data processing and streaming frameworks (e.g., Apache Spark, Apache Kafka).
Experience with model monitoring tools (e.g., Prometheus, Grafana, Evidently AI, WhyLabs).
Strong problem-solving skills and the ability to debug complex issues across various systems.
Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.
Preferred Qualifications:
Experience with MLOps platforms like Kubeflow or MLflow.
Knowledge of real-time inference and serving patterns.
Experience building and scaling feature engineering platforms (e.g., Feast, Tecton).
Familiarity with database technologies (SQL, NoSQL).
Understanding of data governance and security best practices for ML.
Experience with A/B testing and model validation in production environments.
Prior experience working in a regulated or production-critical environment.
Familiarity with Large Language Models (LLMs) and LLMOps practices (prompt engineering, fine-tuning, serving LLMs).
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

About iCUBE Solutions