MLOps Engineer

Overview

On Site
$50 - $60
Contract - W2
Contract - Independent
Contract - 12 Month(s)

Skills

Amazon SageMaker
Amazon Web Services
Apache Kafka
Apache Spark
A/B Testing
CircleCI
Artificial Intelligence
Cloud Computing
Conflict Resolution
Continuous Improvement
Data Governance
Continuous Integration
Continuous Delivery
Data Processing
Data Management
Data Quality
Data Science
Database
Debugging
DevOps
Docker
Documentation
Evaluation
FOCUS
GitHub
GitLab
Good Clinical Practice
Google Cloud Platform
Jenkins
Machine Learning Operations (ML Ops)
Terraform
PyTorch
Python
Vertex
TensorFlow
Version Control
Programming Languages
Grafana
Large Language Models (LLMs)
Machine Learning (ML)
NoSQL
Microsoft Azure

Job Details

Job Title: ML Ops Engineer
Location: Phoenix, AZ (Day one onsite)
Job Description:
Job Summary: We are seeking a highly skilled and motivated MLOps Engineer to join our growing data science and machine learning team. The MLOps Engineer will be responsible for designing, implementing, and maintaining robust and scalable machine learning infrastructure, pipelines, and workflows. This role requires a deep understanding of data management, software development best practices, cloud computing, and the machine learning lifecycle. The successful candidate will work closely with data scientists, data engineers, and software engineers to ensure that machine learning models are deployed, monitored, and updated efficiently and effectively in production.
Key Responsibilities:
ML Pipeline Development & Automation:
Design, build, and maintain end-to-end machine learning pipelines (data ingestion, data preparation, feature engineering, model training, evaluation, deployment).
Automate model training, testing, deployment, and retraining processes using CI/CD principles.
Implement robust version control for data, code, and models to ensure reproducibility and governance.
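To illustrate the versioning-for-reproducibility responsibility above, here is a minimal, hypothetical sketch (stdlib-only; `artifact_fingerprint` and the sample manifest are illustrative names, not part of this role's actual tooling) of tying a training run to the exact data and config that produced it:

```python
import hashlib
import json

def artifact_fingerprint(artifacts: dict) -> str:
    """Return a stable SHA-256 fingerprint over named artifacts.

    `artifacts` maps a name (e.g. "train_data", "model_config") to bytes.
    Identical inputs always yield the same fingerprint, so a training run
    can be traced back to the exact data, code, and config it used.
    """
    digest = hashlib.sha256()
    for name in sorted(artifacts):  # sorted so insertion order doesn't matter
        digest.update(name.encode())
        digest.update(hashlib.sha256(artifacts[name]).digest())
    return digest.hexdigest()

# Hypothetical run manifest: raw training data plus the training config.
run_manifest = {
    "train_data": b"feature,label\n1.0,0\n2.0,1\n",
    "model_config": json.dumps({"lr": 0.01, "epochs": 10}).encode(),
}
print(artifact_fingerprint(run_manifest))
```

In practice, dedicated tools (e.g., DVC or MLflow) handle this bookkeeping; the point is that any change to data or config changes the fingerprint, making runs auditable.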
Model Deployment & Serving:
Deploy machine learning models efficiently and reliably into production environments.
Containerize models using technologies like Docker and orchestrate deployments with Kubernetes.
Develop and manage APIs for model inference, ensuring high performance and low latency.
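As a rough sketch of the inference-API responsibility above (a toy, stdlib-only example; the fixed-weight `predict` function stands in for a real model, and the handler shape is hypothetical rather than any specific framework's API):

```python
import json
import time

def predict(features):
    """Stand-in model: a fixed linear scorer (placeholder for a real model)."""
    weights = [0.4, -0.2, 0.1]
    return sum(w * x for w, x in zip(weights, features))

def handle_request(body: str) -> str:
    """Minimal inference handler: validate JSON input, score, report latency."""
    start = time.perf_counter()
    try:
        payload = json.loads(body)
        features = payload["features"]
        if len(features) != 3:
            raise ValueError("expected 3 features")
    except (json.JSONDecodeError, KeyError, ValueError) as exc:
        return json.dumps({"error": str(exc)})
    score = predict(features)
    latency_ms = (time.perf_counter() - start) * 1000
    return json.dumps({"score": round(score, 6), "latency_ms": round(latency_ms, 3)})
```

A production version would sit behind a web framework and a containerized serving layer, but the same concerns appear in miniature here: input validation, structured error responses, and per-request latency tracking.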
Infrastructure & Scalability:
Build and maintain scalable, reliable, and efficient machine learning infrastructure on cloud platforms (AWS, Azure, Google Cloud Platform).
Optimize machine learning pipelines for performance, cost-effectiveness, and resource utilization.
Implement Infrastructure as Code (IaC) using tools like Terraform or CloudFormation.
Monitoring, Alerting & Troubleshooting:
Set up comprehensive monitoring and logging tools to track model performance, data drift, concept drift, and system health in real-time.
Establish alerts and notifications for anomalies or deviations from expected behavior.
Troubleshoot and resolve issues related to model deployment, performance, and data quality in production environments.
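One common way to quantify the data drift mentioned above is the Population Stability Index (PSI). A minimal pure-Python sketch (the binning scheme and 0.2 threshold are conventional rules of thumb, not requirements of this role):

```python
import math

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline sample and a live sample.

    Values near 0 mean the distributions match; > 0.2 is a common
    rule-of-thumb threshold for significant drift.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # small epsilon avoids log(0) for empty buckets
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]
shifted = [i / 100 + 0.5 for i in range(100)]
print(population_stability_index(baseline, baseline))  # no drift
print(population_stability_index(baseline, shifted))   # clear drift
```

Monitoring stacks such as Evidently AI or WhyLabs compute metrics like this continuously and feed them into the alerting described above.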
Collaboration & Best Practices:
Collaborate closely with data scientists to understand model requirements and optimize models for production deployment.
Work with data engineers to ensure data pipelines provide high-quality, production-ready data for ML models.
Partner with DevOps and software engineering teams to integrate ML systems seamlessly into existing software infrastructure.
Advocate for and implement best practices in MLOps, including observability, reproducibility, and security.
Documentation & Continuous Improvement:
Create and maintain clear technical documentation for ML infrastructure, pipelines, and workflows.
Stay up to date with the latest developments in machine learning, cloud computing, and MLOps tools and technologies.
Contribute to the continuous improvement of our MLOps architecture and strategy.
Required Qualifications:
Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science, or a related quantitative field.
8+ years of proven experience as an MLOps Engineer, Machine Learning Engineer, or in a similar role with a focus on production ML systems.
Strong proficiency in programming languages such as Python.
Hands-on experience with machine learning frameworks (e.g., TensorFlow, PyTorch, Scikit-learn).
Experience with cloud platforms (AWS, Azure, or Google Cloud Platform) and their relevant ML services (e.g., AWS SageMaker, Google Cloud Platform Vertex AI, Azure ML).
Solid understanding of DevOps principles and experience with CI/CD pipelines (e.g., GitLab CI, GitHub Actions, Jenkins, CircleCI).
Proficiency with containerization (Docker) and orchestration tools (Kubernetes).
Familiarity with data processing and streaming frameworks (e.g., Apache Spark, Apache Kafka).
Experience with model monitoring tools (e.g., Prometheus, Grafana, Evidently AI, WhyLabs).
Strong problem-solving skills and the ability to debug complex issues across various systems.
Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.
Preferred Qualifications:
Experience with MLOps platforms like Kubeflow or MLflow.
Knowledge of real-time inference and serving patterns.
Experience building and scaling feature engineering platforms (e.g., Feast, Tecton).
Familiarity with database technologies (SQL, NoSQL).
Understanding of data governance and security best practices for ML.
Experience with A/B testing and model validation in production environments.
Prior experience working in a regulated or production-critical environment.
Familiarity with Large Language Models (LLMs) and LLMOps practices (prompt engineering, fine-tuning, serving LLMs).
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

About iCUBE Solutions