ML Ops Architect

Overview

USD 60-65

Full Time

Part Time

Accepts corp to corp applications

Contract - W2

Contract - Independent

Skills

Machine Learning

Job Details

Customer-facing ML Ops roles
AWS (SageMaker, Glue, Lambda, CloudWatch)
Azure DevOps (Repos and Pipelines)
Terraform for IaC
Model deployment, monitoring, and support across multiple LOBs
Familiarity with ServiceNow for incident and change management

We are seeking a highly skilled and hands-on ML Ops Architect to be stationed onsite and work closely with customer stakeholders. The ideal candidate will be responsible for defining and standardizing ML Ops frameworks, supporting the deployment and monitoring of productionized models, and enabling the productionization of new models across multiple Lines of Business (LOBs). The architect must also ensure end-to-end automation, robust observability, and compliance with enterprise standards.

Key Responsibilities:

Customer Engagement:
- Serve as the primary technical point of contact for ML Ops discussions with customer stakeholders.
- Collaborate with data science, platform, and operations teams across LOBs to align on model deployment strategy.
- Gather and refine non-functional requirements (security, scalability, reliability, etc.) from the customer.

ML Ops Framework and Architecture:
- Define, document, and evolve ML Ops architecture patterns for model lifecycle management.
- Design robust, reusable, and secure CI/CD pipelines for ML models using Azure DevOps (Repos, Pipelines).
- Ensure reproducibility, auditability, and traceability for model training and deployment.

Model Deployment and Support:
- Oversee productionization of new ML models across various LOBs.
- Provide technical guidance and support for existing productionized models.
- Manage model versioning, rollback strategies, and model registry using SageMaker.

Infrastructure & Automation:
- Implement Infrastructure as Code using Terraform to provision and manage resources.
- Leverage AWS Glue, Lambda, Step Functions, and SNS for data and model pipeline automation.
- Maintain and optimize scheduler workflows using EventBridge.

Monitoring and Observability:
- Develop and maintain CloudWatch dashboards for model health and system metrics.
- Integrate EvidentlyAI for data drift and model performance monitoring.
- Ensure end-to-end observability including logging, metrics, and alerting.

Operations and Support:
- Maintain documentation for model support procedures, troubleshooting guides, and deployment checklists.
- Work with ServiceNow for incident, change, and problem management processes.
- Support L1/L2 teams by enabling efficient monitoring and resolution mechanisms.

Required Skills & Experience:

10+ years of IT experience, including 3+ years in ML Ops or ML Engineering roles.
Strong hands-on experience with:
- Azure DevOps (Azure Repos, Pipelines)
- AWS ML stack: SageMaker, Glue, Lambda, Step Functions, SNS, S3, Athena
- Terraform for IaC
- CloudWatch, EvidentlyAI for monitoring
- Docker, ECR for image management
Deep understanding of ML model lifecycle management and CI/CD practices.
Proven ability to define enterprise-scale ML Ops frameworks and governance models.
Prior experience working with ServiceNow for operational support workflows.
Strong communication and stakeholder management skills.

