ML/OPS Engineer - IV

Overview

On Site
Depends on Experience
Contract - W2
Contract - Independent
Contract - 12 Month(s)

Skills

Databricks, MLOps pipelines, AWS core pipelines, MLflow, AutoML, MosaicML, Seldon, Airflow, Docker, Kubernetes, Helm, AWS SageMaker, Grafana, Tecton, CUDA

Job Details

Position Title: ML/OPS Engineer - IV (JPC - 6768)
Location: Miramar or Dallas
Required Experience: 12 years
Tax Terms: C2C, W2

Required/Desired Skills:

Role: ML/OPS Engineer - IV
Location: Miramar or Dallas
Duration: Long Term

Must Have:
Experience with Databricks, creating MLOps pipelines, and AWS core pipelines
At least 4 years of hands-on experience
Experience with models for data scientists to use (candidates are not expected to be data scientists/engineers)

Description:

Building, scaling, automating, and orchestrating model pipelines. Experience with specific tech stacks including: MLflow, AutoML, MosaicML, Seldon, Airflow, Docker, Kubernetes, Helm or similar, AWS SageMaker, Databricks, Grafana or similar, Tecton or similar, CUDA or similar.
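
As an illustration of the MLflow piece of this stack, the short Python sketch below logs a training run and registers the resulting model; the tracking URI, experiment path, metric, and registry name are placeholders rather than details from this posting.

    import mlflow
    import mlflow.sklearn  # model "flavor" used here; swap for the framework in use
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Point MLflow at the tracking server (Databricks-hosted or self-managed); both
    # the URI and the experiment path below are assumptions for this sketch.
    mlflow.set_tracking_uri("databricks")
    mlflow.set_experiment("/Shared/churn-model")

    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

    with mlflow.start_run(run_name="rf-baseline"):
        model = RandomForestClassifier(n_estimators=200, random_state=42)
        model.fit(X, y)

        # Track parameters and metrics so retrains stay comparable.
        mlflow.log_param("n_estimators", 200)
        mlflow.log_metric("train_accuracy", model.score(X, y))

        # Log and register the model so serving can pull it by name and version.
        mlflow.sklearn.log_model(
            model,
            artifact_path="model",
            registered_model_name="churn-classifier",  # hypothetical registry name
        )
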
MLOps Engineer (AWS & Databricks)

Primary Responsibilities
Design, implement, and maintain CI/CD pipelines for machine learning applications using AWS CodePipeline, CodeCommit, and CodeBuild.
Automate the deployment of ML models into production using Amazon SageMaker, Databricks, and MLflow for model versioning, tracking, and lifecycle management.
Develop, test, and deploy AWS Lambda functions for triggering model workflows, automating pre/post-processing, and integrating with other AWS services (a minimal Lambda sketch follows this list).
Maintain and monitor Databricks model serving endpoints, ensuring scalable and low-latency inference workloads.
Use Airflow (MWAA) or Databricks Workflows to orchestrate complex, multi-stage ML pipelines, including data ingestion, model training, evaluation, and deployment (see the Airflow DAG sketch after this list).
Collaborate with Data Scientists and ML Engineers to productionize models and convert notebooks into reproducible and version-controlled ML pipelines.
Integrate and automate model monitoring (drift detection, performance logging) and alerting mechanisms using tools like CloudWatch, Prometheus, or Datadog (see the drift-metric sketch after this list).
Optimize compute workloads by managing infrastructure-as-code (IaC) via CloudFormation or Terraform for reproducible, secure deployments across environments.
Ensure secure and compliant deployment pipelines using IAM roles, VPC, and secrets management with AWS Secrets Manager or SSM Parameter Store.
Champion DevOps best practices across the ML lifecycle, including canary deployments, rollback strategies, and audit logging for model changes.
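
For the Lambda responsibility above, a minimal handler might look like the following Python sketch; the pipeline name, environment variable, and event fields are assumptions, not details from this role.

    import json
    import os

    import boto3

    sagemaker = boto3.client("sagemaker")

    # Pipeline name is assumed to be injected by IaC as an environment variable.
    PIPELINE_NAME = os.environ.get("PIPELINE_NAME", "model-training-pipeline")


    def handler(event, context):
        """Triggered (e.g., by S3 or EventBridge) to kick off a training/deployment pipeline."""
        # Forward a field from the triggering event as a pipeline parameter; keys are illustrative.
        response = sagemaker.start_pipeline_execution(
            PipelineName=PIPELINE_NAME,
            PipelineParameters=[
                {"Name": "InputDataS3Uri", "Value": event.get("input_uri", "s3://example-bucket/raw/")},
            ],
        )
        return {
            "statusCode": 200,
            "body": json.dumps({"execution_arn": response["PipelineExecutionArn"]}),
        }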
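
For the orchestration responsibility, a bare-bones Airflow DAG wiring ingestion, training, evaluation, and deployment might look like this; the DAG id, schedule, and task bodies are placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def ingest(**_):
        print("pull raw data into the training store")


    def train(**_):
        print("launch the training job (SageMaker or Databricks)")


    def evaluate(**_):
        print("compute offline metrics and compare against the quality gate")


    def deploy(**_):
        print("promote the model version and refresh the serving endpoint")


    with DAG(
        dag_id="ml_pipeline_example",  # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ):
        ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
        train_task = PythonOperator(task_id="train", python_callable=train)
        evaluate_task = PythonOperator(task_id="evaluate", python_callable=evaluate)
        deploy_task = PythonOperator(task_id="deploy", python_callable=deploy)

        # Linear dependency chain: ingest -> train -> evaluate -> deploy.
        ingest_task >> train_task >> evaluate_task >> deploy_task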
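
For the monitoring responsibility, one CloudWatch-based pattern is to publish a drift statistic as a custom metric and alarm on it; the namespace, metric name, and threshold mentioned here are illustrative assumptions.

    import boto3

    cloudwatch = boto3.client("cloudwatch")


    def publish_drift_metric(model_name: str, drift_score: float) -> None:
        """Publish a feature-drift score as a custom CloudWatch metric."""
        cloudwatch.put_metric_data(
            Namespace="MLOps/ModelMonitoring",  # hypothetical namespace
            MetricData=[
                {
                    "MetricName": "FeatureDriftScore",
                    "Dimensions": [{"Name": "ModelName", "Value": model_name}],
                    "Value": drift_score,
                    "Unit": "None",
                }
            ],
        )


    # A CloudWatch alarm (defined in IaC) can then page when the score stays above
    # an agreed threshold, e.g. FeatureDriftScore > 0.2 for three consecutive periods.
    publish_drift_metric("churn-classifier", 0.07)
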
Minimum Requirements
Hands-on experience in MLOps deploying ML applications in production at scale.
Proficient in AWS services: SageMaker, Lambda, CodePipeline, CodeCommit, ECR, ECS/Fargate, and CloudWatch.
Strong experience with Databricks Workflows and Databricks Model Serving, including MLflow for model tracking, packaging, and deployment (see the serving-endpoint sketch after this section).
Proficient in Python and shell scripting with the ability to containerize applications using Docker.
Deep understanding of CI/CD principles for ML, including testing ML pipelines, data validation, and model quality gates (see the quality-gate sketch after this section).
Hands-on experience orchestrating ML workflows using Airflow (open-source or MWAA) or Databricks Workflows.
Familiarity with model monitoring and logging stacks (e.g., Prometheus, ELK, Datadog, or OpenTelemetry).
Experience deploying models as REST endpoints, batch jobs, and asynchronous workflows.
Version control expertise with Git/GitHub and experience in automated deployment reviews and rollback strategies.
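
The Databricks Model Serving requirement above usually comes down to calling a serving endpoint over REST. A minimal client sketch, assuming a workspace host, access token, and endpoint name that are all placeholders:

    import os

    import requests

    # Host, token, and endpoint name are assumptions for this sketch.
    DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]   # e.g. https://<workspace>.cloud.databricks.com
    DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]
    ENDPOINT_NAME = "churn-classifier-endpoint"


    def score(records: list) -> dict:
        """Send a batch of feature records to a Databricks Model Serving endpoint."""
        response = requests.post(
            f"{DATABRICKS_HOST}/serving-endpoints/{ENDPOINT_NAME}/invocations",
            headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
            json={"dataframe_records": records},
            timeout=30,
        )
        response.raise_for_status()
        return response.json()


    if __name__ == "__main__":
        print(score([{"tenure_months": 12, "monthly_spend": 79.5}]))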
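
The model-quality-gate requirement is often implemented as a small CI check that blocks promotion when a candidate model underperforms production; the metric name and thresholds in this sketch are assumptions.

    from mlflow.tracking import MlflowClient

    MIN_ABSOLUTE_AUC = 0.80        # assumed floor for any deployable model
    MAX_REGRESSION_VS_PROD = 0.01  # assumed tolerated drop versus production


    def quality_gate(candidate_run_id: str, prod_auc: float) -> None:
        """Raise (failing the CI job) unless the candidate model clears the gate."""
        client = MlflowClient()
        # "val_auc" is a hypothetical metric name logged during training.
        candidate_auc = client.get_run(candidate_run_id).data.metrics["val_auc"]

        if candidate_auc < MIN_ABSOLUTE_AUC:
            raise RuntimeError(f"Candidate AUC {candidate_auc:.3f} is below the {MIN_ABSOLUTE_AUC} floor")
        if candidate_auc < prod_auc - MAX_REGRESSION_VS_PROD:
            raise RuntimeError(
                f"Candidate AUC {candidate_auc:.3f} regresses more than "
                f"{MAX_REGRESSION_VS_PROD} versus production ({prod_auc:.3f})"
            )
        print("Quality gate passed; the model version may be promoted.")
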
Nice to Have
Experience with Feature Store (e.g., AWS SageMaker Feature Store, Feast).
Familiarity with Kubeflow, SageMaker Pipelines, or Vertex AI (if multi-cloud).
Exposure to LLM-based models, vector databases, or retrieval-augmented generation (RAG) pipelines.
Knowledge of Terraform or AWS CDK for infrastructure automation.
Experience with A/B testing or shadow deployments for ML models.

About Nanda Technologies