ML Ops Support Engineer-Hybrid

Overview

Hybrid
Depends on Experience
Contract - W2

Skills

("MACHINE LEARNING" OR MLOPS OR "ML PIPELINES" OR "ML WORKFLOWS") AND (SUPPORT OR TROUBLESHOOTING OR MONITORING OR LOGGING)

Job Details

We have a contract role, ML Ops Support Engineer (Hybrid), for our client in Reading, PA. Please let me know if you or anyone you know would be interested in this position.

Position Details:

ML Ops Support Engineer-Hybrid-Reading PA

Location: Reading, PA 19607 (Hybrid)

Project Duration: 8+ month contract

Job Description:

We are seeking an ML Ops L2 Support Engineer to provide 24/7 production support for machine learning (ML) and data pipelines. The role requires on-call support, including weekends, to ensure high availability and reliability of ML workflows. The candidate will work with Dataiku, AWS, CI/CD pipelines, and containerized deployments to maintain and troubleshoot ML models in production.

Key Responsibilities:

Incident Management & Support:

  • Provide L2 support for ML Ops production environments, ensuring uptime and reliability.
  • Troubleshoot ML pipelines, data processing jobs, and API issues.
  • Monitor logs, alerts, and performance metrics using Dataiku, Prometheus, Grafana, or AWS tools such as CloudWatch.
  • Perform root cause analysis (RCA) and resolve incidents within SLAs.
  • Escalate unresolved issues to L3 engineering teams when needed.

Dataiku Platform Management:

  • Manage Dataiku DSS workflows, troubleshoot job failures, and optimize performance.
  • Monitor and support Dataiku plugins, APIs, and automation scenarios.
  • Collaborate with Data Scientists and Data Engineers to debug ML model deployments.
  • Perform version control and CI/CD integration for Dataiku projects.

Deployment & Automation:

  • Support CI/CD pipelines for ML model deployment (Bamboo, Bitbucket, etc.).
  • Deploy ML models and data pipelines using Docker, Kubernetes, or Dataiku Flow.
  • Automate monitoring and alerting for ML model drift, data quality, and performance.

Cloud & Infrastructure Support:

  • Monitor AWS-based ML workloads (SageMaker, Lambda, ECS, S3, RDS).
  • Manage storage and compute resources for ML workflows.
  • Support database connections, data ingestion, and ETL pipelines (SQL, Spark, Kafka).

Security & Compliance:

  • Ensure secure access control for ML models and data pipelines.
  • Support audit, compliance, and governance for Dataiku and ML Ops workflows.
  • Respond to security incidents related to ML models and data access.

Required Skills & Experience:

  • Experience: 5+ years in ML Ops, Data Engineering, or Production Support.
  • Dataiku DSS: Strong experience in Dataiku workflows, scenarios, plugins, and APIs.
  • Cloud Platforms: Hands-on experience with AWS ML services (SageMaker, Lambda, S3, RDS, ECS, IAM).
  • CI/CD & Automation: Familiarity with GitHub Actions, Jenkins, or Terraform.
  • Scripting & Debugging: Proficiency in Python, Bash, and SQL for automation and debugging.
  • Monitoring & Logging: Experience with Prometheus, Grafana, CloudWatch, or ELK Stack.
  • Incident Response: Ability to handle on-call support, weekend shifts, and SLA-based issue resolution.

Preferred Qualifications:

  • Containerization: Experience with Docker, Kubernetes, or OpenShift.
  • ML Model Deployment: Familiarity with TensorFlow Serving, MLflow, or Dataiku Model API.
  • Data Engineering: Experience with Spark, Databricks, Kafka, or Snowflake.
  • ITIL/DevOps Certifications: ITIL Foundation, AWS ML certifications, or Dataiku certification.

Work Schedule & On-Call Requirements:

  • Rotational on-call support (including weekends and nights).
  • Shift-based monitoring for ML workflows and Dataiku jobs.
  • Flexible work schedule to handle production incidents and critical ML model failures.

Process Flows

  • Mentor client project team members and provide knowledge transfer.
  • Participate as primary author, co-author, and/or contributing author on all project deliverables within assigned areas of responsibility.
  • Participate in data conversion and data maintenance.
  • Provide best-practice and industry-specific solutions.
  • Advise on and provide alternative (out-of-the-box) solutions.
  • Provide thought leadership as well as hands-on technical configuration/development as needed.
  • Participate as a team member of the functional team.
  • Perform other duties as assigned.