Role: ML Engineer - AI Operations
Location: Morristown, New Jersey (100% onsite from Day 1)
Duration: 6+ months
Experience Required: 8-10 years
Skills: Digital: Artificial Intelligence (AI), Digital: Google Data Engineering
Key Responsibilities:
Own and operate CI/CD for existing ML services across dev/test/prod; standardize blue/green and canary releases with automated rollbacks.
Run model/data drift and performance monitoring with SLAs; define alerts, thresholds, and retraining triggers.
Build and maintain production dashboards, alerts, and incident workflows; codify on-call runbooks and escalation paths.
Partner with onshore model owners to diagnose metric degradation and land mitigations aligned to governance and controls.
Provide day-to-day L2/L3 support for production ML: triage, root-cause analysis, permanent fixes, and post-incident reviews.
Own operational documentation: runbooks, standard operating procedures, and recurring health checks.
Coordinate hotfixes and safe rollbacks with onshore teams; verify recovery via automated smoke tests.
Harden and productionize research notebooks into maintainable, testable services with CI, unit/integration tests, and linting.
Operate and evolve model-serving APIs and batch scoring jobs; integrate with enterprise schedulers and data platforms.
Ensure models are fully integrated into CI/CD, observability, and monitoring stacks; enforce traceability with experiment and model registries.
Validate successful delivery of model outputs to apps, chatbots, reports, and downstream systems with contract tests and data quality checks.
Required Skills:
Git/GitLab, Python, SQL, MLflow, Power BI, Snowflake
OLAP/OLTP data modeling and architecture
API frameworks (FastAPI/Flask)
Nice to have:
Modern ELT tools (Fivetran/Airbyte)
Streaming/real-time data pipelines (e.g., Kafka, Kinesis, Redpanda)
Production ML service operations experience (experience in broader full-stack