Job Details
AI Sustaining Engineer
Long-term - Contract-to-hire (CTH)
100% Remote
4 positions
Project summary
Client is building out an all-new AI team to support a multitude of initiatives, with a vast roadmap through 2029 that includes use cases such as Prior Authorization, Rebates, and Summarization.
The AI Sustaining Engineer ensures that AI models deployed in production remain accurate, efficient, ethical, and cost-effective over time. This role bridges MLOps, observability, and optimization, focusing on performance monitoring, drift detection, retraining workflows, and sustainable resource use.
Key Responsibilities:
- Monitor AI model performance, reliability, and fairness in production.
- Detect and remediate data drift, bias, and degradation issues.
- Optimize model inference efficiency, scalability, and energy usage.
- Implement observability frameworks and automated retraining triggers.
- Collaborate with AI Engineers and infrastructure teams to sustain production health.
- Provide technical support and troubleshooting for live AI systems.
- Report AI policy violations and ensure compliance.
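As a rough illustration of the "automated retraining triggers" responsibility above, the following is a minimal sketch (not the client's actual implementation; the window size and accuracy threshold are assumed placeholder values) of a rolling-accuracy monitor that flags when a live model may need retraining:

```python
from collections import deque

def make_degradation_monitor(window=100, threshold=0.85):
    """Track a rolling window of prediction outcomes and flag degradation.

    `window` and `threshold` are illustrative values, not from the posting.
    """
    outcomes = deque(maxlen=window)

    def record(correct: bool) -> bool:
        """Record one outcome; return True if rolling accuracy fell below threshold."""
        outcomes.append(1 if correct else 0)
        if len(outcomes) < window:
            return False  # not enough data to judge yet
        return sum(outcomes) / window < threshold

    return record

# Simulated stream: 8 correct predictions followed by 4 misses.
record = make_degradation_monitor(window=10, threshold=0.8)
flags = [record(correct) for correct in [True] * 8 + [False] * 4]
```

In production, the `record` callback would typically be wired to an observability stack (e.g., emitting a gauge to Prometheus/Grafana) rather than returning a bare boolean, with the flag kicking off a retraining pipeline.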
Must-Have Skills:
- Experience with ML observability tools (Dynatrace, MLflow, EvidentlyAI, Prometheus, Grafana, etc.).
- Strong understanding of data drift detection and statistical monitoring.
- Hands-on with containerization (Docker/Kubernetes) and CI/CD pipelines.
- Experience in production support and AI monitoring.
- Proficient in Python for automation and monitoring scripts.
- Familiarity with model versioning, governance, and retraining pipelines.
- Knowledge of AWS cloud AI infrastructure (containerized deployments on EC2 instances).
- Strong communication skills and a proactive, team-oriented, go-getter attitude: able to seek out work and solutions independently and comfortable working in ambiguity without always being handed instructions.
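To give a concrete flavor of the drift-detection and Python scripting skills listed above, here is a small sketch of the Population Stability Index (PSI), one common statistical drift metric. This is an illustrative example only; the bin count and the 0.1/0.25 decision thresholds are conventional rules of thumb, not requirements from the posting.

```python
import math
import random

def population_stability_index(baseline, current, bins=10):
    """Compare a live feature sample against a training baseline.

    Rule of thumb (assumption): PSI < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 significant drift warranting investigation or retraining.
    """
    # Bin edges from baseline quantiles, so each bin holds roughly equal baseline mass.
    sorted_base = sorted(baseline)
    edges = [sorted_base[int(i * len(sorted_base) / bins)] for i in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(1 for e in edges if x > e)] += 1
        # Floor at a tiny value to avoid log(0) on empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    p, q = proportions(baseline), proportions(current)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

# Demo: a 1-standard-deviation mean shift produces a clearly elevated PSI.
random.seed(0)
training_sample = [random.gauss(0, 1) for _ in range(1000)]
live_sample = [random.gauss(1, 1) for _ in range(1000)]
drift_score = population_stability_index(training_sample, live_sample)
```

Tools named in the posting (EvidentlyAI, MLflow) ship production-grade versions of checks like this; the point of the sketch is only to show the underlying statistical idea.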
Responsibilities:
- Provide technical support and troubleshooting for live AI systems.
- Monitor performance, usage, and drift.
- Report AI policy violations and ensure compliance.
- Measure ROI and performance post-implementation.
Qualifications:
- Experience in production support and AI monitoring.
- Familiarity with Dynatrace, Optura EMA, and AI governance.
- Strong metrics and reporting capabilities.
Top Technical Needs
- Production Support Experience
  - Ability to troubleshoot and resolve issues in live AI systems.
  - Familiarity with incident management and root cause analysis.
- Monitoring Tools Expertise
  - Hands-on experience with Dynatrace and Optura EMA for performance and usage monitoring.
  - Understanding of system drift and anomaly detection.
- Metrics & Reporting
  - Strong skills in tracking KPIs, ROI, and performance metrics.
  - Ability to generate actionable insights from monitoring data.
Operational Needs
- AI System Performance Monitoring
  - Regular tracking of model accuracy, latency, and resource usage.
  - Identifying degradation or drift in AI models over time.
- Post-Implementation Analysis
  - Measuring business impact and ROI of deployed AI solutions.
  - Communicating findings to stakeholders effectively.
Compliance & Governance Needs
- AI Governance Knowledge
  - Familiarity with AI policy frameworks and ethical guidelines.
  - Experience reporting violations and ensuring regulatory compliance.
- Risk Management
  - Understanding of data privacy, bias detection, and model explainability.
  - Ability to escalate and mitigate risks in AI deployments.