Role: Lead Data Science Consultant (MLOps & API Performance Analytics)
Location: Denver, CO (Onsite)
Duration: Long-term
Note: Candidates must have 10+ years of experience.
About the Role:
We’re seeking a Lead Data Science Consultant with deep experience in Data Science, DevOps/MLOps, and Data Visualization, and a particular focus on API performance tracking, analytics, troubleshooting, predictive reliability, and pattern identification.
In this senior role, you will:
- Architect and deliver scalable, production‑grade data and ML solutions
- Lead cross‑functional initiatives to improve system and API reliability
- Build predictive models that forecast failures before they occur
- Guide teams through complex troubleshooting and performance optimization
- Influence technical strategy and engineering standards across the organization
You will partner with backend engineering, SRE, data engineering, and product teams to deliver high‑impact, data‑driven improvements to stability and performance.
Key Responsibilities:
Data Science & Analytics:
- Lead the design and execution of complex analytical frameworks to detect patterns, anomalies, and failure precursors.
- Conduct advanced exploratory data analysis (EDA) to uncover multi‑layer correlations across product, operational, and infrastructure datasets.
- Apply predictive modeling and machine learning to identify where system or API issues are most likely to occur.
- Use statistical process control and drift detection techniques to ensure ongoing operational stability (a minimal sketch follows this list).
- Build simulation and forecasting models to evaluate the impact of load changes, upgrades, or new features on system behavior.
- Establish and enforce best practices for reproducible research, model validation, and experimentation.
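To give a concrete flavor of this work, here is a minimal statistical‑process‑control sketch that flags points falling outside rolling 3‑sigma control limits. The error‑rate series, window size, and injected anomaly are all hypothetical, not a prescribed method.

```python
# Illustrative only: flag observations outside rolling 3-sigma control limits.
import numpy as np
import pandas as pd

def control_chart_violations(series: pd.Series, window: int = 30) -> pd.Series:
    """Return a boolean Series marking points outside mean +/- 3*std limits."""
    mean = series.rolling(window).mean()
    std = series.rolling(window).std()
    upper, lower = mean + 3 * std, mean - 3 * std
    return (series > upper) | (series < lower)

# Synthetic example: a stable error rate with one injected anomaly.
rng = np.random.default_rng(0)
error_rate = pd.Series(rng.normal(0.01, 0.002, 200))
error_rate.iloc[150] = 0.05  # injected spike

flags = control_chart_violations(error_rate)
print(flags[flags])  # indices of flagged observations
```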
API & System Performance Analytics:
- Architect end‑to‑end observability solutions to track API latency, throughput, error rates, saturation, and SLO adherence.
- Build automated pipelines that ingest, aggregate, and model API telemetry logs and traces (OpenTelemetry, Prometheus, CloudWatch, Application Insights, etc.).
- Detect and explain leading indicators of API instability using anomaly detection, time‑series forecasting, and multivariate correlation.
- Provide engineering with “risk heatmaps” that identify high‑risk services, endpoints, or infrastructure components (a simplified scoring sketch follows this list).
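A simplified sketch of how such a risk table might be assembled from raw request telemetry. The input schema (endpoint, latency_ms, is_error), the 300 ms SLO threshold, and the score weighting are assumptions for illustration only.

```python
# Illustrative only: score per-endpoint risk from raw request telemetry.
import pandas as pd

SLO_LATENCY_MS = 300.0  # assumed latency SLO for the sketch

def endpoint_risk_scores(requests: pd.DataFrame) -> pd.DataFrame:
    """Aggregate telemetry into a simple per-endpoint risk table."""
    by_endpoint = requests.groupby("endpoint").agg(
        p99_latency_ms=("latency_ms", lambda s: s.quantile(0.99)),
        error_rate=("is_error", "mean"),
        traffic=("latency_ms", "size"),
    )
    # Naive composite score: SLO headroom consumed plus weighted error rate.
    by_endpoint["risk"] = (
        by_endpoint["p99_latency_ms"] / SLO_LATENCY_MS
        + 10 * by_endpoint["error_rate"]
    )
    return by_endpoint.sort_values("risk", ascending=False)
```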
Predictive Reliability & Proactive Mitigation:
- Design and implement predictive models that forecast outages, SLA breaches, or performance regressions.
- Develop automated early‑warning systems integrated into observability platforms.
- Architect proactive mitigation workflows, including:
  - Adaptive scaling rules
  - Automated rollback/canary strategies
  - Circuit breakers and fault‑tolerance improvements
  - Predictive alerting thresholds (see the forecasting sketch after this list)
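One hedged sketch of predictive alerting: a Holt‑Winters forecast (via statsmodels) of an hourly error‑count series, raising a flag when the forecast plus a crude uncertainty band crosses a budget. The series, horizon, and budget are hypothetical.

```python
# Illustrative only: forecast hourly error counts and flag projected breaches.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

def forecast_breach(errors: pd.Series, horizon: int = 24,
                    budget: float = 100.0) -> bool:
    """Return True if forecast error counts exceed the budget within horizon."""
    model = ExponentialSmoothing(
        errors, trend="add", seasonal="add", seasonal_periods=24
    ).fit()
    forecast = model.forecast(horizon)
    # Crude upper band from in-sample residual spread.
    resid_std = np.std(model.resid)
    return bool((forecast + 2 * resid_std > budget).any())
```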
DevOps / MLOps:
- Architect and optimize CI/CD workflows for model deployment and data pipelines.
- Develop and maintain Docker/Kubernetes‑based services for training, inference, and analytics.
- Implement observability frameworks for ML workloads, ensuring traceability, logging, and performance monitoring.
- Maintain model registries, drift detection systems, and automated retraining strategies (a minimal drift check is sketched after this list).
- Use IaC (Terraform/Bicep/CloudFormation) to maintain secure, reproducible environments.
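A minimal sketch of the kind of drift gate that might sit in an automated retraining pipeline, using a two‑sample Kolmogorov–Smirnov test. The feature dictionaries and the p‑value cutoff are assumptions for the sketch.

```python
# Illustrative only: flag features whose live distribution has drifted
# from the training (reference) distribution.
import numpy as np
from scipy.stats import ks_2samp

def drifted_features(reference: dict[str, np.ndarray],
                     live: dict[str, np.ndarray],
                     alpha: float = 0.01) -> list[str]:
    """Return features whose live distribution differs from training."""
    flagged = []
    for name, ref_values in reference.items():
        result = ks_2samp(ref_values, live[name])
        if result.pvalue < alpha:
            flagged.append(name)
    return flagged  # a non-empty result could trigger automated retraining
```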
Data Engineering:
- Design and optimize scalable ETL/ELT pipelines across batch and streaming architectures.
- Develop transformations, semantic layers, and feature stores supporting both predictive analytics and operational monitoring.
- Integrate API event logs, telemetry, and performance metrics into high‑quality analytics datasets.
- Establish data quality SLAs and automated validation processes (an example check is sketched after this list).
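For example, an automated validation step might enforce checks like the following before a batch is published; the column names and thresholds here are hypothetical.

```python
# Illustrative only: lightweight data-quality checks for a telemetry batch.
import pandas as pd

def validate_telemetry(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality violations; empty means the batch passes."""
    failures = []
    if df["latency_ms"].isna().mean() > 0.01:
        failures.append("latency_ms null rate above 1%")
    if (df["latency_ms"] < 0).any():
        failures.append("negative latencies present")
    if not df["timestamp"].is_monotonic_increasing:
        failures.append("timestamps out of order")
    return failures
```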
Data Visualization & Decision Support:
- Build executive‑quality dashboards that communicate API health, KPIs, predictive signals, and operational trends.
- Create advanced visualizations: forecast bands, anomaly indicators, latency distributions, saturation patterns, and future‑state projections (a forecast‑band example follows this list).
- Standardize visualization frameworks, semantic metrics, and documentation across teams.
- Influence decision‑making by translating predictive findings into clear, concise recommendations.
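A minimal example of the forecast‑band visual described above, rendered with matplotlib; all series here are synthetic.

```python
# Illustrative only: observed series, dashed forecast, shaded uncertainty band.
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(100)
observed = 50 + 0.2 * t + np.random.default_rng(1).normal(0, 2, 100)
horizon = np.arange(100, 124)
forecast = 50 + 0.2 * horizon
band = 2 * 2.0  # roughly +/- 2 standard deviations

fig, ax = plt.subplots()
ax.plot(t, observed, label="observed p95 latency (ms)")
ax.plot(horizon, forecast, linestyle="--", label="forecast")
ax.fill_between(horizon, forecast - band, forecast + band,
                alpha=0.3, label="forecast band")
ax.set_xlabel("hour")
ax.set_ylabel("latency (ms)")
ax.legend()
plt.show()
```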
Collaboration, Leadership & Technical Ownership:
- Serve as a technical leader across engineering, driving standards for reliability, observability, and data‑driven decision‑making.
- Mentor engineers and data scientists, conducting code reviews, design reviews, and knowledge‑sharing sessions.
- Lead post‑incident reviews and guide teams in building lasting solutions, not short‑term patches.
- Partner with product and engineering leadership to define roadmaps, set metrics, and prioritize improvements.
- Communicate complex technical topics to executives with clarity and measurable impact.
- Champion a culture of quality, automation, performance excellence, and continuous improvement.
Qualifications:
Required:
- Senior‑level proficiency in Python, SQL, and software engineering best practices (testing, design patterns, modular architecture).
- Extensive experience with observability data: logs, metrics, traces, service topology, and distributed systems behavior.
- Hands‑on experience with API performance tools (Grafana, Prometheus, Datadog, New Relic, Splunk, Azure Monitor, CloudWatch, etc.).
- Strong understanding of SLOs, SLIs, latency percentiles, error budgets, traffic analysis, and capacity planning (a worked error‑budget example follows this list).
- Deep experience with CI/CD pipelines, Git‑based workflows, and automated deployments.
- Strong skills in Docker/Kubernetes and cloud-native microservice environments.
- Expertise in data visualization tools (Power BI, Tableau, Looker) and Python visualization libraries.
- Experience with time-series modeling, anomaly detection, and forecasting (ARIMA, Prophet, Holt‑Winters, LSTM, etc.).
- Proven ability to troubleshoot complex, distributed system issues and drive long‑term resolutions.
- Demonstrated ability to own systems end‑to‑end through design, implementation, deployment, and maintenance.
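As a worked example of the error‑budget arithmetic referenced above (all figures hypothetical): a 99.9% availability SLO over a 30‑day window allows 43.2 minutes of downtime.

```python
# Illustrative only: remaining error budget under a 99.9% availability SLO.
SLO = 0.999
WINDOW_MINUTES = 30 * 24 * 60                 # 43,200 minutes in the window
budget_minutes = (1 - SLO) * WINDOW_MINUTES   # 43.2 minutes of allowed downtime

observed_downtime = 12.0                      # minutes of downtime so far (example)
remaining = budget_minutes - observed_downtime
burn_rate = observed_downtime / budget_minutes
print(f"Remaining budget: {remaining:.1f} min, burn: {burn_rate:.0%}")
```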
Preferred:
- Experience in system or API predictive modeling (e.g., Monte Carlo, reliability models).
- Experience building risk scoring systems for performance, stability, or reliability.
- Familiarity with distributed tracing tools (OpenTelemetry, Jaeger, Zipkin).
- Experience with SRE practices and incident‑response engineering.
- Experience with dbt, Airflow, Dagster, or Prefect for orchestration.
- Experience with MLflow, Databricks, SageMaker, Azure ML, or similar MLOps platforms.
- Ability to design automated mitigation strategies (predictive alerts, auto-scaling, failure‑prevention policies).
- Experience influencing cross‑team architecture decisions in large, complex systems.
Success Metrics:
- Fewer API incidents and faster detection and resolution (lower MTTD/MTTR).
- Improved system reliability: lower error rates, improved latency, higher throughput.
- Performance degradations predicted and prevented before they impact users.
- High adoption of dashboards, predictive models, and observability tools.
- Significant improvements in deployment velocity, ML reliability, and platform stability.
- Positive cross-team feedback on leadership, mentorship, and collaboration.