MLOps Tech Lead
We are seeking an MLOps Tech Lead to support and evolve our Machine Learning Platform. This role is ideal for someone who thrives in ambiguous problem spaces and enjoys building scalable infrastructure and tools that empower Data Scientists and ML Engineers.
As a Tech Lead, you will be responsible for updating core Python SDKs, overseeing infrastructure rollouts, building and deploying batch pipelines, and serving as the primary point of contact for platform users. You'll bring a strong blend of Python development expertise, Google Cloud Platform infrastructure experience, and stakeholder management skills to ensure our ML platform remains reliable, scalable, and cost-efficient.
Technical Leadership
- Proven experience leading MLOps initiatives across the full ML lifecycle
- Deep understanding of CI/CD pipelines, model deployment, monitoring, and scaling
- Hands-on expertise with Google Cloud Platform and containerization (Docker, Kubernetes)
Project Ownership
- Ability to define and drive project plans from inception to delivery
- Skilled in managing execution timelines, resource allocation, and risk mitigation
- Comfortable working with cross-functional teams including data scientists, engineers, and product managers
Communication & Collaboration
- Strong communicator who can translate technical concepts to non-technical stakeholders
- Experience in stakeholder management and status reporting
- Capable of leading meetings, resolving conflicts, and aligning teams toward shared goals
Problem Solving in Ambiguity
- Thrives in undefined or evolving problem spaces
- Demonstrates creativity and initiative in shaping solutions from vague requirements
- Comfortable making decisions with incomplete information and iterating quickly
Strategic Thinking
- Ability to align MLOps practices with business goals
- Experience in evaluating and implementing tools and frameworks that improve team productivity and model reliability
Key Responsibilities
- Develop and enhance Python frameworks and libraries for cost tracking, data processing, data quality, lineage, governance, and MLOps.
- Optimize data processing workflows to reduce costs across large-scale training data and feature pipelines.
- Build scalable batch pipelines for features and training data using BigQuery, Dataflow, and Composer on Google Cloud Platform.
- Implement robust monitoring, logging, and alerting systems to ensure infrastructure reliability and pipeline stability.
- Plan and execute infrastructure rollouts with phased deployments, validation, and rollback strategies.
- Serve as the primary point of contact for Data Scientists, ML Engineers, and other stakeholders, managing communications and issue resolution.
- Collaborate with ML Platform engineers to ensure seamless integration of updates into existing workflows.
- Document processes and changes, creating clear runbooks and handoff materials for ongoing support.
Preferred Qualifications
- 5+ years in MLOps or related roles
- Prior experience as a tech lead or architect
- Familiarity with ML governance, compliance, and data privacy best practices
MLOps, Tech Lead, Python, SDK, Google Cloud Platform, BigQuery, Dataflow, Composer, ML Platform, Machine Learning, Data Engineering, Infrastructure Rollout, Monitoring, Logging, Alerting, CI/CD, ML Governance, Stakeholder Management, Cloud Functions, IAM, MLflow, TFX, Kubeflow, Batch Pipelines, Cost Optimization, Data Quality, Data Lineage