We are seeking a Senior Data Infrastructure Engineer with deep expertise in Python, Google Cloud Platform (GCP), and MLOps to join our growing data engineering team. You will be responsible for building scalable data pipelines, optimizing cost and performance, and supporting machine learning workflows across our organization.
Key Competencies & Responsibilities
Technical Leadership
- Proven experience leading MLOps initiatives across the full ML lifecycle
- Deep understanding of CI/CD pipelines, model deployment, monitoring, and scaling
- Hands-on expertise with cloud platforms (AWS, Google Cloud Platform, Azure) and containerization (Docker, Kubernetes)
Project Ownership
- Ability to define and drive project plans from inception to delivery
- Skilled in managing execution timelines, resource allocation, and risk mitigation
- Comfortable working with cross-functional teams including data scientists, engineers, and product managers
Communication & Collaboration
- Strong communicator who can translate technical concepts for non-technical stakeholders
- Experience in stakeholder management and status reporting
- Capable of leading meetings, resolving conflicts, and aligning teams toward shared goals
Problem Solving in Ambiguity
- Thrives in undefined or evolving problem spaces
- Demonstrates creativity and initiative in shaping solutions from vague requirements
- Comfortable making decisions with incomplete information and iterating quickly
Strategic Thinking
- Ability to align MLOps practices with business goals
- Experience in evaluating and implementing tools and frameworks that improve team productivity and model reliability
- Develop and enhance Python frameworks and libraries for:
  - Cost tracking
  - Data processing
  - Data quality
  - Lineage
  - Governance
  - MLOps
- Optimize data processing workflows to reduce costs for large-scale training data and feature pipelines.
- Build scalable batch pipelines on Google Cloud Platform (GCP) using BigQuery, Dataflow, and Composer.
- Implement robust monitoring, logging, and alerting systems to ensure infrastructure reliability.
- Plan and execute infrastructure rollouts with phased deployments, validation, and rollback strategies.
- Serve as the primary liaison for Data Scientists, ML Engineers, and other stakeholders during rollout coordination and issue resolution.
- Collaborate with ML Platform Engineers to ensure seamless integration of updates.
- Document processes and changes, creating clear runbooks and handoff materials for ongoing support.
Required Skills & Qualifications
- Strong proficiency in Python and experience with building reusable libraries and frameworks.
- Hands-on experience with Google Cloud Platform (GCP) services, including BigQuery, Dataflow, Composer, and Cloud Monitoring.
- Solid understanding of MLOps, data governance, and data lineage principles.
- Experience with CI/CD, infrastructure as code, and DevOps practices.
- Excellent communication and collaboration skills.
- Proven ability to manage infrastructure rollouts and support cross-functional teams.
Preferred Qualifications
- Experience with Terraform, Kubernetes, or Airflow.
- Familiarity with machine learning workflows and feature engineering pipelines.
- Background in data quality frameworks and cost optimization strategies.