Job Details
Role: Python Developer with Data Engineering
Location: Mountain View, CA (onsite, 5 days per week)
Experience: 8+ years
Must have:
Proficiency writing, updating, and maintaining Python frameworks and libraries to support data processing and integration tasks.
Hands-on experience with Apache Airflow, including updating DAGs, managing multi-node rollouts, and troubleshooting issues.
Use Git and GitHub for source control, code reviews, and version management.
Knowledge of resource labeling and automation through SDKs or APIs (see the sketch after this list).
Extensive experience working with Google Cloud Platform services (e.g., BigQuery, Cloud Dataflow, Pub/Sub, Cloud Storage, Cloud Monitoring), including BigQuery and Dataflow batch pipelines.
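As a small illustration of the resource-labeling requirement above, the following is a minimal sketch that applies cost-tracking labels to a BigQuery dataset with the google-cloud-bigquery Python client; the project, dataset, and label values are placeholders, not details from this posting.

    from google.cloud import bigquery

    # Hypothetical project and dataset names, for illustration only.
    client = bigquery.Client(project="my-project")
    dataset = client.get_dataset("ml_features")

    # Merge cost-tracking labels into any labels already on the dataset.
    dataset.labels = {**(dataset.labels or {}), "cost-center": "ml-platform", "env": "prod"}

    # Persist only the changed field back to BigQuery.
    client.update_dataset(dataset, ["labels"])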
Minimum Basic Requirements
Python Proficiency: Write, update, and maintain Python frameworks and libraries to support data processing and integration tasks.
Composer / Apache Airflow: Hands-on experience with Apache Airflow, including updating DAGs, managing multi-node rollouts, and troubleshooting issues (a minimal example DAG appears after this list).
Code Management: Use Git and GitHub for source control, code reviews, and version management.
Google Cloud Platform Proficiency: Extensive experience working with Google Cloud Platform services (e.g., BigQuery, Cloud Dataflow, Pub/Sub, Cloud Storage, Cloud Monitoring). Knowledge of resource labeling and automation through SDKs or APIs.
Software Engineering: Strong understanding of software engineering best practices, including version control (Git), collaborative development (GitHub), code reviews, and CI/CD.
Problem-Solving: Excellent problem-solving skills with the ability to tackle complex data engineering challenges.
Communication: Excellent stakeholder communication skills, with the ability to interface directly with data scientists, platform engineers, and other clients to explain complex technical details, coordinate rollouts, triage issues, and provide updates.
Bachelor's or Master's degree in Computer Science, Engineering, Computer Information Systems, Mathematics, Physics, or a related field, or completion of a software development training program.
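To make the Airflow requirement above concrete, here is a minimal sketch of the kind of DAG Cloud Composer schedules and that this role would update; the DAG id, schedule, and task bodies are illustrative placeholders (Airflow 2.x syntax assumed).

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**context):
        # Stub: pull raw records from the upstream source.
        print("extracting")

    def load(**context):
        # Stub: publish processed records downstream.
        print("loading")

    with DAG(
        dag_id="example_batch_pipeline",  # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ):
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> load_task  # run extract before load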
What you'll do
Develop and enhance Python frameworks and libraries to support cost tracking, data processing, data quality, lineage, governance, and MLOps.
Implement data processing optimizations to reduce the cost of our larger training data and features pipelines.
Build scalable features and training data batch pipelines leveraging BigQuery, Dataflow, and the Composer scheduler/executor framework on Google Cloud Platform (a minimal pipeline sketch follows this list).
Implement monitoring, logging, and alerting systems to ensure the reliability and stability of our data and ML infrastructure and pipelines.
Plan and oversee infrastructure rollouts, including phased deployments, validation, and rollback strategies.
Act as the primary point of contact for Data Scientists, ML Engineers, and other stakeholders, handling rollout coordination, communications, and issue resolution.
Collaborate with ML Platform engineers to ensure seamless integration of updates into workflows.
Document processes and changes, providing clear runbooks and handoffs for ongoing support.
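As a rough, non-authoritative sketch of the batch-pipeline work described above, here is a minimal Apache Beam job that reads features from BigQuery on the Dataflow runner, aggregates per user, and writes training data back; every project, table, and bucket name is a placeholder.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # All resource names below are hypothetical placeholders.
    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFeatures" >> beam.io.ReadFromBigQuery(
                query="SELECT user_id, feature FROM `my-project.ml.features`",
                use_standard_sql=True,
            )
            # Key each row by user so features can be grouped per user.
            | "KeyByUser" >> beam.Map(lambda row: (row["user_id"], row["feature"]))
            | "GroupPerUser" >> beam.GroupByKey()
            # Emit one summary row per user; kept deliberately simple.
            | "FormatOutput" >> beam.Map(
                lambda kv: {"user_id": kv[0], "n_features": len(list(kv[1]))}
            )
            | "WriteTrainingData" >> beam.io.WriteToBigQuery(
                "my-project:ml.training_data",
                schema="user_id:STRING,n_features:INTEGER",
                write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
            )
        )

In practice a job like this would be packaged and triggered on a schedule by a Composer DAG such as the one sketched earlier.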
Preferred Requirements
Python Mastery: Strong Python development background, with demonstrated experience maintaining and extending production-grade SDKs or internal libraries.
Change Management: Experience with infrastructure rollout and change management, including phased deployments, validation, and rollback strategies.
Resiliency: Comfortable working in fast-moving ML/AI platform environments, where reliability, transparency, and client experience are key.
Batch Pipelines: Experience with building, deploying, and maintaining production batch pipelines processing and publishing petabytes of data.