Lead Data Science Engineer, Machine Learning Platform

Apache Spark, Leadership, AWS EMR (Elastic MapReduce), Python Programming
Full Time
Depends on Experience
Work from home not available. Travel required up to 10%.

Job Description

San Mateo or San Diego, CA


We seek a Lead Platform Engineer to support our cloud-based machine learning and data science platform. In this role you will lead a team delivering tools hosted on AWS and based on EMR Permanent and Transitory Clusters, S3 storage, in-house software, and open-source tools. This is a leadership role within our Data Science team that will empower our global teams to quickly apply advanced Machine Learning to a variety of problems. If this is you, please apply!



  • Serve as line manager for two engineers on the Data Science Platform team.
  • Set the technical direction for the Data Science Platform team in collaboration with the Product Owner and the Director of Data Sciences.
  • Collaborate globally with data and cloud engineers to build a Machine Learning AWS-based platform.
  • Collaborate with data scientists to ensure the new platform meets requirements and conforms to best practices.
  • Perform complex application programming activities, including coding, testing, implementation, and documentation of solutions.
  • Troubleshoot and debug services at all stages of the development cycle, from development to production.
  • Document new and existing projects to improve community understanding and contribution.



  • Experience with leadership and line management.
  • Strong experience in designing, deploying, and operating highly available, scalable, and fault-tolerant systems using Amazon Web Services (EMR Clusters, S3, ELBs, EC2, EBS).
  • Strong working knowledge of deploying and configuring Apache Spark clusters, ideally on EMR clusters.
  • Strong proficiency in Python.
  • Detailed knowledge of more than one version control system, including Git.
  • Knowledge of large open-source projects and how they operate, preferably Airflow.
  • Working knowledge of Unix-like environments, including shell scripting and system-level knowledge.
  • Practical exposure to Continuous Integration/Continuous Delivery tools like Jenkins to merge development with testing through pipelines.
  • A desire to set the technical direction for the platform and to mentor and develop engineers on the team.



  • Big-Data Cloud Scalability.
  • Hive metastore and Hadoop.
  • JDBC/ODBC, SQL query processing, and distributed query engines.
  • Configuration Management tools like Ansible and Terraform.
  • Docker container infrastructure.
  • Monitoring and logging tools like Splunk.
  • JupyterHub deployment and Apache Livy integration.
  • Visualization tools such as Tableau.
Dice Id : scea
Position Id : 6064374