Lead Data Science Engineer, Machine Learning Platform
San Mateo or San Diego, CA
We seek a Lead Platform Engineer to support our cloud-based machine learning and data science platform. In this role you will lead a team delivering tools hosted on AWS and built on EMR permanent and transitory clusters, S3 storage, in-house software, and open source tools. This is a leadership role within our Data Science team that will empower our global teams to quickly apply advanced machine learning to a variety of problems. If this is you, please apply!
- Serve as line manager for two engineers on the Data Science Platform team.
- Set the technical direction for the Data Science Platform team in collaboration with the Product Owner and the Director of Data Sciences.
- Collaborate globally with data and cloud engineers to build an AWS-based machine learning platform.
- Collaborate with data scientists to ensure the new platform meets requirements and conforms to best practices.
- Perform complex application programming activities: coding, testing, implementation, and documentation of solutions.
- Troubleshoot and debug services at all stages of the development cycle, from development to production.
- Document new and existing projects to improve community understanding and contribution.
- Experience with leadership and line management.
- Strong experience in designing, deploying, and operating highly available, scalable, and fault-tolerant systems using Amazon Web Services (EMR clusters, S3, ELBs, EC2, EBS).
- Strong working knowledge of deploying and configuring Apache Spark clusters, ideally on EMR.
- Strong proficiency in Python.
- Detailed knowledge of more than one version control system, including Git.
- Knowledge of large open source projects and how they operate, preferably Airflow.
- Working knowledge of Unix-like environments, including shell scripting and system-level knowledge.
- Practical exposure to Continuous Integration/Continuous Delivery tools such as Jenkins to merge development with testing through pipelines.
- A desire to set the technical direction for the platform and to mentor and develop engineers on the team.
- Big data scalability in the cloud.
- Hive metastore and Hadoop.
- JDBC/ODBC, SQL query processing, and distributed query engines.
- Configuration Management tools like Ansible and Terraform.
- Docker container infrastructure.
- Monitoring and logging tools like Splunk.
- JupyterHub deployment and Apache Livy integration.
- Visualization tools such as Tableau.