Software Engineer, Model Scaling, Autopilot AI

  • Palo Alto, CA
  • Posted 60+ days ago | Updated 9 hours ago

Overview

On Site
USD 140,000.00 per year
Full Time

Skills

FOCUS
Scalability
Neural Network
Data Loading
Collaboration
Evaluation
Dashboard
Computer Hardware
Dojo
High Performance Computing
Python
Software Engineering
Code Optimization
Deep Learning
PyTorch
TensorFlow
Training
Data Manipulation
Jupyter
NumPy
matplotlib
scikit-learn
Data Processing
Optimization
CPU
GPU
Debugging
Management
Parallel Computing
Storage
Workflow
Computer Networking
InfiniBand
Remote Direct Memory Access
Machine Learning (ML)
Artificial Intelligence
Research

Job Details

As a Software Engineer on Tesla's Autopilot AI team, you will play a crucial role in optimizing and scaling our neural network training infrastructure. You will join a specialized team of machine learning experts and have access to one of the world's largest model training clusters. Your primary focus will be to design, implement, and maintain high-performance applications for neural network training, evaluation, and data processing pipelines. Additionally, you will build supporting applications for profiling and debugging, and work on optimizing training and evaluation code to maximize efficiency and minimize resource usage.

Responsibilities
  • Design and Implement Large-Scale Data Pipelines: Build and maintain robust data processing pipelines that handle petabytes of autonomous vehicle data, including images, videos, and auto-generated labels, ensuring scalability and reliability
  • Optimize Neural Network Training Processes: Support neural network training by optimizing code and data formats for faster data loading, orchestrating auto-labeling jobs, and debugging bottlenecks to enhance overall training efficiency
  • Enhance System Performance: Develop and implement automation, monitoring, and optimization tools to improve system performance, including resource utilization, parallelism, and data I/O
  • Collaborate with Machine Learning Researchers: Work closely with researchers to understand and execute their data and infrastructure requirements, providing solutions that facilitate rapid experimentation and production-scale model deployment
  • Develop Evaluation Tools and Dashboards: Create and maintain evaluation metrics, tools, visualizations, and dashboards to support the development and refinement of neural networks
  • Implement Low-Level Integrations: Write efficient, low-level code that integrates with high-level training frameworks to enhance performance across various hardware platforms, including Dojo, Tesla's supercomputer
  • Stay Updated with ML Advancements: Keep abreast of the latest advancements and technologies in machine learning engineering to continually improve Tesla's AI infrastructure

Requirements
  • Strong Software Engineering Skills: Extensive experience with Python and software engineering best practices, including code optimization and system-level programming
  • Experience with Deep Learning Frameworks: Proficiency in one or more deep learning frameworks, such as PyTorch or TensorFlow, with hands-on experience in optimizing model training processes
  • Data Manipulation and Analysis Expertise: Proficiency with data manipulation tools, including Jupyter notebooks, NumPy, SciPy, matplotlib, and scikit-learn, and experience handling large-scale data processing
  • System Optimization and Debugging: Demonstrated experience in profiling and optimizing CPU/GPU code and debugging complex system-level software to ensure high performance and reliability
  • Distributed Systems Experience: Proven track record of building and managing large-scale distributed systems, particularly in AI/ML workflows, with a deep understanding of parallel computing, resource utilization, and data handling
  • Knowledge of Storage and Data Formats: Strong understanding of underlying storage mechanisms and experience designing and optimizing data formats for machine learning workflows
  • Familiarity with High-Performance Networking: Experience with high-performance networking technologies, such as InfiniBand, RDMA, and NCCL, is a plus
  • Passion for AI and Machine Learning: A deep understanding of machine learning concepts and a passion for staying current with the latest advancements in AI research and engineering

Compensation and Benefits
Benefits

Along with competitive pay, as a full-time Tesla employee, you are eligible for the following benefits at day 1 of hire:
  • Aetna PPO and HSA plans: 2 medical plan options with $0 payroll deduction
  • Family-building, fertility, adoption and surrogacy benefits
  • Dental (including orthodontic coverage) and vision plans, both with options that have a $0 paycheck contribution
  • Company-paid Health Savings Account (HSA) contribution when enrolled in the High Deductible Aetna medical plan with HSA
  • Healthcare and Dependent Care Flexible Spending Accounts (FSA)
  • 401(k) with employer match, Employee Stock Purchase Plans, and other financial benefits
  • Company-paid Basic Life, AD&D, short-term and long-term disability insurance
  • Employee Assistance Program
  • Sick and Vacation time (Flex time for salary positions), and Paid Holidays
  • Back-up childcare and parenting support resources
  • Voluntary benefits, including critical illness, hospital indemnity, accident insurance, theft & legal services, and pet insurance
  • Weight Loss and Tobacco Cessation Programs
  • Tesla Babies program
  • Commuter benefits
  • Employee discounts and perks program

Expected Compensation

$140,000 - $360,000/annual salary + cash and stock awards + benefits

Pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. The total compensation package for this position may also include other elements dependent on the position offered. Details of participation in these benefit plans will be provided if an employee receives an offer of employment.