We're on a mission to improve the quality of human life. We're developing new tools and capabilities to amplify the human experience. To lead this transformative shift in mobility, we've built a world-class team in Energy & Materials, Human-Centered AI, Human Interactive Driving, Large Behavioral Models, and Robotics.
Within the Human Interactive Driving division, the Extreme Performance Intelligent Control department is working to develop scalable, human-like driving intelligence by learning from expert human drivers. This project focuses on creating a configurable, data-driven world model that serves as a foundation for intelligent, multi-agent reasoning in dynamic driving environments. By tightly integrating advances in perception, world modeling, and model-based reinforcement learning, we aim to overcome the limitations of more compartmentalized, rule-based approaches. The end goal is to enable robust, adaptable, and interpretable driving policies that generalize across tasks, sensor modalities, and public road scenarios, delivering transformative improvements for ADAS, autonomous systems, and simulation-driven software development.
As a Data Engineer, you will be a key enabler of this mission, owning the systems that collect, organize, clean, and deliver the volumes of sensor and simulation data that fuel our world models, perception systems, and reinforcement learning algorithms. You will collaborate closely with research scientists and machine learning engineers to ensure our pipelines are reliable, scalable, and performant, powering breakthroughs in intelligent driving across simulation and real-world deployments.
Responsibilities
- Design, implement, and maintain robust data pipelines for ingesting, cleaning, and transforming large-scale autonomous vehicle datasets (camera, LiDAR, radar, GPS, simulation logs).
- Develop scalable storage and retrieval systems using AWS services (S3, EC2, SageMaker, Athena, etc.).
- Ensure data quality and consistency through automated validation, deduplication, and schema enforcement.
- Collaborate with ML researchers and engineers to provide efficient access to training data, labels, and metadata.
- Optimize data preprocessing and batching pipelines to support large-scale training and evaluation workflows.
- Build tools to manage and audit dataset versions, experiment tracking, and feature reproducibility.
- Implement and maintain CI/CD workflows for data and pipeline updates, ensuring minimal downtime and reproducible outputs.
- Monitor data pipeline performance and respond to bottlenecks or outages proactively.
Qualifications
- B.S. or M.S. in Computer Science, Data Engineering, or a related field.
- 3+ years of experience building production-grade data infrastructure or ML data pipelines.
- Strong proficiency with Python and SQL, and experience with data workflow orchestration tools (e.g., Airflow, Prefect, Luigi).
- Deep experience with AWS services, especially S3 (data storage), EC2 (compute), and SageMaker (model training).
- Familiarity with distributed computing frameworks like Spark, Dask, or Ray.
- Understanding of best practices for dataset documentation, standardization, and reproducibility in research.
Bonus Qualifications
- Experience with autonomous vehicle datasets or robotics sensor data.
- Familiarity with ML training pipelines and model evaluation workflows.
- Prior experience collaborating with researchers or applied ML teams in high-throughput environments.