Job Description
Data Platform & Visualization Engineer (Contractor)
We are seeking a contractor to help build and evolve our internal data platform that supports vehicle testing, experimentation, and machine learning workflows.
This role focuses on implementing and extending data ingestion pipelines, automated processing workflows, metrics tracking systems, and web-based visualization tools under the guidance of the team.
You will work with existing systems and well-defined components, contributing features and improvements that are used directly by researchers.
What You'll Do
- Implement and extend data ingestion and processing workflows for large, heterogeneous datasets collected from vehicle tests and ML pipelines.
- Contribute to improving orchestration, scheduling, and reliability of long-running data workflows operating under real-world constraints.
- Integrate downstream automation such as metric computation, plotting, and LLM-based postprocess tooling.
- Implement backend services and APIs that support data indexing, metadata management, and experiment tracking.
- Build user-facing web-based tools and dashboards that allow users to browse datasets, inspect results, and understand experimental progress over time.
- Work with a SQL-backed database to store metrics, experiment metadata, and summaries, ensuring the data can be queried and accessed consistently across systems.
- Contribute to data traceability and provenance mechanisms that capture how datasets are generated, transformed, and consumed in ML workflows.
What We're Looking For
- Experience with Python for backend services, data pipelines, and automation.
- Working knowledge of SQL, including writing queries and understanding database schemas.
- Experience building web-based tools, including:
  - Backend APIs (e.g., FastAPI, Flask, or similar)
  - Frontend applications using React or other modern frameworks
- Familiarity with AWS and cloud-based storage or services.
- Comfortable working in Linux environments.
Bonus Points
- Interest in autonomous racing and vehicle dynamics research.
- Prior internship or project experience involving data pipelines, dashboards, or analytics tools.
- Exposure to data visualization libraries, ML workflows, or experiment tracking systems.
Statement of Work
1. Scope of Work
The Contractor will provide engineering services to support the development and extension of internal data platform tooling supporting vehicle testing, experimentation, and machine learning workflows.
The scope includes ownership and extension of existing systems, implementation of automated pipelines, development of web-based visualization tools, and delivery of data traceability mechanisms.
2. Key Responsibilities
2.1 Data Ingestion Platform (pokedex / evdc_ingest)
Own and extend an existing data ingestion system responsible for uploading vehicle test data to Amazon S3.
Improve ingestion orchestration to support:
- Upload prioritization for small datasets
- Deferred upload scheduling for large datasets during off-hours
- Automatic discarding of data explicitly marked as trash
- Persistent queueing and resumability across server restarts or failures
Maintain ingestion reliability under constrained network bandwidth.
Extend the current web interface for clarity, reliability, and extensibility.
System-level architecture decisions will be guided by the team.
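For illustration only, the queueing behaviors above (small-dataset prioritization, off-hour deferral for large datasets, trash discarding, and restart-safe persistence) could be sketched with a SQLite-backed queue. All names here are hypothetical, not the actual pokedex/evdc_ingest API.

```python
import sqlite3


class PersistentUploadQueue:
    """Hypothetical sketch of a restart-safe upload queue."""

    def __init__(self, path=":memory:", large_threshold_bytes=10 * 1024**3):
        self.large = large_threshold_bytes
        # A file-backed database would be used in practice, so queue
        # state survives server restarts or failures.
        self.db = sqlite3.connect(path)
        self.db.execute(
            """CREATE TABLE IF NOT EXISTS queue (
                   dataset TEXT PRIMARY KEY,
                   size_bytes INTEGER,
                   status TEXT DEFAULT 'pending')"""
        )

    def enqueue(self, dataset, size_bytes, is_trash=False):
        # Data explicitly marked as trash is discarded, never uploaded.
        if is_trash:
            return
        self.db.execute(
            "INSERT OR IGNORE INTO queue (dataset, size_bytes) VALUES (?, ?)",
            (dataset, size_bytes),
        )
        self.db.commit()

    def next_upload(self, off_hours):
        # Small datasets are always eligible; large datasets are
        # deferred until off-hours. Smallest-first gives prioritization.
        row = self.db.execute(
            "SELECT dataset FROM queue WHERE status = 'pending' "
            "AND (size_bytes < ? OR ?) ORDER BY size_bytes ASC LIMIT 1",
            (self.large, int(off_hours)),
        ).fetchone()
        return row[0] if row else None

    def mark_done(self, dataset):
        self.db.execute("UPDATE queue SET status = 'done' WHERE dataset = ?", (dataset,))
        self.db.commit()
```

The real system would also need bandwidth throttling and retry logic; this sketch only shows how the scheduling rules compose around a persistent store.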
2.2 Post-Ingestion Automation, Annotation and Storage
Integrate ingestion workflows with post-processors, such as:
- The existing LLM-based automatic annotation module
- Automated plot generation (you return to automatically generated plots as soon as data hits S3 - imagine that!)
- Metric computation pipelines
Package and deploy the annotation system as a service (e.g., EC2-based).
Implement orchestration logic to trigger annotation jobs opportunistically when ingestion resources are idle.
Store metrics, experiment metadata, plots, and summaries in a SQL-backed database layer.
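The opportunistic-triggering idea above reduces to a simple guard: annotation jobs run only when no uploads are consuming ingestion resources. A minimal sketch, with entirely hypothetical names:

```python
def maybe_run_annotation(active_uploads, pending_jobs, run_job):
    """Run the next annotation job only if ingestion is idle (sketch).

    active_uploads: number of uploads currently in flight.
    pending_jobs:   FIFO list of queued annotation job names.
    run_job:        callable that executes one job.
    Returns the job that was run, or None if ingestion was busy.
    """
    if active_uploads == 0 and pending_jobs:
        job = pending_jobs.pop(0)
        run_job(job)
        return job
    return None
```

In a deployed service this check would sit inside a scheduler loop polling ingestion state; the point is that annotation yields to uploads rather than competing with them.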
2.3 Metrics Platform & Leaderboards
Implement and extend a SQL-backed metrics database using schemas defined by the team.
Extend these schemas to support:
- Multiple projects
- Baselines vs. experimental runs
- Historical comparisons
Build automated pipelines to compute and register metrics after ingestion.
Implement project-level leaderboard functionality to track:
- Best performance per metric
- Accepted baselines vs. rejected experiments
Develop a web-based visualization interface to:
- Display time-series progress
- Visualize metric tradeoffs
- Summarize experimental outcomes
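As a rough illustration of the shape such a schema and leaderboard query might take (the actual schemas are defined by the team; table and metric names here are invented, and this sketch assumes lower metric values are better):

```python
import sqlite3

# Hypothetical minimal schema: runs belong to a project and are either
# baselines or experiments; metrics are per-run name/value pairs.
SCHEMA = """
CREATE TABLE runs (
    run_id  TEXT PRIMARY KEY,
    project TEXT NOT NULL,
    kind    TEXT CHECK (kind IN ('baseline', 'experiment')),
    ts      TEXT
);
CREATE TABLE metrics (
    run_id TEXT REFERENCES runs(run_id),
    name   TEXT NOT NULL,
    value  REAL NOT NULL
);
"""


def best_per_metric(db, project):
    """Leaderboard: best (lowest) value of each metric within a project.

    Relies on SQLite's bare-column behavior: with MIN(), the ungrouped
    run_id column is taken from the row achieving the minimum.
    """
    return db.execute(
        """SELECT m.name, MIN(m.value), r.run_id
           FROM metrics m JOIN runs r ON r.run_id = m.run_id
           WHERE r.project = ?
           GROUP BY m.name""",
        (project,),
    ).fetchall()
```

Time-series progress and tradeoff views would be further queries over the same two tables, which is why getting the schema right up front matters.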
2.4 Data Traceability & Provenance
Design and implement a data provenance system for ML datasets.
Track:
- Source S3 URIs
- Post-processing operations applied to datasets
Implement a registry of post-processing functions with support for:
- Easy addition and removal
- Versioning and configuration tracking
Generate human-readable dataset identifiers.
Enable lookup and inspection of dataset lineage via API and/or web interface.
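One possible shape for such a registry is a decorator that records each post-processing function under a name and version, with lineage accumulated as a chain is applied. This is a sketch under assumed names, not a prescribed design:

```python
# Registry of post-processing functions, keyed by (name, version).
REGISTRY = {}


def postprocess(name, version):
    """Decorator registering a post-processing step (hypothetical design).

    Adding a step is just decorating a function; removal is deleting
    its registry entry, so the registry stays easy to extend.
    """
    def wrap(fn):
        REGISTRY[(name, version)] = fn
        return fn
    return wrap


@postprocess("downsample", "v1")
def downsample(records):
    # Example step: keep every other record.
    return records[::2]


def apply_chain(records, chain):
    """Apply registered steps in order; return the data and a lineage tag.

    The lineage tag doubles as a human-readable dataset identifier that
    encodes exactly which versioned steps produced the dataset.
    """
    lineage = []
    for name, version in chain:
        records = REGISTRY[(name, version)](records)
        lineage.append(f"{name}@{version}")
    return records, "_".join(lineage)
```

A production version would persist lineage (including source S3 URIs and per-step configuration) to the database so the API and web interface can look it up later.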
Milestones
Phase 1: Ingestion Stabilization (Months 0-3)
- Upload prioritization and off-hour scheduling
- Trash data handling
- Reliable status UI
- Capture documented bugs
Phase 2: Metrics Platform (Months 3-9)
- SQL-backed metrics database
- Automated metric generation
- Initial metrics outputs registered in database
- Project-level leaderboards and baselines
Phase 3: Visualization Platform (Months 9-15)
- Web-based dashboards for metrics and progress
- Time-series and tradeoff visualizations
- Experiment comparison views
Side Quest: Annotator
- Integrated LLM-based annotation service
Phase 4+: Data Traceability (Months 15-?)
- Dataset provenance tracking
- Post-processing registry
- Dataset lineage inspection tools
- Documentation and handoff