Overview
On Site
Full Time
Skills
Collaboration
Algorithms
Extraction
Batch Processing
Streaming
Training
Evaluation
Data Quality
Cloud Computing
Amazon Web Services
Google Cloud Platform
Google Cloud
Microsoft Azure
Data Visualization
Computer Science
Electrical Engineering
Signal Processing
Python
Data Engineering
Apache Beam
Apache Spark
IaaS
Workflow
Time Series
Database
Data Storage
Apache Parquet
Data Management
Deep Learning
PyTorch
TensorFlow
Acoustics
Modeling
Speech Recognition
Real-time
Machine Learning (ML)
Job Details
Title: Data Engineer (ML Focus)
Duration: 6 Months
Location: Cupertino, CA 95014
36214219
100% Onsite
Key Responsibilities
- Design, build, and maintain scalable and efficient data pipelines for processing large-scale audio and acoustic datasets.
- Collaborate with ML researchers and acoustic scientists to collect, annotate, transform, and curate high-quality training and evaluation datasets.
- Implement signal processing algorithms for audio feature extraction.
- Work on real-time and batch processing frameworks for streaming and static audio data.
- Support model training and evaluation through optimized data loaders and preprocessing steps.
- Ensure data quality, versioning, and reproducibility using best practices in data engineering.
- Deploy and maintain cloud-based infrastructure for data workflows (e.g., AWS, Google Cloud Platform, Azure).
- Develop tools for data visualization and annotation specific to acoustic events.
Qualifications
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, Acoustics, or a related field.
- Strong experience with audio signal processing libraries (e.g., Librosa, PyDub, SciPy, torchaudio).
- Proficient in Python and relevant data engineering frameworks (e.g., Airflow, Apache Beam, Spark).
- Experience working with large-scale data pipelines and cloud infrastructure.
- Familiarity with machine learning workflows, especially in audio or time-series domains.
- Understanding of acoustic features and formats (e.g., WAV, FLAC, sampling rates).
- Strong knowledge of databases, data storage formats (e.g., Parquet, HDF5), and data management tools.
- Experience with deep learning frameworks (e.g., PyTorch, TensorFlow) for audio modeling.
- Knowledge of acoustic modeling, speech recognition, or sound classification.
- Experience with edge deployment and real-time audio processing.
- Familiarity with tools like Weights & Biases, MLflow, or DVC for ML operations.
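The listing mentions audio formats, sampling rates, and signal-processing feature extraction but does not show what that work looks like. As a rough, hypothetical illustration (not part of the employer's description), the sketch below writes a short mono 16-bit WAV tone into memory, reads it back with Python's standard-library `wave` module, and computes frame-level RMS energy, one of the simplest acoustic features. The helper names (`synth_wav_bytes`, `read_wav_mono`, `frame_rms`) and the 25 ms / 10 ms framing at 16 kHz are illustrative assumptions. In practice, libraries named above such as Librosa, torchaudio, or SciPy would typically handle decoding and feature computation.

```python
import io
import math
import struct
import wave
from typing import List, Tuple


def synth_wav_bytes(freq_hz: float, sample_rate: int, seconds: float) -> bytes:
    """Hypothetical helper: build a mono 16-bit PCM WAV file in memory holding a sine tone."""
    n = int(sample_rate * seconds)
    samples = [
        int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    ]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(struct.pack(f"<{n}h", *samples))
    return buf.getvalue()


def read_wav_mono(data: bytes) -> Tuple[List[float], int]:
    """Hypothetical helper: decode mono 16-bit PCM WAV bytes into floats in [-1, 1) plus the sampling rate."""
    with wave.open(io.BytesIO(data), "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        rate = w.getframerate()
        n = w.getnframes()
        raw = w.readframes(n)
    ints = struct.unpack(f"<{n}h", raw)
    return [s / 32768.0 for s in ints], rate


def frame_rms(signal: List[float], frame_len: int, hop: int) -> List[float]:
    """Root-mean-square energy of each full frame; frames start every `hop` samples."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        feats.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return feats


if __name__ == "__main__":
    data = synth_wav_bytes(freq_hz=440.0, sample_rate=16000, seconds=0.5)
    signal, rate = read_wav_mono(data)
    duration_s = len(signal) / rate
    # 400 and 160 samples are 25 ms windows with a 10 ms hop at 16 kHz.
    feats = frame_rms(signal, frame_len=400, hop=160)
    print(rate, len(signal), duration_s, len(feats))  # prints: 16000 8000 0.5 48
```

Because the frame length and hop are expressed in samples, they have to be recomputed whenever the sampling rate changes: 400 and 160 samples correspond to 25 ms and 10 ms only at 16 kHz.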
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.