Overview
On Site
$45 - $60
Contract - W2
Contract - 7 Month(s)
Skills
Machine Learning
Python
Audio Signal
Librosa
Data
PyTorch
Job Details
ML Data Engineer
A globally leading technology company is looking for an ML Data Engineer to design and optimize data pipelines for large-scale audio and acoustic machine learning workflows. In this role, you'll collaborate with researchers and signal processing experts to deliver high-quality, scalable datasets that power state-of-the-art models. If you're passionate about machine learning infrastructure, audio data, and real-world impact, we'd love to hear from you.
Key Responsibilities:
- Design, build, and maintain scalable and efficient data pipelines for processing large-scale audio and acoustic datasets.
- Collaborate with ML researchers and acoustic scientists to collect, annotate, transform, and curate high-quality training and evaluation datasets.
- Implement signal processing algorithms for feature extraction.
- Work on real-time and batch processing frameworks for streaming and static audio data.
- Support model training and evaluation through optimized data loaders and preprocessing steps.
- Ensure data quality, versioning, and reproducibility using best practices in data engineering.
- Deploy and maintain cloud-based infrastructure for data workflows (e.g., AWS, Google Cloud Platform, Azure).
- Develop tools for data visualization and annotation specific to acoustic events.
Required Qualifications:
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, Acoustics, or a related field.
- Strong experience with audio signal processing libraries (e.g., Librosa, PyDub, SciPy, torchaudio).
- Proficient in Python and relevant data engineering frameworks (e.g., Airflow, Apache Beam, Spark).
- Experience working with large-scale data pipelines and cloud infrastructure.
- Familiarity with machine learning workflows, especially in audio or time-series domains.
- Understanding of acoustic features and formats (e.g., WAV, FLAC, sampling rates).
- Strong knowledge of databases, data storage formats (e.g., Parquet, HDF5), and data management tools.
- Experience with deep learning frameworks (e.g., PyTorch, TensorFlow) for audio modeling.
- Knowledge of acoustic modeling, speech recognition, or sound classification.
- Experience with edge deployment and real-time audio processing.
- Familiarity with tools like Weights & Biases, MLflow, or DVC for ML operations.
Type: Contract
Duration: 7 months (with a possibility to extend to 18 months)
Work Location: Cupertino, CA (100% On site)
Pay Rate: $45.00 - $60.00 (DOE)
No C2C or third-party agencies.