Data Engineer II

Overview

On Site

Full Time

Skills

Collaboration

Algorithms

Extraction

Batch Processing

Streaming

Training

Evaluation

Data Quality

Cloud Computing

Amazon Web Services

Google Cloud Platform

Microsoft Azure

Data Visualization

Computer Science

Electrical Engineering

Signal Processing

Python

Data Engineering

Apache Beam

Apache Spark

IaaS

Workflow

Time Series

Database

Data Storage

Apache Parquet

Data Management

Deep Learning

PyTorch

TensorFlow

Acoustics

Modeling

Speech Recognition

Real-time

Machine Learning (ML)

Job Details

Title: Data Engineer (ML Focus)
Duration: 6 months
Location: Cupertino, CA 95014
36214219
100% onsite
Key Responsibilities

Design, build, and maintain scalable and efficient data pipelines for processing large-scale audio and acoustic datasets.
Collaborate with ML researchers and acoustic scientists to collect, annotate, transform, and curate high-quality training and evaluation datasets.
Implement signal processing algorithms for audio feature extraction.
Work on real-time and batch processing frameworks for streaming and static audio data.
Support model training and evaluation through optimized data loaders and preprocessing steps.
Ensure data quality, versioning, and reproducibility using best practices in data engineering.
Deploy and maintain cloud-based infrastructure for data workflows (e.g., AWS, Google Cloud Platform, Azure).
Develop tools for data visualization and annotation specific to acoustic events.

Required Qualifications

Bachelor's or Master's degree in Computer Science, Electrical Engineering, Acoustics, or a related field.
Strong experience with audio signal processing libraries (e.g., Librosa, PyDub, SciPy, torchaudio).
Proficient in Python and relevant data engineering frameworks (e.g., Airflow, Apache Beam, Spark).
Experience working with large-scale data pipelines and cloud infrastructure.
Familiarity with machine learning workflows, especially in audio or time-series domains.
Understanding of acoustic features, audio formats (e.g., WAV, FLAC), and audio properties such as sampling rate.
Strong knowledge of databases, data storage formats (e.g., Parquet, HDF5), and data management tools.

Preferred Qualifications

Experience with deep learning frameworks (e.g., PyTorch, TensorFlow) for audio modeling.
Knowledge of acoustic modeling, speech recognition, or sound classification.
Experience with edge deployment and real-time audio processing.
Familiarity with tools like Weights & Biases, MLflow, or DVC for ML operations.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.