Overview
Location: Los Angeles (Remote / On Site)
Compensation: $225k - $275k
Employment Type: Full Time
Job Description
This is a full-time opportunity based in Los Angeles (hybrid or remote optional for the right candidate) with a stealth-mode startup building foundational AI technology for the connected home. The team is rethinking what it means for hardware, software, and machine learning to work together at the OS level, delivering intelligent, real-time responsiveness through multimodal input (voice, video, and context). The tech stack is cutting-edge, and this role centers on applied machine learning at the intersection of speech, audio, and video understanding.
Join a world-class team of engineers, scientists, and designers reimagining the interface between people and their environments. In this role, you'll help build a system that doesn't just respond; it anticipates. This is an incredible opportunity for an Audio ML or Intent Recognition expert to shape a product that integrates AI into daily life in a meaningful, consumer-facing way. The role offers ownership, autonomy, and the chance to work with advanced ML architectures, foundational model development, and real-time sequence processing.
Required Skills & Experience
4+ years of experience in applied ML focused on audio, speech, or intent classification
Strong foundation in training sequence models (RNNs, Transformers, BERT, Whisper, CLIP, Wav2Vec, etc.)
Experience working with speech/audio ML, embedded systems, or multimodal inputs
Proven track record of building and training models from scratch, not just relying on APIs
Deep familiarity with tools like PyTorch, TensorFlow, Hugging Face Transformers, torchaudio, librosa, OpenCV, etc.
Desired Skills & Experience
Experience with signal processing, voice activity detection, audio embeddings, and speaker identification
Comfort with model evaluation, real-time inference, labeling, augmentation, and tuning
Background with visual sequence modeling, action recognition, or video context detection
Prior experience in environments like OpenAI, DeepMind, Amazon Alexa, Dolby, Sonos, Roku, AssemblyAI, etc., is a strong plus
Familiarity with ONNX, FFmpeg, and NVIDIA Triton is also welcome
What You Will Be Doing
Tech Breakdown
Audio ML and signal processing
Intent recognition and sequence classification
Multimodal fusion (audio + video input modeling)
System optimization and integration
Daily Responsibilities
80%: Hands-on model development, training, and tuning
20%: Cross-functional team collaboration, experimentation, and architecture discussions
The Offer
Equity eligible
You will receive the following benefits:
Medical, Dental, and Vision Insurance
Paid Vacation & Holidays
Generous Equity Package
Applicants must be currently authorized to work in the US on a full-time basis now and in the future.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.