Job Details
We are looking for a Machine Learning Engineer who will use machine learning and statistical techniques to help us create state-of-the-art solutions for non-trivial and arguably unsolved problems. If you are results-driven, deeply technical, and highly innovative, are interested in applying advanced machine learning techniques, would love to work with voice and text, and long for the opportunity to build solutions to challenging problems that directly impact the company's bottom line, we want to talk to you.
Responsibilities:
Design, implement, and optimize an end-to-end conversational speech LLM-based virtual assistant
Evaluate and benchmark speech-native models (Moshi, SesameAI, etc.) for in-vehicle applications
Design and execute model fine-tuning strategies for automotive domain adaptation
Implement tool API frameworks for vehicle system control
Implement hardware-specific optimizations for the Qualcomm SA8255P platform
Develop and maintain Python code for audio preprocessing, model integration and hardware optimization
Document architecture analyses, benchmarking results and optimization approaches
Execute full modeling life cycle including data cleansing, feature creation and iterative model selection
Work in a fast-paced Agile Scrum environment to help prototype, design, and implement predictive models and algorithms that solve real-world problems
Required Qualifications:
3+ years of hands-on experience in Machine Learning in a corporate environment
Deep understanding of Voice2Voice architectures and speech-native models
Experience with model quantization techniques and optimization for edge devices
Strong background in audio processing, including feature extraction, noise handling, and acoustic modeling
Experience working with TensorFlow or PyTorch
Experience with speech recognition (ASR) and text-to-speech (TTS) technologies (speech encoders, transformer variants used in ASR and TTS)
Knowledge of multimodal learning and techniques for fusing audio and text information (attention mechanisms, cross-modal attention)
Solid understanding of audio processing concepts, including audio feature extraction, signal processing, and acoustic modeling (TorchAudio, Librosa)
Experience with fine-tuning transformer models and developing training pipelines (Hugging Face Transformers, PyTorch/TensorFlow, distributed training pipelines)
Research experience in academia or industry with transformer and multimodal technologies
Familiar with algorithm design and complexity analysis
Strong decision-making skills with the ability to analyze data, assess risks, and implement effective solutions in a fast-paced environment
Problem-solving skills with the ability to identify challenges, develop creative solutions, and implement effective strategies
Proven ability to learn and apply new technologies, programming practices, patterns, and methods
Experience collaborating effectively with cross-functional teams, including developers, designers, and product owners
Experience taking ownership of assigned projects and tasks, proactively driving them to completion while ensuring accountability for quality and deadlines
Results-driven with a strong track record of setting goals, executing strategies, and delivering measurable outcomes