Key Responsibilities
• Design, develop, and deploy Large Audio Language Models (LALMs) capable of native audio understanding, reasoning, and generation.
• Build Large Audio Reasoning Models that perform complex chain-of-thought reasoning over speech and audio inputs across medical, technical, and conversational domains.
• Contribute to Speech-to-Speech (S2S) system development, including speech understanding, dialogue management, and speech synthesis components.
• Research and implement alignment mechanisms between speech encoders and LLM backbones using lightweight adapters, LoRA, and efficient fine-tuning strategies.
• Design efficient speech tokenization and temporal compression techniques suitable for long-form audio reasoning and multi-turn spoken dialogue.
• Build comprehensive evaluation frameworks for audio reasoning capabilities, including benchmarks for speech QA, audio understanding, and reasoning accuracy.
• Optimize inference pipelines for low-latency, streaming applications in speech systems.
• Collaborate with cross-functional teams to transfer research innovations into production systems and customer-facing applications.
• Contribute to technical documentation, research write-ups, and publications at top-tier venues (NeurIPS, ICML, ACL, Interspeech).
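One of the responsibilities above involves aligning speech encoders to LLM backbones with lightweight adapters such as LoRA. As a hedged illustration of the core idea (not any specific system at Centific), the LoRA update rule h = Wx + (alpha/r)·B·A·x can be sketched in plain Python; all dimensions and values below are toy placeholders:

```python
# Minimal LoRA sketch: a frozen projection W plus a trainable low-rank
# update B @ A, scaled by alpha / r. All matrices and dimensions here
# are illustrative toy values, not taken from any real model.

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=16, r=2):
    """h = W x + (alpha / r) * B (A x) -- the LoRA update rule."""
    base = matvec(W, x)                  # frozen pretrained path
    low_rank = matvec(B, matvec(A, x))   # trainable low-rank path
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low_rank)]

# Toy example: 3-dim input, 3-dim output, rank r = 2.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]   # identity stands in for the pretrained weight
A = [[0.1, 0.0, 0.0],
     [0.0, 0.1, 0.0]]   # r x d_in
B = [[0.0, 0.0],
     [0.0, 0.0],
     [0.0, 0.0]]        # d_out x r, zero-initialized as in LoRA

x = [1.0, 2.0, 3.0]
print(lora_forward(W, A, B, x))  # with B = 0 the output equals W x
```

Because B starts at zero, the adapted model initially reproduces the frozen backbone exactly, which is why LoRA-style adapters are a low-risk way to attach a speech encoder to an LLM.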
Minimum Qualifications
• Master's degree (required) or Ph.D. (preferred) in Computer Science, Electrical Engineering, or a related field with a focus on speech, audio ML, or multimodal learning.
• 2+ years of industry or applied research experience in speech/audio AI, Large Language Models, or multimodal systems.
• Demonstrated applied research contributions through publications, patents, or shipped products in speech/audio AI or LLMs.
• Strong proficiency in Python and PyTorch, with hands-on experience in GPU-accelerated training for large-scale models.
• Solid understanding of speech and audio signal processing, acoustic modeling, and audio representations.
• Working knowledge of modern LLM architectures (Transformers, SSMs) and training paradigms including instruction tuning and alignment methods.
• Familiarity with modality alignment techniques: adapter-based integration, cross-modal attention, or audio-text fusion methods.
• Strong experimentation habits: clean code, systematic ablations, reproducibility, and clear technical communication.
Preferred Qualifications
• Publication record at top-tier venues (NeurIPS, ICML, ICLR, ACL, Interspeech, ICASSP) in audio language models, speech reasoning, or multimodal learning.
• Hands-on experience building or fine-tuning Large Audio Language Models (e.g., Qwen-Audio, SALMONN, LTU, Gemini Audio).
• Experience with speech representation pretraining (HuBERT, Wav2Vec 2.0, Whisper, WavLM) and discrete speech tokenization.
• Familiarity with Speech-to-Speech components: neural audio codecs (EnCodec, SoundStream), vocoders, or speech synthesis systems.
• Experience with audio reasoning benchmarks (AIR-Bench, MMAU, AudioBench) or building evaluation harnesses for audio QA.
• Hands-on experience with distributed training (FSDP, DeepSpeed) and inference optimization (ONNX, TensorRT, quantization).
• Familiarity with speech frameworks such as ESPnet, SpeechBrain, NVIDIA NeMo, or Fairseq.
• Experience with multilingual speech systems, code-switching, or domain adaptation for specialized applications (medical, legal, technical).
• Background in evaluating safety, bias, hallucination, or adversarial robustness in audio language models.
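Several items above (discrete speech tokenization, long-form audio reasoning) hinge on reducing the frame rate before audio reaches the LLM. A minimal frame-stacking sketch of temporal compression, with hypothetical feature frames and stacking factor, one common approach among several:

```python
# Frame stacking: concatenate every k consecutive feature frames into one,
# cutting the sequence length the LLM attends over by a factor of k.
# The frame values and stacking factor below are illustrative only.

def stack_frames(frames, k):
    """Group feature frames into len(frames)//k stacked frames.
    Trailing frames that do not fill a full group are dropped."""
    stacked = []
    for i in range(0, len(frames) - len(frames) % k, k):
        merged = []
        for frame in frames[i:i + k]:
            merged.extend(frame)  # concatenate along the feature axis
        stacked.append(merged)
    return stacked

# 6 frames of 2-dim features, stacked 3x -> 2 frames of 6-dim features.
frames = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11]]
print(stack_frames(frames, 3))
# -> [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]
```

A 3x reduction like this is the kind of trade-off the role would evaluate: fewer tokens per second of audio versus coarser temporal resolution for the reasoning model.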
Technical Environment
• Core: PyTorch, CUDA, torchaudio/librosa, Hugging Face Transformers
• LLM Stack: Large language model backbones, lightweight adapters (LoRA, Q-Former), instruction tuning pipelines
• Audio Models: Neural audio codecs, speech encoders, vocoders, discrete speech tokenizers
• Infrastructure: Modern GPU clusters, experiment tracking (Weights & Biases), distributed training frameworks
• Deployment: FastAPI/gRPC for services, ONNX/TensorRT for optimized inference
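The deployment stack above is oriented toward low-latency streaming. As a toy sketch of the chunked-processing pattern behind that (chunk size and the per-chunk RMS "model" are placeholders; a real pipeline would feed codec frames to an optimized GPU model):

```python
# Streaming sketch: process audio in fixed-size chunks as they arrive,
# so latency is bounded by the chunk duration rather than utterance
# length. Chunk size and the per-chunk computation are placeholders.

def chunks(samples, chunk_size):
    """Yield successive fixed-size chunks; the last may be shorter."""
    for i in range(0, len(samples), chunk_size):
        yield samples[i:i + chunk_size]

def streaming_rms(samples, chunk_size=4):
    """Emit one RMS energy value per chunk -- a stand-in for per-chunk
    model inference in a low-latency speech pipeline."""
    results = []
    for chunk in chunks(samples, chunk_size):
        rms = (sum(s * s for s in chunk) / len(chunk)) ** 0.5
        results.append(round(rms, 3))
    return results

audio = [0.0, 0.5, -0.5, 0.0, 1.0, -1.0, 1.0, -1.0]
print(streaming_rms(audio, chunk_size=4))  # -> [0.354, 1.0]
```

In production the generator would be replaced by a gRPC or WebSocket stream, but the structural point is the same: results are emitted per chunk instead of after the full utterance.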
What We Offer
• Competitive compensation package with comprehensive benefits
• Opportunity to work on cutting-edge Large Audio Language Models and audio reasoning research with real-world impact
• Collaboration with experienced applied scientists and engineers in speech and multimodal AI
• Support for publications at top-tier conferences and professional development
• Access to state-of-the-art GPU infrastructure for training large-scale audio models
• Flexible work arrangements with hybrid/remote options
Equal Opportunity Employer
Centific AI Research is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, disability, veteran status, or any other legally protected characteristic.