Deep Learning Scientist, Speech Synthesis

Santa Clara, CA, US • Posted 1 day ago • Updated 1 day ago
Contract W2
On-site
Depends on Experience

Job Details

Skills

  • PyTorch
  • Deep Learning
  • Python

Summary

Deep Learning Scientist – Speech Synthesis

Location: 100% Remote (Anywhere in the U.S.)

Duration: 6-Month Contract

Position Overview

We are seeking a Deep Learning Scientist – Speech Synthesis to support the development of next-generation speech AI technologies. This role focuses on training and optimizing speech models, improving model performance, and solving complex machine learning challenges related to speech applications.

The ideal candidate has strong experience in speech synthesis (Text-to-Speech) or Speech-to-Text, deep learning, and Python development. Success in this role requires the ability to analyze model behavior, diagnose training issues, and improve model performance, not just collect or evaluate data.

Key Responsibilities

  • Train and optimize speech synthesis models, including mel spectrogram and vocoder models.

  • Analyze training metrics, validation losses, and model performance to identify root causes of model issues and recommend improvements.

  • Benchmark and optimize speech models across multiple use cases.

  • Improve speech data preparation, augmentation, filtering, and dataset quality.

  • Develop and refine high-quality training datasets for speech AI models.

  • Measure and characterize model accuracy, quality, and bias.

  • Collaborate with cross-functional teams to develop and deliver new speech AI features.

  • Participate in software development, design reviews, testing, and code reviews.

  • Troubleshoot technical issues and contribute to continuous model improvements.

Required Qualifications

  • Master's degree or Ph.D. in Computer Science, Electrical Engineering, Artificial Intelligence, Applied Mathematics, Linguistics, Computational Linguistics, or a related field (or equivalent experience).

  • 3+ years of relevant industry experience.

  • Strong Python programming skills.

  • Strong understanding of machine learning and deep learning concepts.

  • Experience with Text-to-Speech (TTS), Speech Synthesis, or Speech-to-Text (STT) technologies.

  • Hands-on experience training deep learning models using PyTorch.

  • Ability to analyze training behavior, validation losses, and model performance to troubleshoot and improve machine learning models.

  • Knowledge of speech signal processing concepts, including FFT, MFCC, and mel spectrograms.

  • Strong understanding of software development fundamentals.

  • Experience using version control and code review tools such as Git, Gerrit, or GitLab.

  • Excellent communication and collaboration skills.

Preferred Qualifications

  • Experience with deep learning architectures such as CNNs, RNNs, LSTMs, and Transformers.

  • Experience with voice cloning or multilingual speech systems.

  • Knowledge of text normalization (TN), inverse text normalization (ITN), or grapheme-to-phoneme (G2P) systems.

  • Fluency in one or more languages such as Spanish, Mandarin, German, Japanese, Russian, French, Arabic, Hindi, Korean, Italian, or Portuguese.

  • Interest in linguistics, phonetics, and speech technologies.

  • Strong C++ programming skills.

  • Familiarity with GPU technologies such as CUDA, cuDNN, or TensorRT.

  • Experience deploying machine learning models to cloud, data center, or embedded environments.

What We're Looking For

The ideal candidate is someone who enjoys solving difficult machine learning problems and has hands-on experience training speech models. Beyond building models, we're looking for someone who can investigate why a model is underperforming, analyze validation losses, identify root causes, and improve overall model quality and performance.

Additional Information

  • 100% remote position within the United States.

  • No specific U.S. time zone requirement.

  • This is a contract opportunity.

  • Opportunity to contribute to cutting-edge speech AI and deep learning technologies.

  • Dice Id: 10529568
  • Position Id: 38382

Company Info

About Catapult Solutions Group

At Catapult Solutions Group, our mission is to help organizations thrive by enabling their teams to find "A hire standard" of talent and technology services.

Businesses succeed when they find the right people, information, and tools they need to accomplish meaningful work.

From Fortune 500 companies to startups, thousands of companies use Catapult to hit their business targets more efficiently with technology strategy, talent strategy, talent sourcing, and agile project delivery so that they can focus on the things that matter.

Contact the job poster
Jorge Pertuz

Senior IT Recruiter @ Catapult Solutions Group