My name is Boopathi, and I'm a Senior Technical Recruiter at Cloud Destinations LLC.
Please see the job description below and let me know if you are interested in this position.
Title: Deep Learning Scientist, Speech Synthesis (3-5 Years)
Location: Remote
Duration: 6 months+
Number of Openings: 1
Company Overview
World Wide Technology is a global technology solutions provider that delivers innovative solutions across cloud, security, networking, and software development. We partner with leading organizations to help them scale their technology capabilities and accelerate business outcomes. Our culture is built on integrity, teamwork, trust, and a commitment to continuous learning.
Role Overview
- The role is 100% remote, anywhere in the US, with no time zone requirement.
- Experience: 3+ years (not 5+)
- Speech-to-Text is acceptable in place of TTS if the candidate is strong in ML and Python
- Must-haves: TTS or Speech Synthesis, Machine Learning, Python
- Key differentiator: Someone who can analyze data validation losses and diagnose why a model isn't working. Model training is the core of this role, not just data collection or evaluation.
World Wide Technology is seeking a Deep Learning Scientist, Speech Synthesis, to support a leading technology client. This role focuses on advancing cutting-edge speech AI solutions, with a strong emphasis on text-to-speech systems and model optimization. The selected candidate will contribute to impactful initiatives that enhance large-scale speech applications used by millions of users.
Key Responsibilities
- Train mel-spectrogram and vocoder models for speech synthesis
- Measure and benchmark model performance across use cases
- Maintain and enhance text-to-speech evaluation systems
- Analyze model accuracy and bias and recommend improvements
- Improve processes related to speech data preparation, augmentation, and filtering
- Develop and refine training datasets for speech models
- Characterize performance and quality metrics across different platforms
- Collaborate with cross-functional teams to deliver new product features
- Participate in code development, design reviews, and test planning
- Identify issues, propose solutions, and contribute to continuous innovation
Required Qualifications
- Master's degree or PhD in Computer Science, Electrical Engineering, Artificial Intelligence, Applied Mathematics, Linguistics, or Computational Linguistics, or equivalent experience
- Minimum of 3 years of relevant experience
- Strong programming skills in Python
- Solid understanding of programming fundamentals and software design
- Deep knowledge of machine learning and deep learning techniques including CNN, RNN, LSTM, and Transformers
- Experience applying deep learning to speech synthesis, large language models, and speech-to-speech translation
- Hands-on experience with speech technologies such as speech synthesis and voice cloning
- Experience training speech models
- Proficiency with the PyTorch deep learning framework
- Knowledge of speech signal processing techniques including FFT, MFCC, and mel spectrograms
- Familiarity with version control tools such as Git, Gerrit, or GitLab
- Strong collaboration and communication skills in a matrixed environment
Preferred Qualifications
- Fluency in one or more languages such as Spanish, Mandarin, German, Japanese, Russian, French, Arabic, Hindi, Korean, Italian, or Portuguese
- Experience with multilingual or code-switched text-to-speech systems
- Experience with voice cloning and cross-lingual voice cloning
- Knowledge of text normalization and inverse text normalization using neural networks or weighted finite-state transducers (WFST)
- Experience working with grapheme-to-phoneme systems for multiple languages
- Interest in linguistics, phonetics, and language technologies
- Strong C++ programming skills
- Familiarity with GPU technologies such as CUDA, cuDNN, or TensorRT
- Experience deploying machine learning models to cloud, data center, or embedded systems