Onsite
Deep Learning Scientist, Speech Synthesis - Catapult Solutions Group
ML Engineer
Develop and optimize cutting‑edge text‑to‑speech models using deep learning frameworks, improve inference performance, and engineer large‑scale speech datasets for next‑generation speech AI applications.
About the role
Key Responsibilities
- Design, implement, and train state‑of‑the‑art neural TTS architectures (e.g., Tacotron, FastSpeech, VITS).
- Optimize models for low‑latency, high‑quality inference on GPU/CPU platforms, including quantization and pruning.
- Engineer and curate large speech corpora, perform data cleaning, augmentation, and alignment for robust model training.
- Collaborate with cross‑functional teams to integrate TTS components into production pipelines and evaluate end‑user experience.
- Publish research findings, contribute to internal knowledge bases, and stay current with advances in speech synthesis and deep learning.
Requirements
- Ph.D. or Master’s in Computer Science, Electrical Engineering, or related field with a focus on deep learning or speech processing.
- 5+ years of hands‑on experience building and deploying TTS or related speech synthesis systems.
- Proficiency in Python and deep learning frameworks such as PyTorch or TensorFlow.
- Strong understanding of model optimization techniques (quantization, pruning, knowledge distillation) and GPU acceleration (CUDA).
- Experience with large‑scale speech data pipelines, audio preprocessing, and evaluation metrics (MOS, PESQ, etc.).
Skills
Python, PyTorch, TensorFlow, Deep Learning