onsite

Deep Learning Scientist, Speech Synthesis - Catapult Solutions Group

ML Engineer

Develop and optimize cutting‑edge text‑to‑speech models using deep learning frameworks, improve inference performance, and engineer large‑scale speech datasets for next‑generation speech AI applications.

About the role

Key Responsibilities

Design, implement, and train state‑of‑the‑art neural TTS architectures (e.g., Tacotron, FastSpeech, VITS).
Optimize models for low‑latency, high‑quality inference on GPU/CPU platforms, including quantization and pruning.
Engineer and curate large speech corpora, including data cleaning, augmentation, and alignment, for robust model training.
Collaborate with cross‑functional teams to integrate TTS components into production pipelines and evaluate end‑user experience.
Publish research findings, contribute to internal knowledge bases, and stay current with advances in speech synthesis and deep learning.

Requirements

Ph.D. or Master’s in Computer Science, Electrical Engineering, or related field with a focus on deep learning or speech processing.
5+ years of hands‑on experience building and deploying TTS or related speech synthesis systems.
Proficiency in Python and deep learning frameworks such as PyTorch or TensorFlow.
Strong understanding of model optimization techniques (quantization, pruning, knowledge distillation) and GPU acceleration (CUDA).
Experience with large‑scale speech data pipelines, audio preprocessing, and evaluation metrics (MOS, PESQ, etc.).

Skills

Python, PyTorch, TensorFlow, deep learning

Company: Catapult Solutions Group

Department: Research

Location: Santa Clara, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 25, 2026