onsite
Senior Deep Learning Engineer - Speech Synthesis - 42dot
ML Engineer
Lead cutting‑edge research and production of multilingual, emotion‑controllable text‑to‑speech systems using LLMs and Flow Matching, optimizing models for server and on‑device deployment with real‑time streaming and low latency.
About the role
Key Responsibilities
- Research and develop state‑of‑the‑art TTS models based on LLMs and Flow Matching.
- Design and refine emotion‑controllable TTS architectures to enhance naturalness and expressiveness.
- Build and curate large‑scale, high‑quality speech synthesis datasets using generative models.
- Develop multilingual and multi‑speaker TTS models and integrate them into production services.
- Optimize TTS models for server and on‑device inference, including quantization and ONNX conversion.
- Implement real‑time streaming synthesis pipelines, focusing on latency reduction and robustness.
- Improve inference and training pipelines to boost synthesis quality and training efficiency.
Requirements
- 3+ years of experience in TTS or related speech synthesis research.
- Strong foundation in deep learning frameworks (PyTorch, TensorFlow) and Python programming.
- Hands‑on experience with LLMs, Flow Matching, and generative modeling for audio.
- Knowledge of model optimization techniques for server and edge deployment (quantization, ONNX).
- Excellent problem‑solving skills and ability to work in a fast‑paced, collaborative environment.
Skills
python, pytorch, tensorflow, llm, c