hybrid

Machine Learning Data Engineer

Synthesia is seeking an experienced Machine Learning Data Engineer to design, develop, and maintain data processing pipelines for large quantities of text and audio data. This role involves using machine learning techniques to prepare ready-to-train datasets for large models and contributing to the development of an LLM-based TTS system.

About Synthesia

It is an exciting time to join Synthesia: we have reached a milestone by becoming a unicorn, having raised $90 million in Series C funding, and are now valued at $1 billion! ✨ 🦄

Synthesia is the world’s #1 AI video generation platform. Well, it’s actually a video production studio — in a browser. As in, no cameras or film crews at all. You simply choose an avatar, enter your script in one of 60 languages, and your video is ready in minutes. In Synthesia, you can build personalised on-the-fly videos, give your chatbot a human face or run 24/7 weather channels in different languages, to name just a few of the possibilities. 🎬

We believe the future of media is synthetic, and we are on a mission to turn cameras into code and make everyone a creator. To learn more, check out our brand video that explains what we’re doing at Synthesia.

About The Role

We are looking for an experienced Machine Learning Data Engineer who loves dealing with large quantities of text and audio data. The successful candidate will be proficient in using machine learning techniques to build data processing pipelines, preparing ready-to-train datasets for large models.

If you are excited about the intersection of AI, Machine Learning, and Large Data, this role provides a unique opportunity to make a high-impact contribution. 💪🏻

Our aim is to make video content creation available to everyone, not only to production studios!

🧑🏼‍🔬 You will be someone who loves to code and build working systems. You are used to working in a fast-paced start-up environment. You will have experience with the software development life cycle, from ideation through implementation, to testing and release.

👩‍💼 You will join a group of more than 40 engineers in the R&D department and will have the opportunity to collaborate with multiple research teams across diverse areas. Our R&D research is guided by our co-founders, Prof. Lourdes Agapito and Prof. Matthias Niessner.

If you know and love Voicebox, Whisper, VALL-E, SPEAR-TTS and more, and you love machine learning and large data, then we would love to talk to you. We would also love to talk to you if working on them is what you dream of doing. 🤩

What will you be doing?

🚀 In this position, you'll join the team to help develop our LLM-based TTS system that will provide our customers with voice clones that are indistinguishable from real voices. You will also help us create high-quality, production-ready code and take ownership of production pipelines. This would include:

Designing, developing, and maintaining data processing pipelines, utilising machine learning techniques to handle vast amounts of text and audio data, while ensuring data quality and accessibility.
Leveraging your understanding of machine learning algorithms and workflows to prepare data as effectively as possible for use in large-scale models.
Using Big Data tools and frameworks to process, analyse, and derive insights from structured and unstructured data.
Collaborating with other ML Engineers and Researchers to understand their data requirements and provide them with ready-to-train datasets.
Monitoring the performance of data pipelines and machine learning models, troubleshooting data-related issues, and performing root cause analysis to implement strategic solutions.
Staying up to date with emerging technologies and tools in machine learning and data engineering to continually improve our data infrastructure.
Documenting data pipeline architecture and workflows, presenting findings to relevant stakeholders, and providing training as needed.

Who are you?

You have a background in Computer Science, Engineering, or a related field with 3+ years of experience. Advanced degrees with a focus on Machine Learning are preferred.
You have proven experience as a Data Engineer or in a similar role, with a demonstrated history of designing and building scalable data pipelines using machine learning techniques.
Familiarity with audio data processing and voice technologies is highly desirable.
You have excellent coding skills in Python and you are very passionate about the software development side of things.
You have solid proficiency in Unix-like command line operations, including the creation and execution of both quick one-liners and complex bash scripts.
You put emphasis on documenting your work in a clear and concise manner.
You work effectively in a fast-paced, agile environment.
And finally, you have excellent verbal and written communication skills and you are passionate about what you do!

Nice to have…

Transformers, Hugging Face, Whisper ASR.
Multi-threaded Python
AWS framework.
