Logo

Logo

Meta introduces multilingual speech translation model for 100 languages

Heating up the artificial intelligence (AI) race, Meta on Tuesday launched a new all-in-one, multilingual multimodal AI translation and transcription…

Meta introduces multilingual speech translation model for 100 languages

(File Photo)

Heating up the artificial intelligence (AI) race, Meta on Tuesday launched a new all-in-one, multilingual multimodal AI translation and transcription model for up to 100 languages depending on the task.

Called ‘SeamlessM4T,’ the single model can perform speech-to-text, speech-to-speech, text-to-speech, and text-to-text translations.

Advertisement

‘SeamlessM4T’ supports speech recognition for nearly 100 languages, speech-to-text translation for nearly 100 input and output languages, speech-to-speech translation, supporting nearly 100 input languages and 36 (including English) output languages and text-to-text translation for nearly 100 languages.

Advertisement

It can also support text-to-speech translation, supporting nearly 100 input languages and 35 (including English) output languages.

“We’re also releasing the metadata of SeamlessAlign, the biggest open multimodal translation dataset to date, totalling 270,000 hours of mined speech and text alignments,” Meta said in a blog post.

Last year, Meta released No Language Left Behind (NLLB), a text-to-text machine translation model that supports 200 languages, and has since been integrated into Wikipedia as one of the translation providers.

“We also shared a demo of our Universal Speech Translator, which was the first direct speech-to-speech translation system for Hokkien, a language without a widely used writing system,” said the company.

Earlier this year, we revealed Massively Multilingual Speech, which provides speech recognition, language identification and speech synthesis technology across more than 1,100 languages.

‘SeamlessM4T’ draws on findings from all of these projects to enable a multilingual and multimodal translation experience stemming from a single model, built across a wide range of spoken data sources with state-of-the-art results, Meta noted.

Advertisement