How do I deal with mixed languages in a sentence/plain text line? #270

Open
ErfolgreichCharismatisch opened this issue Jan 22, 2021 · 3 comments

Comments

@ErfolgreichCharismatisch

If two languages are combined within one sentence, the algorithm cannot align any of the sentences that follow, i.e. all following sentences are out of sync.
I tried task_language=language1,language2, which wasn't accepted; task_language=language1|task_language=language2 wasn't accepted either.
How do I deal with mixed languages in a sentence/plain text line?
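
For reference, here is a minimal sketch of the configuration shape that the attempts above seem to be working against. It assumes the configuration is a string of `key=value` pairs joined by `|` and that `task_language` takes exactly one language code. The helper `build_task_config` is hypothetical (it is not part of the tool), and the extra keys `is_text_type` and `os_task_file_format` as well as the code `eng` are assumptions that do not come from this thread.

```python
def build_task_config(language, **extra):
    """Build a 'key=value|key=value' config string (hypothetical helper).

    Assumption: task_language holds exactly one language code, so values
    that look like a list ("eng,deu") or a second assignment are rejected.
    """
    if not language or any(sep in language for sep in ",|="):
        raise ValueError(f"expected a single language code, got {language!r}")
    # task_language goes first; the remaining keys are sorted for stable output.
    pairs = [("task_language", language)] + sorted(extra.items())
    return "|".join(f"{key}={value}" for key, value in pairs)


# The extra keys below are assumptions used only for illustration.
config = build_task_config("eng", is_text_type="plain", os_task_file_format="json")
print(config)
```

Under these assumptions there is no place in the string for a second language, which matches both attempts being rejected.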

@yasntrk

yasntrk commented Nov 25, 2021

@zxul767

zxul767 commented Aug 4, 2023

I suspect the issue is that the default speech synthesizer (espeak) cannot generate proper speech for two languages in a single sentence, so the generated alignment for that sentence is very poor, which throws off the alignment for the rest of the sentences.

I think we'd need to use a "code-switching" speech synthesizer to fix that. A quick web search turned up this one (see the "code-switching" examples at the bottom). Google didn't initially release code for that model, but I wouldn't be surprised if there are now at least a few open-source projects that have done something similar.

@zxul767

zxul767 commented Aug 4, 2023

Here's what ChatGPT suggests:

As of my last update in September 2021, "code-switching," which involves seamlessly switching between two or more languages within a sentence or conversation, is a challenging task for Text-to-Speech (TTS) systems. However, some TTS tools have been working on supporting multilingual capabilities and handling code-switching to some extent. Here are a few TTS tools that have been exploring code-switching or multilingual support:

  1. Google Text-to-Speech (gTTS): Google's TTS system has been known to handle some level of multilingual text, including code-switching between languages. It uses neural network-based models and can switch between supported languages relatively well.
  2. Mozilla TTS (Tacotron 2): Mozilla's TTS system, also known as Tacotron 2, has been evolving to handle multilingual input. It supports multiple languages, and with appropriate configuration, it may be able to handle code-switching scenarios.
  3. Facebook's wav2vec 2.0 + Hugging Face's TTS: wav2vec 2.0 by Facebook AI Research (FAIR) and Hugging Face's TTS library offer multilingual TTS capabilities. By leveraging the power of wav2vec 2.0's pretrained models, TTS systems can handle multilingual input and code-switching to some extent.
  4. DeepMind's WaveNet and Tacotron: Some researchers have experimented with DeepMind's WaveNet and Tacotron TTS systems to handle multilingual code-switching scenarios. While not native to the models, certain adaptations can be made to support code-switching.


3 participants