
Is there any experiment on a Chinese dataset? #91

Open
zhaojingxin123 opened this issue Aug 8, 2024 · 6 comments

Comments
@zhaojingxin123

May I ask whether there has been any experiment on a Chinese dataset? When I use pinyin as the phonemes to train on a Mandarin Chinese dataset, everything I synthesize is noise. Why might that be?

@zhaojingxin123
Author

Does anyone know why that is? Or is there a Chinese dataset on which training has succeeded? What methods do you use to phonemize Chinese text?

@shivammehta25
Owner

I am sorry, I haven't trained on a Chinese dataset, but I can assure you that the model training is language independent. There are forks in Kyrgyz (https://github.com/UlutSoftLLC/MamtilTTS) and Catalan (https://huggingface.co/projecte-aina/matxa-tts-cat-multiaccent). So perhaps someone who has trained on a Chinese dataset can chip in to the conversation.

Just to confirm, did you see this page? https://github.com/shivammehta25/Matcha-TTS/wiki/Training-%F0%9F%8D%B5-Matcha%E2%80%90TTS-with-different-dataset-&-languages

@zhaojingxin123
Author

Hello author, thank you for your answer!
I am sorry for the late reply; I have been ill recently and had not seen your message.
There should be no big problems with your code and model. It was because I used a wrong way of encoding Chinese into phonemes. Your project can indeed be applied to Chinese, but the wavs generated by the model I trained were very noisy.

I trained the model on the Chinese dataset AISHELL3 for 119 epochs, with poor results.

My config is:
[two screenshots of the config were attached]

What do you think is the reason?
1. The number of epochs trained is not enough?
2. The number of speakers (174) is too large?
3. Each speaker's data is not enough?
4. The n_vocab of 50 symbols — does it have any influence?

How can I improve the synthesis?
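One quick sanity check related to point 4 (a minimal sketch, not code from this thread; the symbol list and `n_vocab` value here are hypothetical placeholders) is to confirm that the configured `n_vocab` covers every symbol the text cleaner can emit. If phoneme IDs fall outside the embedding table, or unknown symbols are silently mapped, training can converge to noise.

```python
# Hypothetical sketch: verify n_vocab against the cleaner's symbol table.
symbols = ["_", " "] + [f"p{i}" for i in range(48)]  # 50 placeholder symbols
symbol_to_id = {s: i for i, s in enumerate(symbols)}

n_vocab = 50  # value from the (hypothetical) training config


def text_to_ids(phonemes):
    """Map a phoneme sequence to IDs, failing loudly on unknown symbols."""
    ids = []
    for p in phonemes:
        if p not in symbol_to_id:
            raise ValueError(f"symbol {p!r} not in symbol table")
        ids.append(symbol_to_id[p])
    return ids


assert len(symbols) <= n_vocab, "n_vocab is smaller than the symbol set"
ids = text_to_ids(["_", "p0", "p47"])
```

Running a check like this over the whole training transcript before training would catch a mismatch between the phonemizer's output and the symbol table early.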

@shivammehta25
Owner

I think the dataset size and amount of training should be enough.

> 4. The n_vocab of 50 symbols — does it have any influence?

Do you really have only 50 symbols? I feel something might be wrong here. What phonemizer are you using?

@zhaojingxin123
Author

Thank you for your answer, shivammehta25.
It's not the International Phonetic Alphabet (IPA), but rather Taiwanese Pinyin, a Chinese phoneme scheme with 50 symbols. The model I trained with AISHELL3 produces a bit of human voice but also a lot of noise. Previously, I used the Mainland Chinese version of Pinyin, another form of Chinese phonetic notation with over 200 symbols. I suspect the issue might be due to wrong processing, specifically of the Mainland Chinese version of Pinyin.
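For context on how a ~50-symbol inventory can arise: a toned pinyin syllable can be decomposed into initial + final + tone, giving a much smaller symbol set than treating every whole syllable as one token. Below is a minimal illustrative sketch (not the poster's actual code) using a tiny hypothetical initial table, not a full Mandarin inventory.

```python
# Illustrative sketch: split numeric-tone pinyin syllables into
# (initial, final, tone). The initial table is a small hypothetical
# excerpt, ordered longest-match first so "zh" wins over "z"-like prefixes.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "n", "h"]


def split_syllable(syl):
    """'ni3' -> ('n', 'i', '3'); syllables with no initial keep '' there."""
    tone = syl[-1] if syl[-1].isdigit() else ""
    body = syl[:-1] if tone else syl
    for ini in INITIALS:
        if body.startswith(ini):
            return ini, body[len(ini):], tone
    return "", body, tone


print(split_syllable("ni3"))     # ('n', 'i', '3')
print(split_syllable("hao3"))    # ('h', 'ao', '3')
print(split_syllable("zhong1"))  # ('zh', 'ong', '1')
```

With a decomposition like this, the symbol set is the union of initials, finals, and tone marks, which for Mandarin lands in the tens of symbols rather than hundreds of whole syllables.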

@TheWayLost

> It's not the International Phonetic Alphabet (IPA), but rather Taiwanese Pinyin, a Chinese phoneme scheme with 50 symbols. [...] I suspect the issue might be due to wrong processing, specifically of the Mainland Chinese version of Pinyin.

So do you mean that using the Mainland Chinese version of Pinyin instead would avoid this problem? (I am also trying to use the method on a Chinese dataset; I think this method is truly interesting.)
