This template is only for questions, not feature requests or bug reports.
I have thoroughly reviewed the project documentation and read the related paper(s).
I have searched for existing issues, including closed ones, no similar questions.
I confirm that I am using English to submit this report in order to facilitate communication.
Question details
Hello community, thanks for your wonderful work on this project. I have run into some problems with fine-tuning and need your help.
Recently I have tried fine-tuning on languages such as Thai, Lao, and German with small-scale datasets, about 10 hours from one speaker. My total training is 1200k steps, and the fine-tune is based on https://huggingface.co/SWivid/F5-TTS
I have found many pronunciation problems in the resulting models. I have also tried some models shared on Hugging Face, and their WER is very low. So I want to know:
Do I need more data to get a low-WER model, and if so, roughly how many hours? My languages are not the same as the base model's (not Chinese or English). I have only about 10 hours of my target speaker; if I add some open-source datasets, will that help lower the WER?
Any hyperparameter suggestions for a small-scale dataset of about 10 hours?
How should I split the sentences: by word, by character, or by syllable?
You could try char first, which is the simplest approach.
If the pronunciation of the language is very hard for the model to learn, then try syllable (which needs grapheme-to-phoneme conversion), or syllable with BPE.
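As a rough illustration of the char option, here is a minimal Python sketch of character-level splitting plus a vocabulary built from the transcripts. The helper names `char_tokens` and `build_vocab` are hypothetical and are not part of the F5-TTS codebase; the sketch only assumes that transcripts are plain Unicode strings.

```python
import unicodedata


def char_tokens(text: str) -> list[str]:
    """Split an utterance into Unicode code points (spaces are kept as tokens)."""
    # NFC normalization so the same visible text always maps to the same code points.
    normalized = unicodedata.normalize("NFC", text.strip())
    return list(normalized)


def build_vocab(transcripts: list[str]) -> dict[str, int]:
    """Assign an integer id to every distinct character, in order of first appearance."""
    vocab: dict[str, int] = {}
    for line in transcripts:
        for tok in char_tokens(line):
            vocab.setdefault(tok, len(vocab))
    return vocab


# Hypothetical transcripts, for illustration only.
transcripts = ["Guten Tag", "สวัสดี"]
vocab = build_vocab(transcripts)
ids = [vocab[tok] for tok in char_tokens("Tag")]
```

Note that splitting by code point puts Thai and Lao vowel signs and tone marks into separate tokens from their base consonants. If that turns out to be too hard for the model, that is the point where a syllable or grapheme-to-phoneme front end would be worth trying.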