Same problem here for Spanish! The alignment matrix looks good, but the audio is just mumbling. I'm using a 44.1 kHz sampling rate.
I am training Bert-VITS2 on my own dataset, a professional Chinese TTS speech dataset of 8k utterances, at a 16 kHz sampling rate. The generated speech is not clear: it sounds like the speaker, but the content cannot be understood. It's just mumbling. What could be the problem? Thank you.