Audio is choppy at the beginning when I provide text longer than 40 words #615

hungpv297 · 2024-12-11T04:12:27Z

This template is only for usage issues encountered.
I have thoroughly reviewed the project documentation but couldn't find information to solve my problem.
I have searched for existing issues, including closed ones, and couldn't find a solution.
I confirm that I am using English to submit this report in order to facilitate communication.

python=3.10.12
torch=2.3.0

I trained the model on 800 hours of data for about 1.7M iterations to see the result.
You can hear that the beginning of the audio is a bit choppy. This happens consistently whenever I provide fairly long input text.
My inference parameters:

speed: 0.8
cross_fade_duration: 0.5
remove_silence: True
ref text: con đầu tiên này là nâu sọc trắng nè.
gen_text: con đầu tiên này là nâu sọc trắng nè. con thứ hai là nâu sọc đen. con thứ ba là màu cam và con cuối cùng là màu tím nè. con đầu tiên này là nâu sọc trắng nè. con thứ hai là nâu sọc đen. con thứ ba là màu cam và con cuối cùng là màu tím nè.
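
To help narrow down the length at which the choppiness starts, here is a minimal sketch that builds progressively longer prefixes of the gen_text above, cut at sentence boundaries, so each one can be synthesized separately with the same parameters. The helper name `sentence_prefixes` and the punctuation-based sentence split are my own assumptions for illustration; they are not part of the project and say nothing about how it chunks text internally.

```python
import re


def sentence_prefixes(text):
    """Return (word_count, prefix) pairs, one per sentence boundary.

    Hypothetical helper for this report: sentences are split after
    '.', '!' or '?' followed by whitespace, and words are counted by
    splitting on whitespace.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    prefixes = []
    for i in range(1, len(sentences) + 1):
        prefix = " ".join(sentences[:i])
        prefixes.append((len(prefix.split()), prefix))
    return prefixes


# The gen_text from the parameters above, unchanged.
gen_text = (
    "con đầu tiên này là nâu sọc trắng nè. con thứ hai là nâu sọc đen. "
    "con thứ ba là màu cam và con cuối cùng là màu tím nè. "
    "con đầu tiên này là nâu sọc trắng nè. con thứ hai là nâu sọc đen. "
    "con thứ ba là màu cam và con cuối cùng là màu tím nè."
)

for count, prefix in sentence_prefixes(gen_text):
    print(count, prefix)
```

With whitespace word counting, the prefixes come out at 9, 16, 30, 39, 46 and 60 words, so the 39-word and 46-word prefixes fall on either side of the 40-word mark mentioned in the title. Running inference on each prefix in turn should show whether the choppy start depends on the total length.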

output.mov

Any help is appreciated. Many thanks.

No choppiness at the beginning of the audio.

No response

hungpv297 added the "help wanted" label Dec 11, 2024
