Audio is choppy at the beginning when I provide text longer than 40 words #615

Open
4 tasks done
hungpv297 opened this issue Dec 11, 2024 · 0 comments
Labels
help wanted Extra attention is needed


Checks

  • This template is only for usage issues encountered.
  • I have thoroughly reviewed the project documentation but couldn't find information to solve my problem.
  • I have searched for existing issues, including closed ones, and couldn't find a solution.
  • I confirm that I am using English to submit this report in order to facilitate communication.

Environment Details

python=3.10.12
torch=2.3.0

Steps to Reproduce

  • I trained the model on 800 hours of data for about 1.7M iterations to see the result.
  • The beginning of the generated audio is a bit choppy, as you can hear in the attached output. This happens consistently when I provide fairly long input text.
  • My inference parameters:
  1. speed: 0.8
  2. cross_fade_duration: 0.5
  3. remove_silence: True
  4. ref_text: con đầu tiên này là nâu sọc trắng nè.
  5. gen_text: con đầu tiên này là nâu sọc trắng nè. con thứ hai là nâu sọc đen. con thứ ba là màu cam và con cuối cùng là màu tím nè. con đầu tiên này là nâu sọc trắng nè. con thứ hai là nâu sọc đen. con thứ ba là màu cam và con cuối cùng là màu tím nè.
output.mov
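
To help narrow down whether the choppiness depends on how the long input is split, here is a minimal sketch that counts the words in gen_text and greedily packs whole sentences into chunks. It is not the project's own chunking logic: the helper names `count_words` and `split_into_chunks` are hypothetical, and the 40-word limit is only an assumption taken from the issue title.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def split_into_chunks(text: str, max_words: int = 40) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    Hypothetical helper for illustration only. A single sentence longer
    than max_words is kept whole and becomes its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_words(sentence)
        # Start a new chunk when adding this sentence would exceed the limit.
        if current and current_len + n > max_words:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


# gen_text copied from the inference parameters above.
gen_text = (
    "con đầu tiên này là nâu sọc trắng nè. con thứ hai là nâu sọc đen. "
    "con thứ ba là màu cam và con cuối cùng là màu tím nè. "
    "con đầu tiên này là nâu sọc trắng nè. con thứ hai là nâu sọc đen. "
    "con thứ ba là màu cam và con cuối cùng là màu tím nè."
)

total_words = count_words(gen_text)
chunks = split_into_chunks(gen_text, max_words=40)  # 40 is an assumed limit
print("total words:", total_words)
for i, chunk in enumerate(chunks):
    print(i, count_words(chunk), chunk)
```

For the gen_text above this reports 60 words in total, split into two chunks of 39 and 21 words. Synthesizing each chunk separately and listening to where each one starts may show whether the artifact comes from the first chunk itself or from the joins between chunks.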

Any help is appreciated. Many thanks.

✔️ Expected Behavior

The beginning of the generated audio should not be choppy, even for long input text.

❌ Actual Behavior

No response

hungpv297 added the "help wanted" label on Dec 11, 2024