Blank Audio #669
Comments
Blank Audio Output with F5-TTS Model
When attempting to generate audio using the F5-TTS model, the resulting audio output is always a blank file with a duration of exactly 1 second (00:00:01). Despite following the installation and usage instructions, including troubleshooting steps (such as forcing
The issue occurs both when using the command-line interface (
Environment Details:
Steps to Reproduce:
Despite these steps, the output audio file (
Expected Behavior:
Actual Behavior:
Troubleshooting Steps Taken:
Additional Information:
Suggested Next Steps:
If any further details are required, or if you'd like specific logs or configurations, I'd be happy to supply them.
try ctrl+left_click on
The tmp file does have audio, although it is about 110 ms shorter (8.35 s vs. 8.46 s); it does play and sounds like the original in file.
ffmpeg -i tmpist0frdt.wav -hide_banner
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from '.\tmpist0frdt.wav':
Duration: 00:00:08.35, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
And the original in.wav for reference:
ffmpeg -i in.wav -hide_banner
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from '.\in.wav':
Metadata:
encoder : Lavf59.16.100
Duration: 00:00:08.46, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
At least one output file must be specified
Hmm, so the input is actually fine. How is the loading result, e.g. can torchaudio properly load the audio? See F5-TTS/src/f5_tts/infer/utils_infer.py, line 376 in 3e73553.
You could try printing out audio to see whether it has proper content inside or is just null.
If it is null, maybe it's a version conflict between ffmpeg/sox-io/another audio backend and torchaudio (reinstalling the backend might help).
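One way to cross-check the "proper content or just null" question without relying on any torchaudio backend is to read the file with Python's standard-library wave module. The following is only a minimal sketch, not part of F5-TTS: the helper names wav_stats and is_blank are made up for illustration, and it assumes 16-bit PCM (pcm_s16le), which is what the ffmpeg output above reports. If this reports a non-zero peak for a file while the tensor loaded through torchaudio is all zeros, that would point toward the audio backend rather than the file.

```python
import array
import sys
import wave


def wav_stats(path):
    """Return (duration_seconds, peak_abs_sample) for a 16-bit PCM WAV file.

    Hypothetical helper for illustration; it does not use torchaudio.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM (pcm_s16le)")
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        raw = wf.readframes(n_frames)

    # Interpret the raw bytes as signed 16-bit samples (all channels interleaved).
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    peak = max((abs(s) for s in samples), default=0)
    return n_frames / rate, peak


def is_blank(path, threshold=0):
    """True if no sample exceeds `threshold` in absolute value."""
    _, peak = wav_stats(path)
    return peak <= threshold
```

Calling wav_stats on both the input sample and the generated output gives a duration and a peak amplitude for each; a peak of 0 means the file contains only silence.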
Checks
Environment Details
Any attempt at generating audio results in blank audio exactly 00:00:01 long.
My current test environment setup is:
Steps to Reproduce
No visible issues that I can see
✔️ Expected Behavior
Expected TTS audio.
❌ Actual Behavior
I did attempt the fp32 fix from #356 and changed:
Without any success.
Running it as CLI outputs:
No visible issues that I can see
and running with Gradio outputs no visible issues as well, yet no audio:
Unless I should also change gradio processing_utils.py, I would think it shouldn't matter.
There was another issue where I'd get
Unknown encoder 'pcm_s4le'
but running
ffmpeg -i input.wav -c:a pcm_s16le -ar 16000 output.wav
on my sample seemed to get rid of that problem.
I'd be more than happy to supply more details if needed.
Thank you