Related to Model/Framework(s)
PyTorch/Tacotron2/WaveGlow
Describe the bug
Training WaveGlow with --amp crashes after a few dozen epochs: apex AMP raises a ZeroDivisionError in its loss scaler during the backward pass.

DLL 2020-10-30 22:47:30.112884 - (38, 87) train_iter_time : 0.7559060430066893
DLL 2020-10-30 22:47:30.114633 - (38, 88) glob_iter/iters_per_epoch : 11410/306
DLL 2020-10-30 22:47:30.377361 - (38, 88) train_loss : -3.687452554702759
Traceback (most recent call last):
  File "train.py", line 555, in <module>
    main()
  File "train.py", line 500, in main
    scaled_loss.backward()
  File "/home/ioannis/anaconda3/envs/waveglow/lib/python3.7/contextlib.py", line 119, in __exit__
    next(self.gen)
  File "/home/ioannis/anaconda3/envs/waveglow/lib/python3.7/site-packages/apex/amp/handle.py", line 123, in scale_loss
    optimizer._post_amp_backward(loss_scaler)
  File "/home/ioannis/anaconda3/envs/waveglow/lib/python3.7/site-packages/apex/amp/_process_optimizer.py", line 249, in post_backward_no_master_weights
    post_backward_models_are_masters(scaler, params, stashed_grads)
  File "/home/ioannis/anaconda3/envs/waveglow/lib/python3.7/site-packages/apex/amp/_process_optimizer.py", line 135, in post_backward_models_are_masters
    scale_override=(grads_have_scale, stashed_have_scale, out_scale))
  File "/home/ioannis/anaconda3/envs/waveglow/lib/python3.7/site-packages/apex/amp/scaler.py", line 176, in unscale_with_stashed
    out_scale/grads_have_scale,  # 1./scale,
ZeroDivisionError: float division by zero
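My best guess at what is going on (an assumption on my part, not something I have confirmed in apex's code): dynamic loss scaling halves the loss scale whenever the gradients overflow, and if the loss keeps producing inf/NaN gradients the Python float scale eventually underflows to 0.0, at which point the 1./scale computed in unscale_with_stashed divides by zero. A minimal, self-contained sketch of that failure mode (a toy scaler, not apex's actual implementation):

# Toy dynamic loss scaler, illustrating how repeated overflows can drive the
# scale to 0.0 and trigger the ZeroDivisionError seen in the traceback above.
scale = 2.0 ** 16  # typical initial dynamic loss scale

def amp_step(grads_overflowed):
    global scale
    if grads_overflowed:
        scale = scale / 2.0        # backed off with no lower bound, so it can underflow to 0.0
    else:
        return 1.0 / scale         # ZeroDivisionError once scale has reached 0.0

# If every step overflows (e.g. the loss has gone NaN), the scale collapses:
for _ in range(1100):
    amp_step(grads_overflowed=True)
print(scale)                       # 0.0 after enough halvings
amp_step(grads_overflowed=False)   # ZeroDivisionError: float division by zero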
To Reproduce
Steps to reproduce the behavior:
Install SpeechSynthesis/Tacotron2 from requirements.txt + PyTorch 1.6
Download the training data from the link below and add it to the Tacotron2 home dir
https://s3.amazonaws.com/skinnybottle.com/downloads/tacotron-data.rar
Run and wait a few dozen epochs
python train.py -d wavs --model-name WaveGlow --training-files metadata-training-files.csv --validation-files metadata-validation-files.csv -o trumpbot-output-amp --epochs 1001 --learning-rate 1e-4 --batch-size 4 --cudnn-enabled --epochs-per-checkpoint 10 --resume-from-last --amp
Expected behavior
Training should run all the way to epoch 1001.
Environment
Python 3.7 (conda environment), PyTorch 1.6, and NVIDIA apex for AMP, per the paths in the traceback above.
I should note that when I don't run into this error, I run into #694 instead. Not sure whether the two are related.
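For what it's worth, one workaround I'm considering (my own idea, not something this repo documents) is to drop apex and use PyTorch 1.6's native torch.cuda.amp, whose GradScaler skips the optimizer step when gradients come out inf/NaN rather than failing in the unscale. A minimal, self-contained sketch of that training-step pattern with a dummy model (placeholder model and data, not the repo's actual train.py; needs a GPU):

import torch
import torch.nn as nn

# Tiny stand-in model/data just to make the sketch runnable on its own.
model = nn.Linear(10, 1).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()
scaler = torch.cuda.amp.GradScaler()    # native dynamic loss scaling

for step in range(100):
    x = torch.randn(4, 10, device="cuda")
    y = torch.randn(4, 1, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():     # forward pass and loss in mixed precision
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()       # backward on the scaled loss
    scaler.step(optimizer)              # step is skipped if grads are inf/NaN
    scaler.update()                     # loss scale adjusted automatically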