I've read the paper and tried to understand the source code of the original repo, but I can't for the life of me work out what to set these lines to. I trained a multispeaker model at 22050 Hz with the hparams you'd expect for that sampling rate, except that I set num_mels to 160, which means I can't use the pretrained hifigan models.
Training a new hifigan model from scratch seems to work well enough: in TensorBoard, the validation spectrograms look good and the audio is fairly close to ground truth. But whenever I try to infer speech, the result is indecipherable audio. What should I set these values to to fix this?
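For context, here is a minimal sketch of the settings I mean and how I understand they need to line up between the two models. The names follow the NVIDIA Tacotron2 hparams.py and the jik876/hifi-gan config JSON; the values reflect my setup, so treat the specifics as assumptions rather than recommended defaults:

```python
# Sketch of the mel-extraction parameters that (as I understand it)
# must agree between the acoustic model and the vocoder. Names are
# from the NVIDIA Tacotron2 and jik876/hifi-gan repos; values are
# from my own setup and are assumptions, not prescribed defaults.

TACOTRON2_HPARAMS = {          # hparams.py (acoustic model)
    "sampling_rate": 22050,
    "filter_length": 1024,     # called n_fft in hifi-gan
    "hop_length": 256,
    "win_length": 1024,
    "n_mel_channels": 160,     # my custom value, instead of the usual 80
    "mel_fmin": 0.0,
    "mel_fmax": 8000.0,
}

HIFIGAN_CONFIG = {             # config.json (vocoder)
    "sampling_rate": 22050,
    "n_fft": 1024,
    "hop_size": 256,
    "win_size": 1024,
    "num_mels": 160,           # must equal n_mel_channels above
    "fmin": 0,
    "fmax": 8000,
}

# Sanity check: any mismatch here would presumably garble inference
# audio even while validation (which extracts ground-truth mels with
# the vocoder's own settings) still sounds fine.
PAIRS = [
    ("sampling_rate", "sampling_rate"),
    ("filter_length", "n_fft"),
    ("hop_length", "hop_size"),
    ("win_length", "win_size"),
    ("n_mel_channels", "num_mels"),
    ("mel_fmin", "fmin"),
    ("mel_fmax", "fmax"),
]
for t_key, h_key in PAIRS:
    assert float(TACOTRON2_HPARAMS[t_key]) == float(HIFIGAN_CONFIG[h_key]), \
        f"mismatch: {t_key} != {h_key}"
```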
The defaults should be good. Can you give me an end-to-end example of how you try to infer speech (e.g. audio files showing how it failed)?
If you're using inference.py, can you give me an example file?
If you're using tacotron2, can you share your tacotron2 hparams.py?