tacotron get slower using pytorch TRT #545

Open
terryyizhong opened this issue Jun 2, 2020 · 4 comments

Comments

@terryyizhong

Hi, I followed your guide and set up the Docker environment correctly.
I ran the scripts successfully, but the results I got are strange.

My GPU is a V100, and I tested three types of inference:
no AMP, AMP (FP16), and PyTorch TRT.
The total latencies of the three are 1.54, 1.3627, and 1.34 (decreasing as expected).
The WaveGlow latency also decreases as expected: 0.46065, 0.3097, 0.1374.
But the Tacotron2 latency with TRT gets even longer: 1.08, 1.053, 1.208.
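For context, a minimal sketch of how per-component latency can be measured in plain PyTorch (the `timed` helper and the stand-in model below are illustrative, not the repo's exact benchmark code; in the real run the calls would be the Tacotron2 and WaveGlow inference functions):

```python
import time

import torch


def timed(fn, *args, **kwargs):
    # Synchronize before and after the call so the measurement covers the
    # full GPU execution time, not just the asynchronous kernel launches.
    torch.cuda.synchronize()
    start = time.perf_counter()
    out = fn(*args, **kwargs)
    torch.cuda.synchronize()
    return out, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in model; in practice this would be the tacotron2 / waveglow
    # inference call, timed separately for each component.
    model = torch.nn.Linear(1024, 1024).half().cuda().eval()
    x = torch.randn(64, 1024, dtype=torch.half, device="cuda")
    with torch.no_grad():
        _, latency = timed(model, x)
    print(f"latency: {latency:.4f} s")
```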

the tacotron2 model I use is:
https://ngc.nvidia.com/catalog/models/nvidia:tacotron2pyt_fp16/files?version=3

Since you only provide TRT acceleration results for the T4 GPU, I want to know whether there is anything wrong with my results. Why does TRT make Tacotron2 inference slower?

Thanks for your patience, and I'm looking forward to your reply.

@terryyizhong
Author

terryyizhong commented Jun 2, 2020

One more question: the T4 GPU inference performance results differ significantly between:
https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2
and
https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/SpeechSynthesis/Tacotron2/trt/README.md

I want to know what causes this difference for the FP16 results.
@grzegorzkarchnv

@machineko

You can see that the results in the main README haven't been updated for much longer than the results in the trt directory.
Even so, you can check https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2/trtis_cpp for even better results.
I've tested trt_cpp on both an RTX 2080 Ti and a T4 with FP16 runs, and in both cases the results were much faster than PyTorch (on the T4 the results were slower than presented in the repo, probably because I used AWS and a different sequence length).

@ghost self-assigned this Jun 3, 2020
@terryyizhong
Author


Thanks for your reply. I tested trt_cpp and am getting better inference speed now.
But I cannot test a sequence length of 128 as shown in the README results.
The engine generates 47.6 s of audio if I use the default text and settings in run_trtis_benchmark_client.sh. Did you encounter this problem?
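As a possible workaround (just a sketch, assuming the benchmark client accepts arbitrary input text; `make_fixed_length_text` is a made-up helper, not part of the repo), one could pad or truncate a sentence to exactly 128 characters to approximate the README's input length:

```python
def make_fixed_length_text(target_len: int = 128,
                           base: str = "The quick brown fox jumps over the lazy dog. ") -> str:
    # Repeat the base sentence and cut it to the exact target length so the
    # benchmark always sees the same input size.
    repeated = base * (target_len // len(base) + 1)
    return repeated[:target_len]


if __name__ == "__main__":
    text = make_fixed_length_text()
    print(len(text), text)  # 128 followed by the repeated sentence
```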

@machineko

Nope, I never used the benchmark scripts; also, I used my own trained models.
