tacotron get slower using pytorch TRT #545

Open
terryyizhong opened this issue Jun 2, 2020 · 4 comments

Comments

@terryyizhong

Hi, I followed your guide and set up the Docker environment correctly.
I ran the scripts successfully, but the results I got are strange.

My GPU is a V100, and I tested three types of inference:
no AMP, AMP (FP16), and PyTorch TRT.
The total latencies of the three are 1.54, 1.3627, and 1.34 (decreasing as expected).
The WaveGlow latency also decreases as expected: 0.46065, 0.3097, 0.1374.
But the Tacotron2 latency with TRT gets even longer: 1.08, 1.053, 1.208.
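For context, a minimal sketch of how per-component latency can be measured in plain PyTorch (the `timed` helper and the stand-in model below are illustrative, not the repo's exact benchmark code; in the real run the calls would be the Tacotron2 and WaveGlow inference functions):

```python
import time

import torch


def timed(fn, *args, **kwargs):
    # Synchronize before and after the call so the measurement covers the
    # full GPU execution time, not just the asynchronous kernel launches.
    torch.cuda.synchronize()
    start = time.perf_counter()
    out = fn(*args, **kwargs)
    torch.cuda.synchronize()
    return out, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in model; in practice this would be the tacotron2 / waveglow
    # inference call, timed separately for each component.
    model = torch.nn.Linear(1024, 1024).half().cuda().eval()
    x = torch.randn(64, 1024, dtype=torch.half, device="cuda")
    with torch.no_grad():
        _, latency = timed(model, x)
    print(f"latency: {latency:.4f} s")
```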

the tacotron2 model I use is:
https://ngc.nvidia.com/catalog/models/nvidia:tacotron2pyt_fp16/files?version=3

Since you only provide TRT acceleration results for the T4 GPU, I want to know whether there is anything wrong with my results. Why does TRT make Tacotron2 inference slower?

Thanks for your patience, and I'm looking forward to your reply.

@terryyizhong
Author

terryyizhong commented Jun 2, 2020

One more question: the T4 GPU inference performance results differ significantly between:
https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2
and
https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/SpeechSynthesis/Tacotron2/trt/README.md

I want to know what causes this difference for the FP16 results.
@grzegorzkarchnv

@machineko

You can see that the results in the main README haven't been updated for much longer than the results in the trt directory.
Even so, you can check https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2/trtis_cpp for even better results.
I've tested trt_cpp on both an RTX 2080 Ti and a T4 with FP16 runs, and in both cases the results were much faster than PyTorch (on the T4 the results were slower than presented in the repo, probably because I used AWS and a different sequence length).

@ghost self-assigned this Jun 3, 2020
@terryyizhong
Author


Thanks for your reply. I tested trt_cpp and am getting better inference speed now.
But I cannot test a sequence length of 128 as shown in the README results.
The engine generates 47.6 s of audio if I use the default text and settings in run_trtis_benchmark_client.sh. Did you encounter this problem?
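As a possible workaround (just a sketch, assuming the benchmark client accepts arbitrary input text; `make_fixed_length_text` is a made-up helper, not part of the repo), one could pad or truncate a sentence to exactly 128 characters to approximate the README's input length:

```python
def make_fixed_length_text(target_len: int = 128,
                           base: str = "The quick brown fox jumps over the lazy dog. ") -> str:
    # Repeat the base sentence and cut it to the exact target length so the
    # benchmark always sees the same input size.
    repeated = base * (target_len // len(base) + 1)
    return repeated[:target_len]


if __name__ == "__main__":
    text = make_fixed_length_text()
    print(len(text), text)  # 128 followed by the repeated sentence
```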

@machineko

Nope, I never used the benchmark scripts; also, I used my own trained models.
