After separating STFT and ISTFT from the BSRoformer class, I was able to successfully export the model to ONNX, and trtexec could convert the ONNX model to a TensorRT engine. However, TensorRT did not accelerate inference; instead, it was twice as slow as the Torch implementation.
Torch takes approximately 0.13 seconds to infer a slice, while TensorRT takes 0.27 seconds to infer the same slice (tested on an RTX 4090). Using NVIDIA Nsight for monitoring, the preliminary analysis suggests that the slowdown is caused by the Tile operation. Is there any way to alleviate this issue in TensorRT without retraining the model?
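One possible direction (a sketch, not verified against this model): `Tile` nodes in the exported ONNX graph often come from `torch.Tensor.repeat` calls, which materialize full copies of the data; rewriting those call sites to use broadcasting (`torch.Tensor.expand`, or implicit broadcasting in the following op) lets the graph use `Expand`, which TensorRT can usually fold into the consuming kernel. The NumPy snippet below just illustrates the memory-layout difference between tiling and broadcasting; it is not code from this repository.

```python
import numpy as np

# np.tile materializes a real copy of the data (analogous to ONNX `Tile`),
# while np.broadcast_to returns a zero-stride view (analogous to `Expand`)
# that downstream kernels can read without an extra copy pass.
x = np.arange(6, dtype=np.float32).reshape(2, 3)

tiled = np.tile(x, (4, 1, 1))            # shape (4, 2, 3), full memory copy
viewed = np.broadcast_to(x, (4, 2, 3))   # shape (4, 2, 3), zero-copy view

assert tiled.shape == viewed.shape
assert np.array_equal(tiled, viewed)     # numerically identical
assert viewed.strides[0] == 0            # leading axis stride 0: no data duplicated
```

If the `repeat` sits inside the attention blocks, replacing it with `expand` (followed by a `reshape` only where strictly required) keeps the exported graph free of `Tile` without changing the model's weights or outputs.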
Nsight Result:
Modified source code:
https://github.com/bfloat16/Music-Source-Separation-Training