After separating STFT and ISTFT from the BSRoformer class, I was able to successfully export the model to ONNX, and trtexec could convert the ONNX model to a TensorRT engine. However, TensorRT did not accelerate inference; instead, it was twice as slow as the Torch implementation.
Torch takes approximately 0.13 seconds to infer a slice, while TensorRT takes 0.27 seconds to infer the same slice (tested on an RTX 4090). Using NVIDIA Nsight for monitoring, the preliminary analysis suggests that the slowdown is caused by the Tile operation. Is there any way to alleviate this issue in TensorRT without retraining the model?
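One possible direction (a sketch, not verified against this model): `Tile` nodes in the exported ONNX graph often come from `torch.Tensor.repeat` calls, which materialize full copies of the data; rewriting those call sites to use broadcasting (`torch.Tensor.expand`, or implicit broadcasting in the following op) lets the graph use `Expand`, which TensorRT can usually fold into the consuming kernel. The NumPy snippet below just illustrates the memory-layout difference between tiling and broadcasting; it is not code from this repository.

```python
import numpy as np

# np.tile materializes a real copy of the data (analogous to ONNX `Tile`),
# while np.broadcast_to returns a zero-stride view (analogous to `Expand`)
# that downstream kernels can read without an extra copy pass.
x = np.arange(6, dtype=np.float32).reshape(2, 3)

tiled = np.tile(x, (4, 1, 1))            # shape (4, 2, 3), full memory copy
viewed = np.broadcast_to(x, (4, 2, 3))   # shape (4, 2, 3), zero-copy view

assert tiled.shape == viewed.shape
assert np.array_equal(tiled, viewed)     # numerically identical
assert viewed.strides[0] == 0            # leading axis stride 0: no data duplicated
```

If the `repeat` sits inside the attention blocks, replacing it with `expand` (followed by a `reshape` only where strictly required) keeps the exported graph free of `Tile` without changing the model's weights or outputs.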
Nsight Result:
Modified source code:
https://github.com/bfloat16/Music-Source-Separation-Training