Description
I observed a significant difference in GEMM output between ONNX Runtime (opset 18 + ort 1.18.0 + CPU) and TensorRT (10.0.1) results.

(screenshot: side-by-side outputs showing the significant difference)

This only happens when the batch size of image_embeddings == 1 and trt ver >= 10. With either (trt ver == 8.6.3 and bs == any) or (trt ver >= 10 and bs > 1), the difference does not appear.

(screenshot: outputs agreeing in the other configurations)

The TensorRT 10.1.0 release notes mention a known issue: "There is a known accuracy issue when the network contains two consecutive GEMV operations (that is, MatrixMultiply with gemmM or gemmN == 1). To workaround this issue, try padding the MatrixMultiply input to have dimensions greater than 1."

So I guess the fusion strategy differs among:

- trt ver == 8.6.3, bs == any (acceptable diff.)
- trt ver == 10.0.1, bs == 1 (significant diff.)
- trt ver == 10.0.1, bs > 1 (acceptable diff.)

I used trex to visualize each converted engine:

(screenshot: trex visualizations of the converted engines)

It seems that the Myelin compiler applies different optimizations in the aforementioned situations:

- trt ver == 8.6.3, bs == any: an unseen fusion `myelin` node
- trt ver == 10.0.1, bs == 1: the two consecutive GEMMs fused into one `kgen` node
- trt ver == 10.0.1, bs > 1: a separate `kgen` node for each GEMM

My questions are:

1. What is the recommended TensorRT version for avoiding this issue temporarily, 8.6.3 or higher? (I'm not sure why the NGC docker image jumped from 8.6.3 directly to 10.0.1.)
2. When will this issue be fixed within TensorRT major version 10?
3. Perhaps unrelated: what is the difference between a `kgen` node and a `myelin` node?

Thanks in advance.
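For reference, here is a minimal sketch of the padding workaround the release notes suggest. The shapes and names are illustrative, not taken from the SAM decoder:

```python
import torch

# Two consecutive GEMV-shaped matmuls: x has gemmM == 1, which is
# exactly the pattern the TensorRT 10.x known issue describes.
x = torch.randn(1, 256)
w1 = torch.randn(256, 2048)
w2 = torch.randn(2048, 256)

# Workaround per the release notes: pad M from 1 to 2 so neither
# MatrixMultiply degenerates into a GEMV, then slice the result back.
x_padded = torch.cat([x, torch.zeros_like(x)], dim=0)  # shape (2, 256)
y_padded = (x_padded @ w1) @ w2                        # shape (2, 256)
y = y_padded[:1]                                       # back to shape (1, 256)
```

For the workaround to affect the built engine, the padding would of course have to live in the exported graph itself, not just in a host-side harness.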
Environment
PyTorch docker image 24.05 from NGC
TensorRT Version: TensorRT 10.0.1.6
NVIDIA GPU: NVIDIA GeForce RTX 3090
NVIDIA Driver Version: 555.42.02
CUDA Version: 12.4.1
CUDNN Version: 9.1.0.70
Operating System: Ubuntu 22.04.4 LTS
Python Version: 3.10.12
Tensorflow Version: N/A
PyTorch Version: 2.4.0a0+07cecf4168.nv24.05
Baremetal or Container: nvcr.io/nvidia/pytorch:24.05-py3
Relevant Files
Model link: I think you can reproduce the issue with any SAM decoder. The exported ONNX from here may work: SAM ONNX from AnyLabeling. Alternatively, a minimal standalone repro can be built by hand, as sketched below.
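A minimal sketch of such a standalone repro, following the pattern described in the release notes (two consecutive MatMuls with gemmM == 1); all names and shapes here are hypothetical:

```python
import numpy as np
import onnx
from onnx import TensorProto, helper

# Build a graph with two back-to-back GEMV-shaped MatMuls (gemmM == 1),
# the pattern flagged in the TensorRT 10.1.0 release notes.
rng = np.random.default_rng(0)
w1 = helper.make_tensor(
    "W1", TensorProto.FLOAT, [256, 2048],
    rng.standard_normal((256, 2048)).astype(np.float32).tobytes(), raw=True)
w2 = helper.make_tensor(
    "W2", TensorProto.FLOAT, [2048, 256],
    rng.standard_normal((2048, 256)).astype(np.float32).tobytes(), raw=True)

graph = helper.make_graph(
    [
        helper.make_node("MatMul", ["X", "W1"], ["H"]),
        helper.make_node("MatMul", ["H", "W2"], ["Y"]),
    ],
    "consecutive_gemv_repro",
    [helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 256])],
    [helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 256])],
    initializer=[w1, w2],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 18)])
onnx.checker.check_model(model)
onnx.save(model, "consecutive_gemv_repro.onnx")
```

Comparing ORT and TRT on this graph with `polygraphy run consecutive_gemv_repro.onnx --trt --onnxrt` should exercise the GEMV fusion path without the rest of the SAM decoder.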
Steps To Reproduce
Commands or scripts: `polygraphy run decoder.onnx --trt --onnxrt --input-shapes image_embeddings:[1,256,64,64]`
Have you tried the latest release?: No; this is mentioned as a known issue in the latest release notes:

(screenshot: the known-issue entry from the TensorRT 10.1.0 release notes)
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (`polygraphy run <model.onnx> --onnxrt`): Yes
Thanks for your reply! So ver. 8.6.3 is the latest stable version for vision models before major version 10. Do you know which pull request addresses the GEMM error issue in ver. 10?

Hi, @lix19937. I tried `polygraphy run decoder.onnx --trt --onnxrt --input-shapes image_embeddings:[1,256,64,64]`, with and without `--builder-optimization-level 5`. The difference did not change and is still significant on 10.0.1.
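To make "significant" concrete, a small sketch for quantifying the divergence, assuming you have dumped the same output tensor from each runner to disk (the file names are hypothetical; polygraphy's `--save-outputs` / `--load-outputs` can also manage result dumps):

```python
import numpy as np

# Hypothetical dumps of the same output tensor from each runner,
# e.g. saved with np.save from your own harness.
ort_out = np.load("ort_output.npy")
trt_out = np.load("trt_output.npy")

abs_err = np.abs(ort_out - trt_out)
rel_err = abs_err / (np.abs(ort_out) + 1e-9)  # guard against division by zero
print(f"max abs err:  {abs_err.max():.6g}")
print(f"max rel err:  {rel_err.max():.6g}")
print(f"mean abs err: {abs_err.mean():.6g}")
```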