How to enable FP8 convolution in TensorRT 10.2 #3987
Hello,
I am using TensorRT 10.2 and noticed that support for normal FP8 convolutions has been updated.
However, when I try to build a simple Q/DQ + Conv model from ONNX, the FP8 convolution is not selected; FP8 tactics are not even timed during builder profiling.
Here is the model I used (simple_conv_fp8.onnx). It was quantized using TensorRT-Model-Optimizer, and I ran on an H100 device.
trtexec command:
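A representative invocation for timing FP8 tactics on such a model (flag names assumed from TensorRT 10.x trtexec; this is illustrative, not necessarily the exact command used in the report):

```sh
# Illustrative trtexec invocation -- flags assumed from TensorRT 10.x.
# --fp8 permits FP8 tactics for the Q/DQ-quantized layers; --verbose
# prints per-layer tactic timing so FP8 kernel selection can be checked.
trtexec --onnx=simple_conv_fp8.onnx --fp8 --fp16 --verbose
```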
How was this file (simple_conv_fp8.onnx) generated?
@junstar92 You might have to add the …
Sorry, this is a bug in TRT 10.2. Please enable … We will try to fix this issue in TRT 10.3.
@nvpohanh Thank you for checking this issue. But I have another question about FP8 convolution. Here is the error log:
Did you insert the Q/DQ ops using the TensorRT Model Optimizer toolkit? https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/onnx_ptq It should have avoided inserting Q/DQ ops before Convs whose C and K are not multiples of 16.
But thanks for pointing this out. I will add this limitation to our release notes.
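For reference, the ONNX-level PTQ flow from that repo can be driven from Python. The sketch below is hypothetical: the module path, function name, and argument names are assumed from the TensorRT-Model-Optimizer documentation rather than taken from this thread.

```python
# Hypothetical sketch of the Model Optimizer ONNX PTQ API -- names and
# arguments are assumptions, not confirmed by this thread.
import numpy as np
from modelopt.onnx.quantization import quantize

# Random calibration batch standing in for real calibration data.
calib_data = np.random.randn(16, 3, 224, 224).astype(np.float32)

quantize(
    onnx_path="resnet18.onnx",           # FP32 model exported from PyTorch
    quantize_mode="fp8",                 # insert FP8 Q/DQ ops
    calibration_data=calib_data,
    output_path="resnet18_fp8_qdq.onnx",
)
```

This path operates on the exported ONNX graph directly, which is where the toolkit can skip Q/DQ insertion for Convs that fail the multiple-of-16 check.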
@nvpohanh Thanks for the quick answer.
Filed an internal tracker: id 4744383. We will debug this and find out how it differs from the FP8 ResNet50 testing in our CI/CD.
This is my quantization and ONNX-export code:

```python
import torch
import torchvision
import modelopt.torch.quantization as mtq

# FP8 quantization config: quantize weights and activations, skip outputs.
FP8_DEFAULT_CFG = {
    "quant_cfg": {
        "*weight_quantizer": {"num_bits": (4, 3), "axis": None},
        "*input_quantizer": {"num_bits": (4, 3), "axis": None},
        "*output_quantizer": {"enable": False},
        "*block_sparse_moe.gate*": {"enable": False},  # Skip the MoE router
        "default": {"num_bits": (4, 3), "axis": None},
    },
    "algorithm": "max",
}

model = torchvision.models.resnet18(pretrained=True).cuda()

def calib_loop():
    # Feed a few random batches through the model to collect calibration
    # statistics for the max calibrator.
    for _ in range(10):
        model(torch.randn(16, 3, 224, 224, device="cuda"))

mtq.quantize(model, FP8_DEFAULT_CFG, forward_loop=calib_loop)

torch.onnx.export(
    model,
    torch.randn(16, 3, 224, 224, device="cuda"),
    "resnet18_fp8.onnx",
    input_names=["input"],
    output_names=["output"],
)
```
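For context on the config above: num_bits=(4, 3) selects the FP8 E4M3 format (4 exponent bits, 3 mantissa bits), axis=None makes the quantization per-tensor, and "algorithm": "max" derives scales from the maximum absolute values observed during calib_loop.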
@junstar92 Oh I see, you are using modelopt.torch.quantization rather than the ONNX PTQ tool. I will check internally about …
@nvpohanh Okay, here is the ONNX model quantized using the ONNX PTQ tool. Building this ONNX model succeeded. It seems correct that the first conv op does not run in FP8.
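That fallback is consistent with the multiple-of-16 rule above: ResNet-18's first conv has C = 3. Below is a minimal sketch (using the onnx package; the file name is illustrative) that flags Conv layers whose C or K violates the constraint:

```python
# Minimal sketch: flag Conv nodes whose channel counts violate the
# multiple-of-16 FP8 constraint. Run on the FP32 model, where weights
# are plain initializers (in a Q/DQ model they sit behind DQ nodes).
import onnx
from onnx import numpy_helper

model = onnx.load("resnet18.onnx")  # illustrative file name
weights = {init.name: numpy_helper.to_array(init)
           for init in model.graph.initializer}

for node in model.graph.node:
    if node.op_type != "Conv":
        continue
    w = weights.get(node.input[1])
    if w is None:
        continue  # weight is not a plain initializer
    k, c = w.shape[0], w.shape[1]  # weight layout: (K, C/groups, R, S)
    if k % 16 or c % 16:
        print(f"{node.name}: K={k}, C={c} -> not eligible for FP8 Conv")
```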
@nvpohanh My question has been resolved, so I am closing this issue.