No speed improvement between FP16 and INT8 TensorRT models #13433
Comments
👋 Hello @ingtommi, thank you for your interest in YOLOv5 🚀! It looks like you're encountering an issue with performance differences between FP16 and INT8 TensorRT models. Since this appears to be a 🐛 Bug Report, we would appreciate it if you could provide a minimum reproducible example (MRE) to assist us in debugging this issue. This could include the specific commands you used, a small sample of your dataset, or any additional logs that might help clarify the problem. Please also double-check your environment to ensure compatibility.
For debugging, it might be helpful to test with different hardware or TensorRT versions to see if the issue persists. If this is related to a specific YOLOv5 configuration, please share more details about your setup or the customizations you have made. An Ultralytics engineer will review this shortly and provide further assistance; thank you for your patience! 😊
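As a quick environment sanity check (a minimal sketch, assuming PyTorch and TensorRT are installed in the active Python environment), you can print the relevant versions:

```bash
# Print Python-side versions of the relevant packages
python -c "import torch; print('torch', torch.__version__, 'cuda', torch.version.cuda)"
python -c "import tensorrt as trt; print('tensorrt', trt.__version__)"
```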
Thank you for your detailed report and testing effort! Your observation of minimal or no speed improvement with INT8 on YOLOv5 compared to FP16 is valid and may come down to hardware and architectural factors. Some devices, such as the Jetson Orin Nano, show limited benefit from INT8 because FP16 is already highly optimized on them, and YOLOv5's operations may not exploit INT8 kernels as fully as newer YOLO versions with quantization-aware designs. If verifying on a different architecture still shows the same discrepancy, it might indicate that the INT8 calibration settings are suboptimal or that TensorRT's INT8 kernels aren't being fully leveraged for YOLOv5. For further exploration, ensure the calibration data is diverse and representative of your deployment inputs (see the sketch below). Additionally, testing with dynamic batch sizes or alternate precision configurations (e.g., mixing INT8/FP16) could be insightful. Let us know if you see different outcomes or need additional guidance! For reference, you can explore this TensorRT guide for further optimization techniques.
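As an illustration of the calibration point, here is a minimal sketch of an INT8 entropy calibrator using the TensorRT Python API. The class name, batch size, and cache path are placeholders (not YOLOv5 code), and preprocessing is assumed to match your deployment pipeline:

```python
# Minimal INT8 calibrator sketch (TensorRT Python API + PyCUDA).
# All names and paths are illustrative placeholders.
import numpy as np
import pycuda.autoinit  # noqa: F401 - creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt


class YoloCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, images, batch_size=8, cache_file="calib.cache"):
        super().__init__()
        self.images = images          # list of preprocessed CHW float32 arrays
        self.batch_size = batch_size
        self.cache_file = cache_file
        self.index = 0
        # Device buffer sized for one full batch
        self.device_input = cuda.mem_alloc(images[0].nbytes * batch_size)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        if self.index + self.batch_size > len(self.images):
            return None  # no more data: calibration ends
        batch = np.ascontiguousarray(
            np.stack(self.images[self.index:self.index + self.batch_size])
        )
        cuda.memcpy_htod(self.device_input, batch)
        self.index += self.batch_size
        return [int(self.device_input)]  # one device pointer per network input

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

The more varied the images fed through the calibrator, the better the computed dynamic ranges will match real inputs.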
YOLOv5 doesn't support INT8 TensorRT exports.
Does the benchmark with trtexec show a difference?
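For example (engine file names are placeholders):

```bash
# Compare the GPU compute latency trtexec reports for each engine
trtexec --loadEngine=yolov5n_fp16.engine --warmUp=500 --duration=10
trtexec --loadEngine=yolov5n_int8.engine --warmUp=500 --duration=10
```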
@Y-T-G no, you can check it yourself in the txt file I attached above. |
It's probably not a bug then |
@Y-T-G Yes, but I found nothing similar on the internet (no one comparing YOLOv5 FPS between FP16 and INT8), so I had to ask...
Someone mentioned there was a 10% improvement
@Y-T-G yeah sorry, I also found that one (it seems to be the only one). 10% is better than my 0%, but they also see little difference in memory, while I go from 6.3 MB (FP16) to 4.7 MB (INT8).
Search before asking
YOLOv5 Component
Validation
Bug
When validating my YOLOv5n in both FP16 and INT8 precision, I see no speed improvement for the INT8 version, while accuracy and model size drop (which is expected!). I then checked with trtexec and again got the same latency:
yolov5n.txt.
Since this does not happen with the latest YOLO models (where I see around a 20% latency improvement), I suspect that YOLOv5 does not have operations that benefit from INT8 on my current architecture (i.e. 16-bit is already fully optimized).
Can you help me understand whether this is true, or whether I am making a mistake somewhere?
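For reference, a comparison along these lines can be set up as follows (a sketch, not the exact commands used; file names are placeholders):

```bash
# Export to ONNX with YOLOv5's export script, then build one engine per precision.
python export.py --weights yolov5n.pt --include onnx --imgsz 640
# FP16 engine
trtexec --onnx=yolov5n.onnx --fp16 --saveEngine=yolov5n_fp16.engine
# INT8 engine (without a calibration cache, trtexec uses placeholder dynamic
# ranges, which is fine for latency measurement but not for accuracy)
trtexec --onnx=yolov5n.onnx --int8 --fp16 --saveEngine=yolov5n_int8.engine
```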
Environment
Minimal Reproducible Example
Additional
Model files: models.zip
Are you willing to submit a PR?