About low accuracy on converted models #339

marcoslucianops · 2023-05-14T17:53:22Z

I evaluated the mAP of the model exported with get_wts and of the model exported to ONNX, and both showed an accuracy drop after TensorRT conversion. My conclusion is that TensorRT loses accuracy when it optimizes the layers.

YOLOv8n ONNX:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.343
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.492
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.373
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.178
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.381
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.471
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.295
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.488
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.542
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.330
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.599
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.700

YOLOv8n get_wts_yolov8.py:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.343
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.491
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.372
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.178
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.381
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.470
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.295
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.488
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.542
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.330
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.599
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.699

huytranvan2010 · 2023-05-19T10:01:23Z

@marcoslucianops could you please share the code you use to evaluate the .engine model?

marcoslucianops · 2023-05-19T11:51:04Z

I will share it in the future.

huytranvan2010 · 2023-05-19T17:32:40Z

Do I need the "libnvdsinfer_custom_impl_Yolo.so" file generated by the command "CUDA_VER=11.8 make -C nvdsinfer_custom_impl_Yolo" for the evaluation, or only the .engine model?

marcoslucianops · 2023-05-19T17:51:11Z

My eval code is based on deepstream_python_apps, with some custom additions (batched image input, pycocotools, etc.). It uses DeepStream to generate the JSON that pycocotools then evaluates.

huytranvan2010 · 2023-05-21T08:14:56Z

I run inference on each image in COCO val and collect the labels to generate a JSON file. But I get a low mAP for the YOLOv7 FP32 .engine model:
mAP0.5:0.95 = 0.4 mAP0.5 = 0.538 mAP0.75 = 0.435
That is too low compared to your benchmark, even though your benchmark uses only a yolov6 fp16 .engine model.

marcoslucianops · 2023-05-21T13:09:26Z

In the models I've tested, there's no mAP difference between FP32 and FP16 engines. Are you using DeepStream to output the bboxes?

huytranvan2010 · 2023-05-21T13:13:40Z

Yes. I run the DeepStream app on the images and save the output (labels) to files by setting gie-kitti-output-dir. Then I collect the labels and generate the JSON files to evaluate. My mAP is too low.
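The label-to-JSON step described here can be sketched as follows. This is a minimal sketch, not the code used by anyone in this thread: it assumes each output line follows the standard KITTI field order (label, truncated, occluded, alpha, left, top, right, bottom, then seven 3D fields) with an optional trailing score, so check the files your DeepStream version actually writes. `kitti_line_to_coco` and the partial `COCO_IDS` mapping are hypothetical names introduced for illustration.

```python
# Hypothetical, partial mapping from label text to COCO category id.
# COCO ids are not contiguous, so a plain class index is not enough.
COCO_IDS = {"person": 1, "bicycle": 2, "car": 3}


def _is_number(token):
    """Return True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def kitti_line_to_coco(line, image_id, category_ids=COCO_IDS):
    """Convert one KITTI-style label line into a COCO results entry."""
    tokens = line.split()
    # The label may contain spaces, so it is everything before the first number.
    first_num = next(i for i, t in enumerate(tokens) if _is_number(t))
    label = " ".join(tokens[:first_num])
    nums = [float(t) for t in tokens[first_num:]]
    left, top, right, bottom = nums[3:7]
    # Assumption: a 15th numeric field, if present, is the confidence.
    score = nums[14] if len(nums) > 14 else 1.0
    return {
        "image_id": image_id,
        "category_id": category_ids[label],
        # COCO expects [x, y, width, height], not corner coordinates.
        "bbox": [left, top, right - left, bottom - top],
        "score": score,
    }
```

A list of such entries, dumped with `json.dump`, is the detection file format that pycocotools loads for evaluation.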

marcoslucianops · 2023-05-21T13:25:44Z

In the KITTI output, the bbox coordinates are relative to the streammux resolution you set. You need to rescale them to the resolution of each validation image.
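A minimal sketch of that rescaling, assuming the streammux stretches each frame to its own width and height without padding (if `enable-padding` is on, the padding offsets would have to be removed first). The function name `rescale_box` is hypothetical.

```python
def rescale_box(box, mux_size, image_size):
    """Map a (left, top, right, bottom) box from streammux to image pixels.

    Assumes a plain stretch from the image to the streammux resolution,
    so each axis gets its own independent scale factor.
    """
    mux_w, mux_h = mux_size
    img_w, img_h = image_size
    scale_x = img_w / mux_w
    scale_y = img_h / mux_h
    left, top, right, bottom = box
    return (left * scale_x, top * scale_y, right * scale_x, bottom * scale_y)


# Example: a box reported at 1920x1080 mapped back onto a 640x480 image.
print(rescale_box((192.0, 108.0, 960.0, 540.0), (1920, 1080), (640, 480)))
```

Because COCO validation images all have different sizes, this has to be applied per image with that image's own width and height.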

huytranvan2010 · 2023-05-21T13:29:28Z

Yes, I noticed that and rescaled the boxes to each image's size, but the mAP is still too low.

marcoslucianops · 2023-05-21T13:58:55Z

Did you set

[class-attrs-all]
nms-iou-threshold=0.65
pre-cluster-threshold=0.001
topk=300

in the config_infer_primary_yoloV7.txt file?

huytranvan2010 · 2023-05-21T14:03:21Z

Did you use the config above to produce your benchmark? I used the default setup:

nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300

marcoslucianops · 2023-05-21T14:15:58Z

The evaluation uses different NMS and confidence thresholds. Try with the values I sent.
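Switching a config between the default and the evaluation thresholds can be scripted. The following is a rough sketch using Python's standard `configparser`; the helper name `apply_eval_thresholds` is hypothetical, and the values are simply the ones posted above. Note that `configparser` drops comments when it writes the file back, so keep a copy of the original.

```python
import configparser
import io

# Evaluation values quoted in this thread for [class-attrs-all].
EVAL_OVERRIDES = {
    "nms-iou-threshold": "0.65",
    "pre-cluster-threshold": "0.001",
    "topk": "300",
}


def apply_eval_thresholds(config_text, overrides=EVAL_OVERRIDES):
    """Return config_text with the [class-attrs-all] keys overridden."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(config_text)
    if not parser.has_section("class-attrs-all"):
        parser.add_section("class-attrs-all")
    for key, value in overrides.items():
        parser.set("class-attrs-all", key, value)
    out = io.StringIO()
    # Write "key=value" without spaces, matching the style used above.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()
```

The low confidence threshold matters because COCO mAP integrates precision over the whole recall range, so detections discarded at 0.25 can no longer contribute to recall.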

huytranvan2010 · 2023-05-21T14:26:51Z

Thanks a lot for supporting me. I am going to try it now😍

huytranvan2010 · 2023-05-21T16:44:39Z

I used the setup you suggested (nms-iou-threshold=0.65, pre-cluster-threshold=0.001, topk=300). The mAP is better, but it is still lower than your benchmark for YOLOv7. Here is my result for the FP32 .engine model:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.449
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.623
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.485

I attached my configs (I have several, such as final_config_1.txt):
config_infer_primary_yoloV7.txt
final_config_1.txt

marcoslucianops · 2023-05-22T21:08:09Z

My eval code is fine-tuned to extract the best possible mAP with DeepStream; that's why I get a slightly higher mAP.

huytranvan2010 · 2023-05-30T08:53:15Z

@marcoslucianops You said there is no mAP difference between FP32 and FP16 engines in the models you tested. Does that include the YOLOv7 model? I saw that your FP16 .engine model has mAP0.5:0.95 = 0.476, which would mean the FP32 .engine model also has mAP0.5:0.95 = 0.476. That is too low compared with the reference .pt model, which has mAP0.5:0.95 = 0.514: https://github.com/WongKinYiu/yolov7#performance

marcoslucianops · 2023-05-31T15:18:41Z

There's a drop on TensorRT compared to the PyTorch model. In some models, it's a significant drop. In other models (like PPYOLOE and YOLO-NAS), it's small. The test I ran compared the ONNX export method with the wts and cfg export method. There's no drop between those two export methods.

huytranvan2010 · 2023-05-31T15:34:59Z

Thanks a lot. I expected FP32 not to lose much mAP. If the FP32 or FP16 mAP already drops this much, the INT8 mAP will be even lower.

marcoslucianops · 2023-05-31T15:44:53Z

The FP16 and FP32 mAP are equal.

huytranvan2010 · 2023-05-31T15:52:16Z

Yeah, I think so. In your opinion, what causes the big mAP drop of the FP32 and FP16 engines compared with the .pt models? I mean for some models, including YOLOv7. I saw that YOLOv7 FP16 drops by about 4%.
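To make the size of that drop concrete, here is the arithmetic on the two YOLOv7 figures quoted earlier in the thread (0.514 for the .pt reference, 0.476 for the FP16 engine). It is only a worked calculation; `map_drop` is a hypothetical helper name.

```python
def map_drop(reference, converted):
    """Return the (absolute, relative) mAP drop from reference to converted."""
    absolute = reference - converted
    relative = absolute / reference
    return absolute, relative


absolute, relative = map_drop(0.514, 0.476)
print(f"absolute drop: {absolute:.3f} mAP, relative drop: {relative:.1%}")
# prints "absolute drop: 0.038 mAP, relative drop: 7.4%"
```

So "about 4%" here means roughly four mAP points in absolute terms, which is about 7.4% of the reference score.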

marcoslucianops · 2023-05-31T16:18:45Z

In my opinion, TensorRT layers are performance focused, making some tweaks to precisions and parameters. So it's faster, but loses some of the accuracy.

huytranvan2010 · 2023-05-31T16:36:30Z

Thanks for sharing.

cgrtrifork · 2023-10-01T21:20:45Z

Could this be related to inputs being different, not only TensorRT tweaks? For instance, in YOLOv8 it looks like symmetric padding is done with a grayscale value rather than with black color like DeepStream's nvstreammux does.

Edit: I also saw the following warning when running with exported ONNX models. Could this be another reason for the drop in performance? Is it possible to export using INT32 instead of INT64?

WARNING: [TRT]: onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: Tensor DataType is determined at build time for tensors not marked as input or output.
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
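As a rough illustration of what these warnings describe, the sketch below clamps a list of integers to the INT32 range and reports which ones changed. It is not TensorRT's code, and the sample values are hypothetical; the point is only that values already inside the INT32 range survive the cast unchanged, so the cast affects just the out-of-range ones (for example a very large sentinel used to mean "up to the end").

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def clamp_to_int32(values):
    """Clamp integers to the INT32 range.

    Returns (clamped_values, values_that_were_out_of_range).
    """
    clamped = [min(max(v, INT32_MIN), INT32_MAX) for v in values]
    changed = [v for v in values if not INT32_MIN <= v <= INT32_MAX]
    return clamped, changed


# Hypothetical INT64 constants: two ordinary sizes and one huge sentinel.
clamped, changed = clamp_to_int32([0, 640, 2**63 - 1])
print(clamped)  # [0, 640, 2147483647]
print(changed)  # [9223372036854775807]
```

Whether a clamped value changes the model's output depends on how that constant is used in the graph, which this thread does not establish.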

In any case, it would be good to have a table of the expected drop for each of the models, as a reference.

WangFengtu1996 · 2024-01-18T02:02:34Z

@cgrtrifork any update? I get the same warning: "Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32." How can I solve this problem?

WangFengtu1996 · 2024-01-18T02:23:19Z

When I run inference with yolov8s in DeepStream 6.3 on an NVIDIA AGX Orin DK, I have a question about the results:
fp32 gpu ~30fps
fp16 gpu+dla0 ~11fps, which is a significant drop.

Could someone explain this and give me some guidance?

cgrtrifork · 2024-02-09T16:12:53Z

I ran the following experiment: I am trying out YOLOv8 object detection on an image that contains an object.

I used this repository to export the model to ONNX. Then, using ffmpeg, I generated a single-frame video that I feed into DeepStream with a confidence threshold pre-cluster-threshold=0.2.

When I use NMS clustering (cluster-mode=2, nms-iou-threshold=0.5) the object is not found.
If I disable the clustering (cluster-mode=4) then an object is found with confidence 0.77.

I used Triton Inference Server to serve the same TensorRT model that is generated when running DeepStream. Then I ran the inference on the same image. I preprocessed the image to get a 3x640x640 image of float32 between 0 and 1 in RGB format, as it is expected by the model.

When I use a gray background for the padding (pixel value = 114/255), like YOLO does, the max score of the output is 0.87.
When I use a black background for the padding (pixel value = 0), like DeepStream does, the max score of the output is 0.90.
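The geometry of that preprocessing can be sketched as below: an aspect-preserving resize into a 640x640 canvas with symmetric padding, where the only difference between the two variants is the fill value. This is a minimal sketch under those assumptions, not the exact code of YOLOv8 or DeepStream; `letterbox_geometry` and `pad_value` are hypothetical names.

```python
def letterbox_geometry(img_w, img_h, size=640):
    """Compute the resize scale, resized size and symmetric padding offsets."""
    scale = min(size / img_w, size / img_h)
    new_w = round(img_w * scale)
    new_h = round(img_h * scale)
    # Symmetric padding: split the leftover pixels between both sides.
    pad_left = (size - new_w) // 2
    pad_top = (size - new_h) // 2
    return scale, (new_w, new_h), (pad_left, pad_top)


def pad_value(style):
    """Fill value in the 0..1 range for the two variants compared above."""
    return {"gray": 114 / 255, "black": 0.0}[style]


# A 1280x720 frame is scaled by 0.5 and padded by 140 rows above the image
# (and 140 below, since 640 - 360 = 280).
print(letterbox_geometry(1280, 720))  # (0.5, (640, 360), (0, 140))
```

The same offsets and scale are needed in reverse to map detections from the 640x640 input back to the original frame.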

Since the same TensorRT model was used, this makes me think there is an issue either in the parsing and interpretation of the model output, or deeper, in DeepStream's lower-level preprocessing of the image.

Why does enabling the NMS remove the detection? If the detection is the maximum score the NMS shouldn't remove it.
Why are the scores different between DeepStream's nvinfer plugin and Triton Inference Server?

For completeness:

The image I'm using was originally extracted from a video by doing:

# extract all the frames from the original video into a folder
# frames are enumerated starting from 1
ffmpeg -i original_video.mp4 original_video/%05d.jpg

Then I chose the frame to use (number 84), and I created the single-frame video by doing:

# frames start from 0, that's why we choose 84-1=83
ffmpeg -i original_video.mp4 -vf "select=eq(n\,83)" single_frame_video.mp4

The pipeline I'm using in DeepStream is: nvurisrcbin -> videorate -> nvvideoconvert -> capsfilter -> nvstreammux -> queue -> nvvideoconvert -> capsfilter -> nvinfer -> fakesink. I'm adding a probe after the nvinfer to see the detections.

@marcoslucianops have you tried evaluating the engine file outside of DeepStream?

cgrtrifork · 2024-02-12T15:31:09Z

Following up on this, I found that the parsing in NvDsInferParseYolo seems to be correct for this case. However, the detection that DeepStream outputs is not the one with the highest confidence. Here are the logs from DeepStream (I added print statements to the library):

[Class 0] Box proposal with confidence 0.750208: x1=185.988, y1=141.614, x2=499.038, y2=417.46 (threshold: 0.2)
[Class 0] BBI with confidence 0.750208: left=185.988, top=141.614, width=313.05, height=275.846
[Class 0] Box proposal with confidence 0.881455: x1=184.819, y1=141.771, x2=497.486, y2=416.067 (threshold: 0.2)
[Class 0] BBI with confidence 0.881455: left=184.819, top=141.771, width=312.667, height=274.296
[Class 0] Box proposal with confidence 0.886627: x1=185.479, y1=141.421, x2=499.159, y2=415.547 (threshold: 0.2)
[Class 0] BBI with confidence 0.886627: left=185.479, top=141.421, width=313.68, height=274.127
[Class 0] Box proposal with confidence 0.877862: x1=185.409, y1=141.396, x2=499.173, y2=415.735 (threshold: 0.2)
[Class 0] BBI with confidence 0.877862: left=185.409, top=141.396, width=313.764, height=274.339
[Class 0] Box proposal with confidence 0.866284: x1=184.766, y1=141.94, x2=497.723, y2=416.012 (threshold: 0.2)
[Class 0] BBI with confidence 0.866284: left=184.766, top=141.94, width=312.958, height=274.072
[Class 0] Box proposal with confidence 0.854519: x1=184.601, y1=141.577, x2=499.699, y2=415.742 (threshold: 0.2)
[Class 0] BBI with confidence 0.854519: left=184.601, top=141.577, width=315.097, height=274.165
[Class 0] Box proposal with confidence 0.856617: x1=185.726, y1=141.448, x2=499.246, y2=415.667 (threshold: 0.2)
[Class 0] BBI with confidence 0.856617: left=185.726, top=141.448, width=313.52, height=274.219
[Class 0] Box proposal with confidence 0.770557: x1=184.458, y1=142.046, x2=498.037, y2=416.368 (threshold: 0.2)
[Class 0] BBI with confidence 0.770557: left=184.458, top=142.046, width=313.579, height=274.322
[Class 0] Box proposal with confidence 0.752778: x1=184.416, y1=141.868, x2=499.955, y2=416.512 (threshold: 0.2)
[Class 0] BBI with confidence 0.752778: left=184.416, top=141.868, width=315.539, height=274.644
[Class 0] Box proposal with confidence 0.725658: x1=185.231, y1=141.948, x2=499.762, y2=416.444 (threshold: 0.2)
[Class 0] BBI with confidence 0.725658: left=185.231, top=141.948, width=314.531, height=274.496
[Class 0] Box proposal with confidence 0.23098: x1=184.02, y1=141.643, x2=500.4, y2=416.408 (threshold: 0.2)
[Class 0] BBI with confidence 0.23098: left=184.02, top=141.643, width=316.38, height=274.764
Objects decoded: 11
ObjectList after assignment: 11
2024-02-12 14:21:33,705 [INFO][root]     Frame number: 0
2024-02-12 14:21:33,705 [INFO][root]     [Class 0] Found object with confidence = 0.7705574035644531: left=332.0246276855469, top=0.0827464759349823, width=564.4418334960938, height=494.5528259277344

The DeepStream version I'm using is 6.2, I will test this in newer versions too.

EDIT: It seems to be fixed when upgrading to DeepStream 6.3, now all the detections are found if NMS is disabled, and only the correct maximum confidence detection is found when using NMS.
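For reference, the expected behavior can be checked with a plain greedy NMS over a few of the proposals logged above. This is a generic sketch of the technique, not DeepStream's clustering code, and `iou` and `greedy_nms` are hypothetical helper names. Since all the proposals overlap almost completely, only the highest-scoring one should survive at an IoU threshold of 0.5.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, iou_threshold):
    """Keep boxes in descending score order, dropping heavy overlaps."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


# Four of the class-0 proposals from the log above (confidence, x1, y1, x2, y2).
proposals = [
    (0.750208, (185.988, 141.614, 499.038, 417.46)),
    (0.881455, (184.819, 141.771, 497.486, 416.067)),
    (0.886627, (185.479, 141.421, 499.159, 415.547)),
    (0.770557, (184.458, 142.046, 498.037, 416.368)),
]
kept = greedy_nms(proposals, 0.5)
print([score for score, _ in kept])  # [0.886627]
```

The first box processed is always the highest-scoring one and nothing can suppress it, which is why a result of 0.77 in DeepStream 6.2 pointed to a problem outside the NMS logic itself.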

marcoslucianops pinned this issue May 14, 2023

This was referenced May 19, 2023

yolov5 and deepstream accuracy problem #237

Closed

YOLOV5 model has very weak detection capability after deployment #251

Closed

No detection with int8 #336

Closed

This was referenced Oct 3, 2023

DeepStream converts and runs YOLOV5 model, but the scores of detected targets is significantly lost #463

Closed

Bigger drop in Detection Accuracy while running yolov7 standalone and in Deep Stream #466

Closed
