Support CO-DETR #594

Open
edwardnguyen1705 opened this issue Nov 26, 2024 · 5 comments

edwardnguyen1705 commented Nov 26, 2024

Dear @marcoslucianops,

Thanks so much for sharing your great work. I have been using your repo to obtain TRT models that can be used in DeepStream.

Could you consider supporting this model: https://github.com/open-mmlab/mmdetection/tree/main/projects/CO-DETR (https://github.com/Sense-X/Co-DETR)?

Here is a repo where Co-DETR has been converted to TRT: https://github.com/DataXujing/Co-DETR-TensorRT

I really appreciate your time.

@marcoslucianops (Owner)

Added support https://github.com/marcoslucianops/DeepStream-Yolo/blob/master/docs/CODETR.md

@edwardnguyen1705 (Author)

> Added support https://github.com/marcoslucianops/DeepStream-Yolo/blob/master/docs/CODETR.md

Dear @marcoslucianops,

Thank you so much!

I will try it soon.

edwardnguyen1705 (Author) commented Dec 5, 2024

> Added support https://github.com/marcoslucianops/DeepStream-Yolo/blob/master/docs/CODETR.md

Dear @marcoslucianops,

I am trying to convert this model: https://download.openmmlab.com/mmdetection/v3.0/codetr/co_dino_5scale_swin_large_16e_o365tococo-614254c9.pth from this repo:
https://github.com/open-mmlab/mmdetection/tree/main/projects/CO-DETR

I have successfully converted from PyTorch to ONNX, but failed to convert from ONNX to TRT.

Modification made to export_codetr.py:

```python
# args.opset=16 must be used
print('Exporting the model to ONNX')
with torch.no_grad():
    torch.onnx.export(
        model, onnx_input_im, onnx_output_file, verbose=True, opset_version=args.opset, do_constant_folding=True,
        input_names=['input'], output_names=['output'], dynamic_axes=dynamic_axes if args.dynamic else None,
    )
```

There is no quantization option, just FP32. Have you ever faced the following error?

```
[12/05/2024-08:18:20] [TRT] [V] =============== Computing costs for {ForeignNode[/0/transformer/Flatten.../0/transformer/decoder/Slice]}
[12/05/2024-08:18:20] [TRT] [V] *************** Autotuning format combination: Bool(163840,512,1), Bool(40960,256,1), Bool(10240,128,1), Bool(2560,64,1), Bool(640,32,1), Float(41943040,1310720,1,1), Float(10485760,327680,1,1), Float(2621440,81920,1,1), Float(655360,20480,1,1), Float(163840,5120,1,1), Float(10485760,32768,64,1), Float(10485760,32768,64,1), Float(10485760,32768,64,1), Float(10485760,32768,64,1), Float(2621440,16384,64,1), Float(2621440,16384,64,1), Float(2621440,16384,64,1), Float(2621440,16384,64,1), Float(655360,8192,64,1), Float(655360,8192,64,1), Float(655360,8192,64,1), Float(655360,8192,64,1), Float(163840,4096,64,1), Float(163840,4096,64,1), Float(163840,4096,64,1), Float(163840,4096,64,1), Float(40960,2048,64,1), Float(40960,2048,64,1), Float(40960,2048,64,1), Float(40960,2048,64,1) -> Bool(218240,1,1), Float(20,20,4,1), Float(55869440,256,1), Float(3600,4,1), Float(18000,20,4,1), Float(57600,64,1), Float(57600,64,1), Float(57600,64,1), Float(57600,64,1), Float(57600,64,1), Float(57600,64,1), Float(57600,64,1), Float(57600,64,1) ***************
[12/05/2024-08:18:20] [TRT] [V] --------------- Timing Runner: {ForeignNode[/0/transformer/Flatten.../0/transformer/decoder/Slice]} (Myelin[0x80000023])
[12/05/2024-08:18:50] [TRT] [V] Skipping tactic 0 due to insufficient memory on requested size of 11938766208 detected for tactic 0x0000000000000000.
[12/05/2024-08:18:50] [TRT] [V] {ForeignNode[/0/transformer/Flatten.../0/transformer/decoder/Slice]} (Myelin[0x80000023]) profiling completed in 29.4449 seconds. Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[12/05/2024-08:18:50] [TRT] [W] No valid obedient candidate choices for node {ForeignNode[/0/transformer/Flatten.../0/transformer/decoder/Slice]} that meet the preferred precision. The remaining candidate choices will be profiled.
[12/05/2024-08:18:50] [TRT] [V] Deleting timing cache: 142 entries, served 89 hits since creation.
[12/05/2024-08:18:50] [TRT] [E] 10: Could not find any implementation for node {ForeignNode[/0/transformer/Flatten.../0/transformer/decoder/Slice]}.
[12/05/2024-08:18:50] [TRT] [E] 10: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/0/transformer/Flatten.../0/transformer/decoder/Slice]}.)
Traceback (most recent call last):
  File "/models/onnx2trt.py", line 88, in <module>
    sys.exit(build_engine(args) or 0)
  File "/models/onnx2trt.py", line 66, in build_engine
    f.write(engine.serialize())
AttributeError: 'NoneType' object has no attribute 'serialize'
```
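
For context, the "insufficient memory on requested size of 11938766208" line means the builder could not allocate the ~11 GiB workspace that the only available tactic requested, so no implementation was found and the build returned None. One general thing to try in a standalone builder script is raising the workspace pool limit. A rough sketch against the TensorRT 8.x Python API (this is not the repo's onnx2trt.py; file names are placeholders):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the exported ONNX file (placeholder path).
with open('model.onnx', 'rb') as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
# Let large Myelin tactics be profiled instead of skipped for lack of
# workspace; 12 GiB only makes sense if the GPU actually has that free.
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 12 << 30)

engine_bytes = builder.build_serialized_network(network, config)
if engine_bytes is None:
    raise RuntimeError('Engine build failed')
with open('model.engine', 'wb') as f:
    f.write(engine_bytes)
```

Whether this helps depends on available GPU memory; the same error can also mean the node genuinely has no supported implementation at the requested precision.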

@marcoslucianops (Owner)

Please follow the steps in the doc. You should use the utils/export_codetr.py script to generate the ONNX file and then use DeepStream-Yolo (with config_infer_primary_codetr.txt and deepstream_app_config.txt) to generate the TRT engine.
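
In that workflow DeepStream itself builds and caches the engine on first run, typically launched with `deepstream-app -c deepstream_app_config.txt`. For orientation only (an illustrative excerpt, not the actual config_infer_primary_codetr.txt from the doc), an nvinfer config points at the ONNX file roughly like this:

```
[property]
onnx-file=codetr.onnx
model-engine-file=model_b1_gpu0_fp32.engine
network-mode=0
```

The keys are standard nvinfer properties; the file names here are placeholders.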

edwardnguyen1705 (Author) commented Dec 12, 2024

> Please follow the steps in the doc. You should use the utils/export_codetr.py script to generate the ONNX file and then use DeepStream-Yolo (with config_infer_primary_codetr.txt and deepstream_app_config.txt) to generate the TRT engine.

Thank you, I have successfully converted and tested the TRT engine by following your guide. However, the speed is not improved compared to that of the PyTorch model. Do you have any idea what could cause the low FPS of the generated TRT engine?
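
A general observation, not an answer from this thread: since the engine here was built in FP32, little speedup over PyTorch is not unusual for a transformer-heavy model like CO-DETR. In nvinfer configs, precision is selected with `network-mode` (0 = FP32, 1 = INT8, 2 = FP16), so an FP16 build is one thing worth trying, assuming the model's accuracy holds up:

```
[property]
network-mode=2
```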
