Forked from ultralytics/ultralytics.

Merge pull request #3 from zen-xingle/main: support export rknn optimized type torchscript model (9 changed files, 158 additions, 7 deletions).

## Description - export optimized model for RKNPU

### 1. Model structure adjustments

- The DFL (Distribution Focal Loss) structure performs poorly on the NPU, so it has been moved outside the model.

Suppose there are 6000 candidate boxes. The original model places DFL before the box-confidence filter, so all 6000 candidates must go through the DFL computation. If DFL is instead placed after the filter and, say, 100 candidates remain, the DFL workload drops to those 100 boxes, which greatly reduces compute and memory-bandwidth usage.

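As a rough illustration of this reordering, the NumPy sketch below filters candidates by their best class score first and only then applies DFL decoding to the survivors. The tensor layout (64 = 4 sides × 16 DFL bins per box, anchor centres in grid units, one stride per anchor) and the helper names `dfl_decode` and `decode_after_filter` are assumptions modelled on the standard YOLOv8 head, not code from this repository.

```python
import numpy as np

REG_MAX = 16  # assumed number of DFL bins per box side (YOLOv8 default)


def dfl_decode(box_dist: np.ndarray) -> np.ndarray:
    """Turn (N, 4 * REG_MAX) DFL logits into (N, 4) distances (left, top, right, bottom).

    Each side is a softmax over REG_MAX bins; the distance is the expected bin index.
    """
    n = box_dist.shape[0]
    logits = box_dist.reshape(n, 4, REG_MAX)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    return (probs * np.arange(REG_MAX, dtype=np.float32)).sum(axis=-1)


def decode_after_filter(box_dist, scores, anchors, strides, conf_thres=0.25):
    """Filter by best class score first, then run DFL only on the surviving boxes.

    box_dist: (N, 64) raw DFL logits; scores: (N, C) class scores in [0, 1];
    anchors: (N, 2) anchor centres in grid units; strides: (N,) stride per anchor.
    Returns xyxy boxes in pixels, the kept scores, and the boolean keep mask.
    """
    keep = scores.max(axis=1) >= conf_thres  # cheap check on all N candidates
    dist = dfl_decode(box_dist[keep])        # DFL cost scales with survivors only
    ctr = anchors[keep]
    x1y1 = ctr - dist[:, :2]
    x2y2 = ctr + dist[:, 2:]
    boxes = np.concatenate([x1y1, x2y2], axis=1) * strides[keep, None]
    return boxes, scores[keep], keep
```

Because the softmax and weighted sum run on `box_dist[keep]`, their cost grows with the number of boxes that pass the filter rather than with the total number of candidates.
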
- Suppose there are 6000 candidate boxes and 80 detection classes: the threshold check must then be repeated 6000 × 80 = 4.8 × 10^5 times, which takes considerable time. Therefore, the exported model includes an extra operation that sums the scores of all 80 classes, so that low-confidence candidates can be filtered quickly. (This structure is only effective in some cases; whether it helps depends on the model's training results.)

You can disable this optimization by commenting out lines 52 to 54 of **ultralytics/nn/modules/head.py**; the corresponding code is:

```python
cls_sum = torch.clamp(y[-1].sum(1, keepdim=True), 0, 1)
y.append(cls_sum)
```
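
The clamped sum works as a pre-filter because class scores are non-negative, so the best class score never exceeds their sum: if the clamped sum is below the threshold, no class in that row can pass. The sketch below shows CPU-side filtering under that assumption. `filter_with_score_sum` is a hypothetical helper, and `score_sum` stands for the `cls_sum` output flattened to one value per candidate.

```python
import numpy as np


def filter_with_score_sum(scores: np.ndarray, score_sum: np.ndarray, conf_thres: float = 0.25):
    """Return (indices, class_ids, confidences) of candidates whose best class passes conf_thres.

    scores:    (N, C) per-class scores in [0, 1] (assumed sigmoid outputs)
    score_sum: (N,)   clamp(sum of the C scores, 0, 1), i.e. the extra cls_sum output
    """
    # Scores are non-negative, so max(scores) <= sum(scores). Rows whose clamped sum is
    # already below the threshold cannot contain a passing class and are skipped cheaply.
    candidates = np.nonzero(score_sum >= conf_thres)[0]
    # Only the remaining rows get the full per-class check.
    best_cls = scores[candidates].argmax(axis=1)
    best_conf = scores[candidates, best_cls]
    passed = best_conf >= conf_thres
    return candidates[passed], best_cls[passed], best_conf[passed]
```

The converse does not hold: a row with three classes at 0.2 each has a sum of 0.6 but no class above 0.25, so it still needs the full per-class check. How many rows the sum removes therefore depends on how the trained model distributes its scores, which is why the structure only helps in some cases.
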
- (Optional) Following the YOLOv5 structure, you can change the 80-class output to 80 + 1 channels, where the extra channel serves as a box-confidence score and acts as a fast filter. The CPU post-processing then performs 10 to 40 times fewer logical comparisons when applying the confidence threshold.

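To make the comparison count concrete, here is a hedged sketch of post-processing an (N, 80 + 1) score map in which the extra column is assumed to be the box-confidence channel. The column position and the helper name `postprocess_with_box_conf` are illustrative assumptions. The sketch counts one gate check per row plus one check per class for each row that passes.

```python
import numpy as np


def postprocess_with_box_conf(output: np.ndarray, conf_thres: float = 0.25):
    """Filter an (N, C + 1) score map whose last column is an extra box-confidence channel.

    Returns the kept row indices, their class ids, and the number of threshold checks
    performed, for comparison with the N * C checks of a full per-class scan.
    """
    box_conf = output[:, -1]      # assumed layout: the extra channel is the last column
    cls_scores = output[:, :-1]
    rows = np.nonzero(box_conf >= conf_thres)[0]  # one check per row
    checks = output.shape[0] + rows.size * cls_scores.shape[1]  # plus C per surviving row
    cls_ids = cls_scores[rows].argmax(axis=1)
    return rows, cls_ids, checks
```

With N = 6000 and 80 classes, a full scan makes 6000 × 80 = 480,000 checks. If, for example, 100 rows pass the gate, the sketch makes 6000 + 100 × 80 = 14,000 checks, roughly 34 times fewer; the actual ratio depends on how many boxes survive.
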
### 2. Export the model

After installing the environment requirements in ./requirements.txt, run the following commands to export the model:

```shell
# Set the model path in ./ultralytics/cfg/default.yaml (default: yolov8n.pt).
# If you trained your own model, point it to your weights file instead.
export PYTHONPATH=./
python ./ultralytics/engine/exporter.py
```

After execution, a `_rknnopt.torchscript` model is generated. For example, if the original model is yolov8n.pt, the exporter produces yolov8n_rknnopt.torchscript.

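The naming rule above can be expressed as a small helper, for example to locate the exported file from a script. This is a sketch: the helper name `rknnopt_export_path` is hypothetical, and it assumes the exporter writes the file in the same directory as the input weights.

```python
from pathlib import Path


def rknnopt_export_path(weights: str) -> Path:
    """Predict the exported file: <stem>_rknnopt.torchscript next to the weights.

    Assumption: the exporter writes its output in the same directory as the weights.
    """
    p = Path(weights)
    return p.with_name(f"{p.stem}_rknnopt.torchscript")


if __name__ == "__main__":
    out = rknnopt_export_path("yolov8n.pt")
    print(out, "exists" if out.exists() else "not found")
```
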
Export code changes explained:

- In ./ultralytics/cfg/default.yaml, the **format** parameter, which selects the export format, now also accepts 'rknn'.
- When inference reaches the Detect head with format=='rknn', DFL and post-processing are skipped and the raw inference results are output.
- Note that this repository has not tested the corresponding optimization for the pose head or the segment head, so they are currently not supported. You can try adapting them yourself if needed.

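Below is a NumPy mirror of what the `format=='rknn'` branch of the Detect head plausibly returns: for each detection level, the raw box distribution (no DFL), the class scores, and the clamped class-score sum. The per-level output order, the sigmoid on the class logits, the strides (8, 16, 32), and the function names `rknn_head_level` and `rknn_head` are assumptions for illustration; check **ultralytics/nn/modules/head.py** for the actual branch.

```python
import numpy as np

REG_MAX = 16          # assumed DFL bins per box side (YOLOv8 default)
NUM_CLASSES = 80      # class count used throughout this README
STRIDES = (8, 16, 32)  # assumed YOLOv8 detection strides


def rknn_head_level(box_out: np.ndarray, cls_logits: np.ndarray) -> list:
    """Hypothetical mirror of one Detect-head level when format == 'rknn'."""
    cls = 1.0 / (1.0 + np.exp(-cls_logits))                  # assumed sigmoid on class logits
    cls_sum = np.clip(cls.sum(axis=1, keepdims=True), 0, 1)  # mirrors the torch.clamp(...) line
    return [box_out, cls, cls_sum]                           # no DFL, no box decoding, no NMS


def rknn_head(imgsz: int = 640, seed: int = 0) -> list:
    """Run the mirror on random feature maps to show the 3-outputs-per-level layout."""
    rng = np.random.default_rng(seed)
    outputs = []
    for s in STRIDES:
        h = w = imgsz // s
        box_out = rng.standard_normal((1, 4 * REG_MAX, h, w))
        cls_logits = rng.standard_normal((1, NUM_CLASSES, h, w))
        outputs.extend(rknn_head_level(box_out, cls_logits))
    return outputs
```

For a 640 × 640 input this gives nine output tensors covering 80 × 80 + 40 × 40 + 20 × 20 = 8400 grid positions. The CPU post-processing then filters on the class scores and applies DFL only to the boxes that remain, as described in section 1.
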
### 3. Convert to RKNN model, Python demo, C demo

Please refer to https://github.com/airockchip/rknn_model_zoo/tree/main/models/CV/object_detection/yolo