
Refactor detection FMetric #4130

Merged

@eugene123tw commented Nov 24, 2024

Summary

Refactor FMetric by removing the unused dynamic NMS threshold and optimizing the IoU and metric computations.
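
For context, the metric in question sweeps candidate confidence thresholds and keeps the best F1. A minimal sketch of that idea (names such as `best_f1_over_thresholds` and `matched` are illustrative, not the actual OTX implementation):

```python
# Minimal sketch: best F1 over a confidence sweep. `matched` flags which
# predictions were matched to a ground-truth box; n_gt is the ground-truth count.
import numpy as np

def best_f1_over_thresholds(scores: np.ndarray, matched: np.ndarray, n_gt: int) -> float:
    """Return the best F1 across a sweep of confidence thresholds."""
    best = 0.0
    for thr in np.arange(0.1, 1.0, 0.05):
        keep = scores > thr                      # predictions surviving the threshold
        tp = int(matched[keep].sum())            # matched predictions
        fp = int(keep.sum()) - tp                # unmatched predictions
        fn = n_gt - tp                           # unmatched ground truths
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best
```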

Evaluation Time Comparison

| Eval Period | Before PR Eval Time | PR Eval Time |
|---|---|---|
| 1 | 146.7377 sec | 4.8786 sec |
| 2 | 111.3974 sec | 4.1782 sec |
| 3 | 52.1444 sec | 1.8198 sec |
| 4 | 103.8570 sec | 2.0841 sec |
| 5 | 68.9539 sec | 2.0969 sec |
| 6 | 74.6123 sec | 2.0660 sec |

F1 Score Comparison

| Before PR test/F1 | PR test/F1 |
|---|---|
| 0.84606 | 0.873132 |

Overall Elapsed Time

| Before PR Elapsed Time | PR Elapsed Time |
|---|---|
| 0:38:00.087019 | 0:13:25.230139 |

Note:

  • Two separate experiments were conducted with the YOLOX-s model on the same dataset, comparing results before and after the refactor of the evaluation function.
  • Although the two runs' test F1 scores differ (0.84606 vs. 0.873132), the experiments trained separate models, so the discrepancy does not necessarily indicate an actual drop or increase in accuracy.
  • When evaluating the same trained model before and after the refactor, the metric was identical (0.846069216), verifying that the refactor does not affect the computed score.

How to test

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have run e2e tests and there are no issues.
  • I have added the description of my changes into CHANGELOG in my target branch (e.g., CHANGELOG in develop).
  • I have updated the documentation in my target branch accordingly (e.g., documentation in develop).
  • I have linked related issues.

License

  • I submit my code changes under the same Apache License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

@eugene123tw linked an issue Nov 24, 2024 that may be closed by this pull request
@eugene123tw marked this pull request as ready for review November 25, 2024 13:28
@eugene123tw changed the title from "[Draft] Refactor detection FMetric" to "Refactor detection FMetric" Nov 25, 2024
@github-actions bot added the TEST (Any changes in tests) and DOC (Improvements or additions to documentation) labels Nov 25, 2024
@sovrasov commented Nov 25, 2024

@eugene123tw @kprokofi could you discuss the changes in detail? From my side, I'm afraid this could affect the final estimated threshold value, which is important in the case of a small train/val set. Perhaps we should keep both the slow (dynamic NMS threshold) and fast modes, but that depends on experiments. If the final confidence threshold estimation is not skewed, then it's fine to keep only the fast version.

@eugene123tw commented Nov 27, 2024

> @eugene123tw @kprokofi could you discuss the changes in detail? From my side, I'm afraid this could affect the final estimated threshold value, which is important in the case of a small train/val set. Perhaps we should keep both the slow (dynamic NMS threshold) and fast modes, but that depends on experiments. If the final confidence threshold estimation is not skewed, then it's fine to keep only the fast version.

@sovrasov @kprokofi refactoring the F1-score calculation is relatively safe. The primary change replaces the per-pair IoU computation loop with matrix operations, making the evaluation significantly faster. The final F1-score results, including the estimated confidence thresholds, remain consistent.
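
To illustrate, a minimal sketch of the kind of vectorized pairwise IoU this refers to: one broadcasted tensor operation over [N, 4] and [M, 4] xyxy boxes instead of a Python loop over pairs (illustrative only, not the exact OTX code):

```python
# Pairwise IoU for two sets of [x1, y1, x2, y2] boxes in a single
# broadcasted operation, returning an [N, M] IoU matrix.
import torch

def pairwise_iou(boxes1: torch.Tensor, boxes2: torch.Tensor) -> torch.Tensor:
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    lt = torch.maximum(boxes1[:, None, :2], boxes2[None, :, :2])  # intersection top-left
    rb = torch.minimum(boxes1[:, None, 2:], boxes2[None, :, 2:])  # intersection bottom-right
    wh = (rb - lt).clamp(min=0)                                   # zero out empty overlaps
    inter = wh[..., 0] * wh[..., 1]
    union = area1[:, None] + area2[None, :] - inter
    return inter / union.clamp(min=1e-7)                          # guard against division by zero
```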

I validated the before/after comparison across 9 datasets, and there are no significant differences in accuracy. The per-model average F1-scores are summarized in the table below:

| Model | develop avg F1-score | PR avg F1-score |
|---|---|---|
| ATSS_MOBILENET | 0.7120 | 0.7067 |
| ATSS_RESNEXT101 | 0.7284 | 0.7243 |
| RTDETR_18 | 0.7221 | 0.7245 |
| RTDETR_50 | 0.7418 | 0.7429 |
| RTDETR_101 | 0.7378 | 0.7136 |
| RTMDET_TINY | 0.7107 | 0.7132 |
| SSD_MOBILENET | 0.5885 | 0.5818 |
| YOLOX_L | 0.7261 | 0.7410 |
| YOLOX_S | 0.7166 | 0.6923 |
| YOLOX_TINY | 0.6455 | 0.6454 |
| YOLOX_X | 0.7384 | 0.7381 |
| YOLOV9_C | 0.6696 | 0.6609 |
| YOLOV9_M | 0.6682 | 0.6663 |
| YOLOV9_S | 0.6726 | 0.6820 |

Additionally, I verified functionality in tests/unit/core/metrics/test_fmeasure.py, and all checks passed without issues.

sovrasov previously approved these changes Nov 28, 2024

@sovrasov
@eugene123tw could you have a look at the iseg tests failure?

Traceback (most recent call last):
    results_per_confidence = self._get_results_per_confidence(
  File "/home/validation/actions-runner/_work/training_extensions/training_extensions/.tox/integration-test-instance_segmentation/lib/python3.10/site-packages/otx/core/metrics/fmeasure.py", line 233, in _get_results_per_confidence
    result_point = self.evaluate_classes(
  File "/home/validation/actions-runner/_work/training_extensions/training_extensions/.tox/integration-test-instance_segmentation/lib/python3.10/site-packages/otx/core/metrics/fmeasure.py", line 279, in evaluate_classes
    metrics, counters = self.get_f_measure_for_class(
  File "/home/validation/actions-runner/_work/training_extensions/training_extensions/.tox/integration-test-instance_segmentation/lib/python3.10/site-packages/otx/core/metrics/fmeasure.py", line 323, in get_f_measure_for_class
    batch_pred_bboxes = self.__filter_pred(
  File "/home/validation/actions-runner/_work/training_extensions/training_extensions/.tox/integration-test-instance_segmentation/lib/python3.10/site-packages/otx/core/metrics/fmeasure.py", line 381, in __filter_pred
    keep = (entity.labels == label_idx) & (entity.score > confidence_threshold)
RuntimeError: The size of tensor a (66) must match the size of tensor b (100) at non-singleton dimension 0

@eugene123tw

> @eugene123tw could you have a look at the iseg tests failure?

@sovrasov I forgot to filter the scores in the TV MaskRCNN post-processing, so the per-detection tensors ended up with mismatched lengths. The fix is in src/otx/algo/instance_segmentation/segmentors/maskrcnn_tv.py.
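
As a hypothetical illustration of the failure mode (not the actual maskrcnn_tv.py code): every per-detection tensor must be filtered by the same keep mask, or later elementwise ops see mismatched lengths:

```python
# Hypothetical repro: if labels are filtered by score but scores are not,
# downstream `labels == idx` vs `scores > thr` comparisons differ in length,
# reproducing the "(66) must match ... (100)" error above.
import torch

boxes = torch.rand(100, 4)
scores = torch.rand(100)
labels = torch.randint(0, 5, (100,))

keep = scores > 0.05   # post-processing score filter (threshold is illustrative)
boxes = boxes[keep]
labels = labels[keep]
scores = scores[keep]  # the forgotten step: without it, scores keeps 100 entries

assert boxes.shape[0] == labels.shape[0] == scores.shape[0]
```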

@eugene123tw requested a review from sovrasov November 28, 2024 17:53
@kprokofi merged commit 33f27f3 into openvinotoolkit:develop Dec 1, 2024
20 checks passed
@eugene123tw deleted the eugene/fmeasure-refactor branch December 2, 2024 08:18
Successfully merging this pull request may close these issues:

  • Issue: FMeasure compute too slow for detection task