
Refactor detection FMetric #4130

Merged

@eugene123tw commented Nov 24, 2024

Summary

Refactor FMetric by removing the unused dynamic NMS threshold and optimizing the IoU and metric computations.
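
For context, the metric in question sweeps candidate confidence thresholds and keeps the best F1. A minimal sketch of that idea (names such as `best_f1_over_thresholds` and `matched` are illustrative, not the actual OTX implementation):

```python
# Minimal sketch: best F1 over a confidence sweep. `matched` flags which
# predictions were matched to a ground-truth box; n_gt is the ground-truth count.
import numpy as np

def best_f1_over_thresholds(scores: np.ndarray, matched: np.ndarray, n_gt: int) -> float:
    """Return the best F1 across a sweep of confidence thresholds."""
    best = 0.0
    for thr in np.arange(0.1, 1.0, 0.05):
        keep = scores > thr                      # predictions surviving the threshold
        tp = int(matched[keep].sum())            # matched predictions
        fp = int(keep.sum()) - tp                # unmatched predictions
        fn = n_gt - tp                           # unmatched ground truths
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best
```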

Evaluation Time Comparison

| Eval Period | Before PR Eval Time | PR Eval Time |
|---|---|---|
| 1 | 146.7377 sec | 4.8786 sec |
| 2 | 111.3974 sec | 4.1782 sec |
| 3 | 52.1444 sec | 1.8198 sec |
| 4 | 103.8570 sec | 2.0841 sec |
| 5 | 68.9539 sec | 2.0969 sec |
| 6 | 74.6123 sec | 2.0660 sec |

F1 Score Comparison

| Before PR test/F1 | PR test/F1 |
|---|---|
| 0.84606 | 0.873132 |

Overall Elapsed Time

| Before PR Elapsed Time | PR Elapsed Time |
|---|---|
| 0:38:00.087019 | 0:13:25.230139 |

Note:

  • Two separate experiments were conducted with the YOLOX-s model on the same dataset, comparing results before and after the refactor of the evaluation function.
  • Although the two runs' test F1 scores differ (0.84606 vs. 0.873132), the experiments trained separate models, so the discrepancy does not necessarily indicate an actual drop or increase in accuracy.
  • When evaluating the same trained model before and after the refactor, the metric was identical (0.846069216), verifying that the refactor does not affect the computed score.

How to test

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have run e2e tests and there are no issues.
  • I have added the description of my changes into CHANGELOG in my target branch (e.g., CHANGELOG in develop).
  • I have updated the documentation in my target branch accordingly (e.g., documentation in develop).
  • I have linked related issues.

License

  • I submit my code changes under the same Apache License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

@eugene123tw linked an issue Nov 24, 2024 that may be closed by this pull request
@eugene123tw marked this pull request as ready for review November 25, 2024 13:28
@eugene123tw changed the title from "[Draft] Refactor detection FMetric" to "Refactor detection FMetric" Nov 25, 2024
@github-actions bot added the TEST (Any changes in tests) and DOC (Improvements or additions to documentation) labels Nov 25, 2024
@sovrasov commented Nov 25, 2024

@eugene123tw @kprokofi could you discuss the changes in detail? From my side, I'm afraid this could affect the final estimated threshold value, which is important in the case of a small train/val set. Perhaps we should keep both the slow (dynamic NMS threshold) and fast modes, but that depends on experiments. If the final confidence threshold estimation is not skewed, then it's fine to keep only the fast version.

@eugene123tw commented Nov 27, 2024

> @eugene123tw @kprokofi could you discuss the changes in detail? From my side, I'm afraid this could affect the final estimated threshold value, which is important in the case of a small train/val set. Perhaps we should keep both the slow (dynamic NMS threshold) and fast modes, but that depends on experiments. If the final confidence threshold estimation is not skewed, then it's fine to keep only the fast version.

@sovrasov @kprokofi refactoring the F1-score calculation is relatively safe. The primary change replaces the per-pair IoU computation loop with matrix operations, making the evaluation significantly faster. The final F1-score results, including the estimated confidence thresholds, remain consistent.
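
To illustrate, a minimal sketch of the kind of vectorized pairwise IoU this refers to: one broadcasted tensor operation over [N, 4] and [M, 4] xyxy boxes instead of a Python loop over pairs (illustrative only, not the exact OTX code):

```python
# Pairwise IoU for two sets of [x1, y1, x2, y2] boxes in a single
# broadcasted operation, returning an [N, M] IoU matrix.
import torch

def pairwise_iou(boxes1: torch.Tensor, boxes2: torch.Tensor) -> torch.Tensor:
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    lt = torch.maximum(boxes1[:, None, :2], boxes2[None, :, :2])  # intersection top-left
    rb = torch.minimum(boxes1[:, None, 2:], boxes2[None, :, 2:])  # intersection bottom-right
    wh = (rb - lt).clamp(min=0)                                   # zero out empty overlaps
    inter = wh[..., 0] * wh[..., 1]
    union = area1[:, None] + area2[None, :] - inter
    return inter / union.clamp(min=1e-7)                          # guard against division by zero
```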

I validated the before/after comparison across 9 datasets, and there are no significant differences in accuracy. The per-model average F1-scores are summarized in the table below:

| Model | develop avg F1-score | PR avg F1-score |
|---|---|---|
| ATSS_MOBILENET | 0.7120 | 0.7067 |
| ATSS_RESNEXT101 | 0.7284 | 0.7243 |
| RTDETR_18 | 0.7221 | 0.7245 |
| RTDETR_50 | 0.7418 | 0.7429 |
| RTDETR_101 | 0.7378 | 0.7136 |
| RTMDET_TINY | 0.7107 | 0.7132 |
| SSD_MOBILENET | 0.5885 | 0.5818 |
| YOLOX_L | 0.7261 | 0.7410 |
| YOLOX_S | 0.7166 | 0.6923 |
| YOLOX_TINY | 0.6455 | 0.6454 |
| YOLOX_X | 0.7384 | 0.7381 |
| YOLOV9_C | 0.6696 | 0.6609 |
| YOLOV9_M | 0.6682 | 0.6663 |
| YOLOV9_S | 0.6726 | 0.6820 |

Additionally, I verified functionality in tests/unit/core/metrics/test_fmeasure.py, and all checks passed without issues.

sovrasov previously approved these changes Nov 28, 2024

@sovrasov
@eugene123tw could you have a look at the iseg tests failure?

Traceback (most recent call last):
    results_per_confidence = self._get_results_per_confidence(
  File "/home/validation/actions-runner/_work/training_extensions/training_extensions/.tox/integration-test-instance_segmentation/lib/python3.10/site-packages/otx/core/metrics/fmeasure.py", line 233, in _get_results_per_confidence
    result_point = self.evaluate_classes(
  File "/home/validation/actions-runner/_work/training_extensions/training_extensions/.tox/integration-test-instance_segmentation/lib/python3.10/site-packages/otx/core/metrics/fmeasure.py", line 279, in evaluate_classes
    metrics, counters = self.get_f_measure_for_class(
  File "/home/validation/actions-runner/_work/training_extensions/training_extensions/.tox/integration-test-instance_segmentation/lib/python3.10/site-packages/otx/core/metrics/fmeasure.py", line 323, in get_f_measure_for_class
    batch_pred_bboxes = self.__filter_pred(
  File "/home/validation/actions-runner/_work/training_extensions/training_extensions/.tox/integration-test-instance_segmentation/lib/python3.10/site-packages/otx/core/metrics/fmeasure.py", line 381, in __filter_pred
    keep = (entity.labels == label_idx) & (entity.score > confidence_threshold)
RuntimeError: The size of tensor a (66) must match the size of tensor b (100) at non-singleton dimension 0

@eugene123tw

> @eugene123tw could you have a look at the iseg tests failure?

@sovrasov I forgot to filter the scores in the TV MaskRCNN post-processing, so the per-detection tensors ended up with mismatched lengths. The fix is in src/otx/algo/instance_segmentation/segmentors/maskrcnn_tv.py.
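
As a hypothetical illustration of the failure mode (not the actual maskrcnn_tv.py code): every per-detection tensor must be filtered by the same keep mask, or later elementwise ops see mismatched lengths:

```python
# Hypothetical repro: if labels are filtered by score but scores are not,
# downstream `labels == idx` vs `scores > thr` comparisons differ in length,
# reproducing the "(66) must match ... (100)" error above.
import torch

boxes = torch.rand(100, 4)
scores = torch.rand(100)
labels = torch.randint(0, 5, (100,))

keep = scores > 0.05   # post-processing score filter (threshold is illustrative)
boxes = boxes[keep]
labels = labels[keep]
scores = scores[keep]  # the forgotten step: without it, scores keeps 100 entries

assert boxes.shape[0] == labels.shape[0] == scores.shape[0]
```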

@eugene123tw requested a review from sovrasov November 28, 2024 17:53
@kprokofi merged commit 33f27f3 into openvinotoolkit:develop Dec 1, 2024
20 checks passed
@eugene123tw deleted the eugene/fmeasure-refactor branch December 2, 2024 08:18
Successfully merging this pull request may close these issues:

  • Issue: FMeasure compute too slow for detection task