Inconsistent default values for average argument in classification metrics #2320

Open
StefanoWoerner opened this issue Jan 22, 2024 · 3 comments
Labels
bug / fix · good first issue · help wanted · question · v1.3.x

Comments


StefanoWoerner commented Jan 22, 2024

🐛 Bug

When instantiating the multiclass (or multilabel) accuracy metric through the Accuracy wrapper class (the legacy interface), the default value for average is "micro". When instantiating directly through MulticlassAccuracy (the new way since 0.11, I believe), the default is "macro". This is inconsistent and can lead to very unexpected results.

The same is true for all metrics that are subclasses of MulticlassStatScores, BinaryStatScores or MultilabelStatScores as well as their respective functional interfaces.

To Reproduce

  1. Instantiate the metrics directly as well as through the wrapper.
  2. Compare results.
Code sample
import torch
from torchmetrics import Accuracy
from torchmetrics.classification import MulticlassAccuracy

classes = {0: "A", 1: "B", 2: "C"}
num_classes = len(classes)
num_samples = 10
multiclass_preds = torch.randn(num_samples, num_classes)
multiclass_targets = torch.randint(0, num_classes, (num_samples,))

legacy_mc_acc = Accuracy(task="multiclass", num_classes=num_classes)  # defaults to average="micro"
new_mc_acc = MulticlassAccuracy(num_classes=num_classes)  # defaults to average="macro"

legacy_result = legacy_mc_acc(multiclass_preds, multiclass_targets)
new_result = new_mc_acc(multiclass_preds, multiclass_targets)

# Fails whenever the micro and macro averages disagree for the sampled data.
assert new_result == legacy_result
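
For reference, pinning the averaging mode explicitly should make the two interfaces agree; a minimal sketch under the defaults described above:

import torch
from torchmetrics import Accuracy
from torchmetrics.classification import MulticlassAccuracy

torch.manual_seed(0)
num_classes, num_samples = 3, 10
preds = torch.randn(num_samples, num_classes)
targets = torch.randint(0, num_classes, (num_samples,))

legacy_mc_acc = Accuracy(task="multiclass", num_classes=num_classes)  # micro by default
explicit_micro = MulticlassAccuracy(num_classes=num_classes, average="micro")

# With average="micro" spelled out, both interfaces compute the same number.
assert torch.isclose(legacy_mc_acc(preds, targets), explicit_micro(preds, targets))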

Expected behavior

Consistency between the different interfaces.

Environment

  • TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source): >=0.11 (1.3 in my case)
  • Python & PyTorch Version (e.g., 1.0): irrelevant
  • Any other relevant information such as OS (e.g., Linux): irrelevant

Additional context

I would argue that, in the case of accuracy, defaulting to macro in the task-specific classes is not only inconsistent with the legacy interface but actually wrong. The common definition of accuracy is

$$ \mathrm{Acc}(\text{preds},\text{targets}) = \frac{1}{N}\sum_{i = 1}^{N} \mathbb{1}\left[\text{preds}_i = \text{targets}_i\right] $$

where $\mathbb{1}[\cdot]$ denotes the indicator function. This is exactly how accuracy is computed when setting average="micro".

Setting average="macro" can still be useful, as it is less skewed by class imbalance. However, I think TorchMetrics should adhere to the common definitions in its default settings, and I would therefore argue for making micro the default.
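
To make the difference concrete, here is a minimal sketch using the functional interface (the imbalanced toy data is made up for illustration): with nine samples of a majority class predicted correctly and one sample of a rare class missed, micro averaging gives the plain per-sample accuracy of 0.9, while macro averaging drops to 0.5 because the rare class contributes a per-class accuracy of 0.0.

import torch
from torchmetrics.functional.classification import multiclass_accuracy

# Nine samples of class 0 (all predicted correctly) and one sample of class 1 (missed).
preds = torch.tensor([0] * 10)
target = torch.tensor([0] * 9 + [1])

micro = multiclass_accuracy(preds, target, num_classes=2, average="micro")  # 0.9 (9 / 10 correct)
macro = multiclass_accuracy(preds, target, num_classes=2, average="macro")  # 0.5 (mean of 1.0 and 0.0)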

The same is kind of true for precision and recall, which are also commonly defined as micro averages, if they are defined globally at all. Usually we encounter recall and precision as class-wise metrics.
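
As a side note supporting this: for single-label multiclass inputs, the micro averages of precision, recall, and accuracy all collapse to the same number (every misclassification is simultaneously a false positive for one class and a false negative for another), which is arguably why the global variants are rarely quoted. A quick sketch:

import torch
from torchmetrics.functional.classification import (
    multiclass_accuracy,
    multiclass_precision,
    multiclass_recall,
)

torch.manual_seed(0)
preds = torch.randint(0, 3, (20,))
target = torch.randint(0, 3, (20,))

# Under micro averaging, all three reduce to "correct predictions / total predictions".
acc = multiclass_accuracy(preds, target, num_classes=3, average="micro")
prec = multiclass_precision(preds, target, num_classes=3, average="micro")
rec = multiclass_recall(preds, target, num_classes=3, average="micro")
assert torch.isclose(acc, prec) and torch.isclose(prec, rec)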

StefanoWoerner added the bug / fix and help wanted labels on Jan 22, 2024

Hi! Thanks for your contribution, great first issue!

Borda added the v1.3.x label on Jan 22, 2024
Borda added the question and good first issue labels on Aug 29, 2024
@donghyeon

To prevent confusion and misinterpretation of accuracy results, I strongly ask the torchmetrics authors to emit a clear warning when users rely on the default average in the accuracy functions or classes. Without such a warning, the current defaults let users conflate "macro" and "micro" averaging without even knowing these options exist, which can lead to decisions based on a misunderstood metric.
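
Purely as an illustration of what such a safeguard could look like from the user side, here is a sketch built around a hypothetical strict_multiclass_accuracy helper (not an existing torchmetrics API) that warns whenever the averaging mode is left implicit:

import warnings
from torchmetrics.classification import MulticlassAccuracy

def strict_multiclass_accuracy(num_classes, average=None, **kwargs):
    # Hypothetical helper: force callers to pick an averaging mode consciously.
    if average is None:
        warnings.warn(
            "No `average` given; falling back to 'macro', which differs from the "
            "legacy Accuracy wrapper's 'micro' default.",
            UserWarning,
        )
        average = "macro"
    return MulticlassAccuracy(num_classes=num_classes, average=average, **kwargs)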


edmcman commented Nov 13, 2024

This just bit me, and wasted a substantial amount of time!
