More metrics than ever
[0.6.0] - 2021-10-28
We are excited to announce that TorchMetrics v0.6 is now publicly available. TorchMetrics v0.6 does not focus on a specific domain but instead adds a ton of new metrics across several domains, increasing the number of metrics in the repository to over 60! Not only does v0.6 add metrics within already-covered domains, but it also adds support for two new ones: pairwise metrics and detection.
https://devblog.pytorchlightning.ai/torchmetrics-v0-6-more-metrics-than-ever-e98c3983621e
Pairwise Metrics
TorchMetrics v0.6 offers a new set of metrics in its functional backend for calculating pairwise distances. Given a tensor `X` with shape `[N, d]` (`N` observations, each in `d` dimensions), a pairwise metric calculates the `[N, N]` matrix of all possible pairwise combinations between the rows of `X`.
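Below is a minimal sketch of the new functional pairwise metrics, using `pairwise_cosine_similarity` (the tensor values are illustrative):

```python
import torch
from torchmetrics.functional import pairwise_cosine_similarity

# X has shape [N, d] = [3, 4]: three observations with four features each
x = torch.randn(3, 4)

# entry (i, j) of the result is the cosine similarity between rows i and j
sim = pairwise_cosine_similarity(x)
print(sim.shape)  # torch.Size([3, 3])

# passing a second tensor y with shape [M, d] yields an [N, M] matrix instead
y = torch.randn(5, 4)
print(pairwise_cosine_similarity(x, y).shape)  # torch.Size([3, 5])
```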
Detection
TorchMetrics v0.6 now includes a detection package that provides the MAP metric. The implementation essentially wraps `pycocotools`, ensuring that we get the correct values, but with the benefit of being able to scale to multiple devices (like any other metric in TorchMetrics).
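A minimal sketch of feeding the metric predictions and ground truth, assuming the v0.6 import path `torchmetrics.detection.map` (boxes in `[xmin, ymin, xmax, ymax]` format; `pycocotools` must be installed):

```python
import torch
from torchmetrics.detection.map import MAP

# predictions for a single image: one box with a confidence score and class label
preds = [
    dict(
        boxes=torch.tensor([[258.0, 41.0, 606.0, 285.0]]),  # [xmin, ymin, xmax, ymax]
        scores=torch.tensor([0.54]),
        labels=torch.tensor([0]),
    )
]
# ground truth for the same image
target = [
    dict(
        boxes=torch.tensor([[214.0, 41.0, 562.0, 285.0]]),
        labels=torch.tensor([0]),
    )
]

metric = MAP()
metric.update(preds, target)
print(metric.compute())  # dict with map, map_50, map_75, ... values
```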
New additions
- In the audio package, we have two new metrics: Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI). Both metrics can be used to assess speech quality.
- In the retrieval package, we also have two new metrics: R-Precision and Hit Rate. R-Precision corresponds to the recall at the R-th position of the query, where R is the number of relevant documents for that query. The hit rate measures whether at least one relevant document is found among the top retrieved results of a query.
- The text package also receives an update in the form of two new metrics: SacreBLEU score and character error rate. SacreBLEU provides a more systematic way of comparing BLEU scores across tasks. The character error rate is similar to the word error rate but instead judges whether a given algorithm has correctly predicted a sentence based on a character-by-character comparison.
- The regression package got a single new metric in the form of the Tweedie deviance score. Deviance scores are generally a better measure of fit than measures such as squared error when modelling data drawn from highly skewed distributions.
- Finally, we have added five new metrics for simple aggregation: `SumMetric`, `MeanMetric`, `MinMetric`, `MaxMetric`, and `CatMetric`. All five metrics take a single input (either native Python floats or `torch.Tensor`) and keep track of the sum, average, min, etc. These new aggregation metrics are especially useful in combination with `self.log` from Lightning if you want to log something other than the average of the metric you are tracking (a short sketch follows this list).
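A minimal sketch of the aggregation metrics in action (the printed values are just the running aggregates of the example inputs):

```python
import torch
from torchmetrics import MaxMetric, MeanMetric

mean, maximum = MeanMetric(), MaxMetric()

# updates accept native Python floats as well as tensors
for value in [0.5, 0.4, torch.tensor(0.3)]:
    mean.update(value)
    maximum.update(value)

print(mean.compute())     # tensor(0.4000)
print(maximum.compute())  # tensor(0.5000)
```

Inside a `LightningModule`, such a metric object can be passed directly to `self.log`, e.g. to track the running maximum of a loss instead of its average.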
Detail changes
Added
- Added audio metrics:
  - Perceptual Evaluation of Speech Quality (PESQ)
  - Short-Time Objective Intelligibility (STOI)
- Added information retrieval metrics:
  - R-Precision
  - Hit Rate
- Added NLP metrics:
  - SacreBLEU score
  - Character error rate
- Added other metrics:
  - Tweedie deviance score
- Added `MAP` (mean average precision) metric to the new detection package (#467)
- Added support for float targets in `nDCG` metric (#437)
- Added `average` argument to `AveragePrecision` metric for reducing multi-label and multi-class problems (#477)
- Added `MultioutputWrapper` (#510) (a short sketch follows this list)
- Added metric sweeping:
- Added simple aggregation metrics: `SumMetric`, `MeanMetric`, `CatMetric`, `MinMetric`, `MaxMetric` (#506)
- Added pairwise submodule with metrics (#553):
  - `pairwise_cosine_similarity`
  - `pairwise_euclidean_distance`
  - `pairwise_linear_similarity`
  - `pairwise_manhatten_distance`
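A minimal sketch of the new `MultioutputWrapper`, which evaluates a base metric separately for every output column (values are illustrative):

```python
import torch
from torchmetrics import MultioutputWrapper, R2Score

# a two-output regression problem: each row holds both outputs for one sample
preds = torch.tensor([[0.0, 2.0], [-1.0, 2.0], [8.0, -5.0]])
target = torch.tensor([[0.5, 1.0], [-1.0, 1.0], [7.0, -6.0]])

# wrap R2Score so that one score is computed per output column
r2 = MultioutputWrapper(R2Score(), num_outputs=2)
print(r2(preds, target))  # one R2 score per output
```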
Changed
- `AveragePrecision` will now by default output the `macro` average for multilabel and multiclass problems (#477)
- `half`, `double`, and `float` will no longer change the dtype of the metric states; use `metric.set_dtype` instead (#493) (a short sketch follows this list)
- Renamed `AverageMeter` to `MeanMetric` (#506)
- Changed `is_differentiable` from a property to a constant attribute (#551)
- `ROC` and `AUROC` will no longer throw an error when either the positive or negative class is missing; instead, they return 0 scores and give a warning
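A minimal sketch of the new dtype behaviour described above (`MeanSquaredError` is just an arbitrary example metric):

```python
import torch
from torchmetrics import MeanSquaredError

metric = MeanSquaredError()

# half/double/float no longer cast the metric states
metric.half()

# use set_dtype to actually change the dtype of the internal states
metric.set_dtype(torch.double)
```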
Deprecated
- Deprecated `torchmetrics.functional.self_supervised.embedding_similarity` in favour of the new pairwise submodule
Removed
- Removed `dtype` property (#493)
Fixed
- Fixed bug in `F1` with `average='macro'` and `ignore_index != None` (#495)
- Fixed bug in `pit` by using the returned first result to initialize device and type (#533)
- Fixed `SSIM` metric using too much memory (#539)
- Fixed bug where the `device` property was not properly updated when the metric was a child of a module (#542)
Contributors
@an1lam, @Borda, @karthikrangasai, @lucadiliello, @mahinlma, @Obus, @quancs, @SkafteNicki, @stancld, @tkupek
If we forgot someone due to not matching commit email with GitHub account, let us know :]