Text-related (NLP) metrics
[0.5.0] - 2021-08-09
This release includes general improvements to the library and new metrics within the NLP domain.
https://devblog.pytorchlightning.ai/torchmetrics-v0-5-nlp-metrics-f4232467b0c5
Natural language processing is arguably one of the most exciting areas of machine learning, with models such as BERT, RoBERTa, and GPT-3 pushing the limits of what automated text translation, recognition, and generation systems are capable of. Alongside these models, many metrics have been proposed to measure how well they perform. TorchMetrics v0.5 includes four such metrics: BERTScore, BLEU, ROUGE, and WER.
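As a quick taste of the new text metrics, here is a minimal sketch using the `WER` (word error rate) metric; the example sentences are made up, and the exact argument names may differ slightly between versions:

```python
import torchmetrics

# Word error rate between predicted and reference transcriptions.
wer = torchmetrics.WER()
predictions = ["hello world", "the cat sat on the mat"]
references = ["hello beautiful world", "the cat sat on the mat"]

# Lower is better; 0.0 means the transcriptions match exactly.
print(wer(predictions, references))
```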
Detailed changes
Added
- Added text-related (NLP) metrics: BERTScore, BLEU, ROUGE, and WER
- Added `MetricTracker` wrapper metric for keeping track of the same metric over multiple epochs (#238); see the sketch after this list
- Added other metrics:
- Added support in `nDCG` metric for target with values larger than 1 (#349)
- Added support for negative targets in `nDCG` metric (#378)
- Added `None` as reduction option in `CosineSimilarity` metric (#400)
- Allowed passing labels in (n_samples, n_classes) to `AveragePrecision` (#386)
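The `MetricTracker` wrapper deserves a short illustration. A minimal sketch, assuming the wrapper's `increment()` / `compute_all()` / `best_metric()` interface; the random data is purely illustrative:

```python
import torch
import torchmetrics

# Wrap any metric to track it over multiple epochs.
tracker = torchmetrics.MetricTracker(torchmetrics.Accuracy(), maximize=True)

for epoch in range(3):
    tracker.increment()  # start tracking a new epoch
    for _ in range(5):   # dummy batches
        preds = torch.randn(10, 4).softmax(dim=-1)
        target = torch.randint(4, (10,))
        tracker(preds, target)

print(tracker.compute_all())  # accuracy for each of the 3 epochs
print(tracker.best_metric())  # highest accuracy seen (maximize=True)
```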
Changed
- Moved `psnr` and `ssim` from `functional.regression.*` to `functional.image.*` (#382)
- Moved `image_gradient` from `functional.image_gradients` to `functional.image.gradients` (#381)
- Moved `R2Score` from `regression.r2score` to `regression.r2` (#371)
- Pearson metric now stores only 6 statistics instead of all predictions and targets (#380)
- Use `torch.argmax` instead of `torch.topk` when `k=1` for better performance (#419); see the equivalence sketch after this list
- Moved check for number of samples in R2 score to support single sample updating (#426)
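For the `torch.argmax` item above, a small standalone check (plain PyTorch, not TorchMetrics internals) shows why the swap is safe: both calls select the same indices when `k=1`, but `argmax` skips the sorting machinery:

```python
import torch

preds = torch.randn(32, 10)  # e.g. per-class scores for a batch

# topk(k=1) and argmax pick the same indices; argmax is just cheaper.
top1 = preds.topk(k=1, dim=-1).indices.squeeze(-1)
arg1 = preds.argmax(dim=-1)
assert torch.equal(top1, arg1)
```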
Deprecated
- Renamed `r2score` -> `r2_score` and `kldivergence` -> `kl_divergence` in `functional` (#371); see the import sketch after this list
- Moved `bleu_score` from `functional.nlp` to `functional.text.bleu` (#360)
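The new names and locations can be used right away; a minimal sketch assuming the module layout described above (the deprecated aliases still resolve but emit a warning):

```python
import torch
# The imports themselves are the point: new functional names and paths.
from torchmetrics.functional import kl_divergence, r2_score
from torchmetrics.functional.text.bleu import bleu_score

preds = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])
print(r2_score(preds, target))  # coefficient of determination
```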
Removed
- Removed restriction that `threshold` has to be in the (0,1) range to support logit input (#351, #401); see the logit example after this list
- Removed restriction that `preds` could not be bigger than `num_classes` to support logit input (#357)
- Removed modules `regression.psnr` and `regression.ssim` (#382)
- Removed (#379):
  - function `functional.mean_relative_error`
  - `num_thresholds` argument in `BinnedPrecisionRecallCurve`
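With the `threshold` restriction lifted, binary metrics can consume raw logits directly; a minimal sketch, assuming the binary path simply compares scores against the threshold (a logit of 0 corresponds to a probability of 0.5):

```python
import torch
from torchmetrics import Accuracy

# threshold=0.0 binarizes unnormalized logits without a sigmoid pass.
acc = Accuracy(threshold=0.0)
logits = torch.tensor([2.3, -1.1, 0.7, -0.2])
target = torch.tensor([1, 0, 1, 1])
print(acc(logits, target))  # 3 of 4 correct -> tensor(0.7500)
```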
Fixed
- Fixed bug where classification metrics with `average='macro'` would lead to wrong results if a class was missing (#303)
- Fixed `weighted`/`multiclass` AUROC computation to allow for 0 observations of a class, as its contribution to the final AUROC is 0 (#376)
- Fixed that `_forward_cache` and `_computed` attributes are also moved to the correct device when the metric is moved (#413)
- Fixed calculation in `IoU` metric when using the `ignore_index` argument (#328); see the example after this list
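To round off, the `ignore_index` fix in `IoU` matters for cases like the following hypothetical segmentation snippet, where class 2 stands in for padding/void labels that should not affect the score:

```python
import torch
from torchmetrics import IoU

# Positions labelled 2 in the target are excluded from the score.
iou = IoU(num_classes=3, ignore_index=2)
preds = torch.tensor([0, 1, 1, 2, 0])
target = torch.tensor([0, 1, 0, 2, 2])
print(iou(preds, target))
```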
Contributors
@BeyondTheProof, @Borda, @CSautier, @discort, @edwardclem, @gagan3012, @hugoperrin, @karthikrangasai, @paul-grundmann, @quancs, @rajs96, @SkafteNicki, @vatch123
If we forgot someone due to not matching commit email with GitHub account, let us know :]