[0.5.0] - 2021-08-09
This release includes general improvements to the library and new metrics within the NLP domain.
Natural language processing is arguably one of the most exciting areas of machine learning, with models such as BERT, RoBERTa, and GPT-3 pushing the boundaries of what automated text translation, recognition, and generation systems are capable of.
With the introduction of these models, many metrics have been proposed to measure how well they perform. TorchMetrics v0.5 includes four such metrics: BERT score, BLEU, ROUGE, and WER.
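As a quick taste of the new text metrics, here is a minimal sketch using the module-based WER metric and the functional BLEU score. It assumes the v0.5 import paths and input formats (later releases renamed WER to WordErrorRate and accept untokenized strings for BLEU), so treat it as illustrative rather than canonical:

```python
from torchmetrics import WER
from torchmetrics.functional import bleu_score

# Word Error Rate over predicted vs. reference transcriptions
predictions = ["this is the prediction", "there is an other sample"]
references = ["this is the reference", "there is another one"]
wer = WER()
print(wer(predictions, references))  # 0.5: half of the reference words are wrong

# BLEU score; in v0.5 the corpora are pre-tokenized lists of tokens
translate_corpus = ["the cat is on the mat".split()]
reference_corpus = [["there is a cat on the mat".split(), "a cat is on the mat".split()]]
print(bleu_score(translate_corpus, reference_corpus))  # tensor(0.7598)
```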
Detailed changes
Added
- Added MetricTracker wrapper metric for keeping track of the same metric over multiple epochs (metric tracker #238); see the sketch after this list
- Added support in the nDCG metric for targets with values larger than 1 (Allow target nDCG metric to be integer larger than 1 #349)
- Added support for negative relevance targets in the nDCG metric (fix nDCG can not be called with negative relevance targets #378)
- Added None as a reduction option in the CosineSimilarity metric (Add None as reduction option in CosineSimilarity #400)
- Added multilabel support to AveragePrecision (multilabel for AveragePrecision #386)
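To illustrate the new wrapper, here is a minimal sketch of MetricTracker, assuming the interface from the tracker PR (one increment() per epoch, then compute_all() and best_metric() at the end); the random dummy data is only for illustration:

```python
import torch
from torchmetrics import Accuracy, MetricTracker

# Wrap any metric; maximize=True means best_metric() returns the highest value
tracker = MetricTracker(Accuracy(), maximize=True)

for epoch in range(3):
    tracker.increment()  # start tracking a new epoch
    for _ in range(5):   # batches
        preds = torch.randint(0, 2, (10,))
        target = torch.randint(0, 2, (10,))
        tracker.update(preds, target)

print(tracker.compute_all())  # per-epoch values stacked into one tensor
print(tracker.best_metric())  # best value across all epochs
```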
Changed
- Moved psnr and ssim from functional.regression.* to functional.image.* (move functional psnr & ssim to image #382)
- Moved image_gradient from functional.image_gradients to functional.image.gradients (Move image gradient #381)
- Moved R2Score from regression.r2score to regression.r2 (cleaning & prune re-definine #371)
- Use torch.argmax instead of torch.topk when k=1 for better performance (Use argmax when topk=1 #419); see the sketch after this list
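The argmax change is an internal optimization, but the equivalence it relies on is easy to see: for k=1, torch.topk and torch.argmax select the same indices, while argmax skips the extra sorting bookkeeping. A small illustrative snippet (not library code):

```python
import torch

preds = torch.randn(32, 10)  # (batch, num_classes) scores

# topk with k=1 and argmax pick the same class per row
topk_idx = preds.topk(k=1, dim=-1).indices.squeeze(-1)
argmax_idx = preds.argmax(dim=-1)
assert torch.equal(topk_idx, argmax_idx)
```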
Deprecated
- Renamed r2score >> r2_score and kldivergence >> kl_divergence in functional (cleaning & prune re-definine #371); see the sketch after this list
- Moved bleu_score from functional.nlp to functional.text.bleu (Added Blue Score the respective folders #360)
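The renamed functions are drop-in replacements for the old ones, which still work in v0.5 but emit a deprecation warning. A quick sketch with the new names (example values chosen arbitrarily):

```python
import torch
from torchmetrics.functional import kl_divergence, r2_score

# r2score -> r2_score
preds = torch.tensor([2.5, 0.0, 2.0, 8.0])
target = torch.tensor([3.0, -0.5, 2.0, 7.0])
print(r2_score(preds, target))

# kldivergence -> kl_divergence; inputs are distributions over the last dim
p = torch.tensor([[0.36, 0.48, 0.16]])
q = torch.tensor([[1 / 3, 1 / 3, 1 / 3]])
print(kl_divergence(p, q))
```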
Removed
- Removed restriction that threshold has to be in the (0,1) range, to support logit input (Allow threshold to be outside (0,1) domain #351, Remove remaining threshold checks #401); see the sketch after this list
- Removed restriction that preds could not be bigger than num_classes, to support logit input (Remove check that preds value need to be smaller than num_classes #357)
- Removed regression.psnr and regression.ssim (move functional psnr & ssim to image #382)
- Removed functional.mean_relative_error
- Removed the num_thresholds argument in BinnedPrecisionRecallCurve
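The relaxed threshold check is aimed at logit inputs. Below is a hypothetical sketch of what it enables, assuming preds are thresholded in the space they are given in, as the PR titles suggest; the exact handling of out-of-range preds can differ between versions, so verify against your installed release:

```python
import torch
from torchmetrics import Accuracy

# Hypothetical usage: preds given as raw logits, thresholded at 0.0
# (a logit of 0.0 corresponds to a probability of 0.5)
acc = Accuracy(threshold=0.0)
preds = torch.tensor([-1.2, 0.3, 2.5, -0.7])  # logits, outside (0, 1)
target = torch.tensor([0, 1, 1, 0])
print(acc(preds, target))
```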
Fixed
- Fixed bug where classification metrics with average='macro' would lead to wrong results if a class was missing (Fix metrics in macro average #303)
- Fixed weighted, multi-class AUROC computation to allow for 0 observations of some class, as its contribution to the final AUROC is 0 (Weighted AUROC to omit empty classes #376)
- Fixed that the _forward_cache and _computed attributes are also moved to the correct device when the metric is moved (Move forward cache and computed to device #413)
- Fixed computation of the IoU metric when using the ignore_index argument (fix ignore_index in the computation of IoU #328)

Contributors
@BeyondTheProof, @Borda, @CSautier, @discort, @edwardclem, @gagan3012, @hugoperrin, @karthikrangasai, @paul-grundmann, @quancs, @rajs96, @SkafteNicki, @vatch123
If we missed anyone because a commit email did not match a GitHub account, let us know :]