We are excited to announce that TorchMetrics v0.7 is now publicly available. This is a significant release: it includes several new metrics (mainly for NLP), naming and import changes, general improvements to the API, and other great features. TorchMetrics now has more than 60 metrics, and the package is more user-friendly than ever.
NLP metrics - Text package
The text package has been part of TorchMetrics since v0.5. With the growing capability of language-generation models, there is a real need for reliable evaluation metrics. With several added metrics and a unified API, TorchMetrics makes using them easier than ever. TorchMetrics v0.7 adds several machine-translation metrics, such as chrF, chrF++, Translation Edit Rate, and Extended Edit Distance. It also adds Match Error Rate, Word Information Lost, Word Information Preserved, and the SQuAD evaluation metrics. Last but not least, the ROUGE score can now be evaluated against multiple references.
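For example, scoring a candidate translation against multiple references with chrF and TER looks roughly like this (a minimal sketch assuming the v0.7 text API and default metric settings):

```python
from torchmetrics import CHRFScore, TranslationEditRate

# One predicted sentence, scored against two acceptable references.
preds = ["the cat is on the mat"]
target = [["there is a cat on the mat", "a cat sits on the mat"]]

chrf = CHRFScore()           # character n-gram F-score
ter = TranslationEditRate()  # edit operations per reference word

print(chrf(preds, target))   # higher is better
print(ter(preds, target))    # lower is better
```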
Argument unification
Importantly, all text metrics now assume the preds, target input order, with these explicit keyword argument names. Any different naming used before v0.7 is deprecated and will be removed completely in v0.8.
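In practice, every text metric is called the same way, positionally or with the explicit keywords (a small sketch assuming the v0.7 API):

```python
from torchmetrics import WordErrorRate

preds = ["hello world"]
target = ["hello beautiful world"]

wer = WordErrorRate()
# Predictions first, target second ...
print(wer(preds, target))
# ... or spelled out with the unified keyword names.
print(wer(preds=preds, target=target))  # 1 deletion / 3 reference words = 0.33
```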
Import and naming changes
TorchMetrics v0.7 brings changes, both extensive and minor, to how metrics should be imported. The import changes take effect in v0.7 directly, meaning that you will most likely need to change the import statements for some specific metrics. All naming changes follow our standard deprecation process: in v0.7, any metric that is renamed still works but raises a deprecation warning asking you to use the new metric name. From v0.8, the old metric names will no longer be available.
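For instance, the F-score metrics were renamed; the old names still import and run in v0.7, only with a deprecation warning (a sketch; the exact warning text may differ):

```python
import torch
from torchmetrics import F1Score  # new name; `from torchmetrics import F1` still works in v0.7 but warns

preds = torch.tensor([0, 2, 1, 2])
target = torch.tensor([0, 1, 1, 2])

f1 = F1Score(num_classes=3)  # micro-averaged by default
print(f1(preds, target))     # 3 of 4 correct -> tensor(0.7500)
```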
[0.7.0] - 2022-01-17
Added
- `MatchErrorRate` (MER - Match Error Rate #619)
- `WordInfoLost` and `WordInfoPreserved` (Word Information Lost and Preserved - ASR metrics #630)
- `SQuAD` (Add SQuAD Metric. #623)
- `CHRFScore` (Add ChrF++ #641)
- `TranslationEditRate` (Add TER #646)
- `ExtendedEditDistance` (add Extended Edit Distance (EED) metric #668)
- `MultiScaleSSIM` into image metrics (Add MultiScaleStructuralSimilarityIndexMeasure #679)
- Signal to Distortion Ratio (`SDR`) to audio package (adding SDR [audio] #565)
- `MinMaxMetric` to wrappers (min max wrapper #556)
- `ignore_index` to retrieval metrics (Add ignore_idx to retrieval metrics #676)
- Multi-reference support for `ROUGEScore` (Multi Reference ROUGEScore #680)

Changed
- `BLEUScore` now expects untokenized input, to stay consistent with all the other text metrics (Untokenized Bleu score to stay consistent with all the other text metrics #640)
- `TER`, `BLEUScore`, `SacreBLEUScore`, and `CHRFScore` now expect the input order predictions first, target second (Unify the input order for text (NLG) metrics - BLEU, SacreBLEU, TER, CHRF #696)
- Changed dtype from `torch.float` to `torch.long` in `ConfusionMatrix` to accommodate larger values (bugfix: change dtype of confmat to int64 #715)
- Unified `preds`, `target` input-argument naming across all text metrics: `bert`, `bleu`, `chrf`, `sacre_bleu`, `wip`, `wil` (#723) and `cer`, `ter`, `wer`, `mer`, `rouge`, `squad` (#727)
Deprecated
- Renamed text WER metric:
  - `functional.wer` -> `functional.word_error_rate`
  - `WER` -> `WordErrorRate`
- Renamed correlation coefficient classes:
  - `MatthewsCorrcoef` -> `MatthewsCorrCoef`
  - `PearsonCorrcoef` -> `PearsonCorrCoef`
  - `SpearmanCorrcoef` -> `SpearmanCorrCoef`
- Renamed audio STOI metric (#758):
  - `audio.STOI` -> `audio.ShortTimeObjectiveIntelligibility`
  - `functional.audio.stoi` -> `functional.audio.short_time_objective_intelligibility`
- Renamed audio PESQ metric:
  - `functional.audio.pesq` -> `functional.audio.perceptual_evaluation_speech_quality`
  - `audio.PESQ` -> `audio.PerceptualEvaluationSpeechQuality`
- Renamed audio SDR metrics (see the snippet after this list):
  - `functional.sdr` -> `functional.signal_distortion_ratio`
  - `functional.si_sdr` -> `functional.scale_invariant_signal_distortion_ratio`
  - `SDR` -> `SignalDistortionRatio`
  - `SI_SDR` -> `ScaleInvariantSignalDistortionRatio`
- Renamed audio SNR metrics:
  - `functional.snr` -> `functional.signal_noise_ratio`
  - `functional.si_snr` -> `functional.scale_invariant_signal_noise_ratio`
  - `SNR` -> `SignalNoiseRatio`
  - `SI_SNR` -> `ScaleInvariantSignalNoiseRatio`
- Renamed F-score metrics:
  - `functional.f1` -> `functional.f1_score`
  - `F1` -> `F1Score`
  - `functional.fbeta` -> `functional.fbeta_score`
  - `FBeta` -> `FBetaScore`
- Renamed Hinge metric (hinge to hinge_loss #734):
  - `functional.hinge` -> `functional.hinge_loss`
  - `Hinge` -> `HingeLoss`
- Renamed PSNR metrics (peak_signal_noise_ratio #732):
  - `functional.psnr` -> `functional.peak_signal_noise_ratio`
  - `PSNR` -> `PeakSignalNoiseRatio`
- Renamed PIT metrics (permutation_invariant_training #737):
  - `functional.pit` -> `functional.permutation_invariant_training`
  - `PIT` -> `PermutationInvariantTraining`
- Renamed SSIM metrics:
  - `functional.ssim` -> `functional.structural_similarity_index_measure`
  - `SSIM` -> `StructuralSimilarityIndexMeasure`
- Renamed `MAP` to `MeanAveragePrecision` metric (rename MeanAveragePrecision #754)
- Renamed image metrics:
  - `image.FID` -> `image.FrechetInceptionDistance`
  - `image.KID` -> `image.KernelInceptionDistance`
  - `image.LPIPS` -> `image.LearnedPerceptualImagePatchSimilarity`
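To illustrate one of the audio renames in action, the snippet below computes SI-SDR under its new name (a sketch assuming the v0.7 API; random tensors stand in for real waveforms):

```python
import torch
from torchmetrics import ScaleInvariantSignalDistortionRatio  # formerly SI_SDR

preds = torch.randn(8000)   # estimated waveform
target = torch.randn(8000)  # reference waveform

si_sdr = ScaleInvariantSignalDistortionRatio()
print(si_sdr(preds, target))  # value in dB; higher is better
```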
Removed
- `embedding_similarity` metric (Remove deprecated code #638)
- `concatenate_texts` argument from `wer` metric (Remove deprecated code #638)
- `newline_sep` and `decimal_places` arguments from `rouge` metric (Remove deprecated code #638)

Fixed
- `MetricCollection` kwargs filtering when no `kwargs` are present in the update signature (Fix Collection kwargs filtering #707)

Contributors
@ashutoshml, @Borda, @cuent, @Fariborzzz, @getgaurav2, @janhenriklambrechts, @justusschock, @karthikrangasai, @lucadiliello, @mahinlma, @mathemusician, @mona0809, @mrleu, @puhuk, @quancs, @SkafteNicki, @stancld, @twsl
If we forgot someone due to not matching commit email with GitHub account, let us know :]
This discussion was created from the release New NLP metrics and improved API.