All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Note: we move fast, but we still preserve backward compatibility for 0.1 version (one feature release).
- Added audio metric `NISQA` (#2792)
- Added classification metric `LogAUC` (#2377)
- Added classification metric `NegativePredictiveValue` (#2433)
- Added regression metric `NormalizedRootMeanSquaredError` (#2442)
- Added segmentation metric `Dice` (#2725)
- Added method `merge_state` to `Metric` (#2786)
- Added support for propagation of the autograd graph in ddp setting (#2754)
- Changed naming and input order arguments in `KLDivergence` (#2800)
- Deprecated Dice from classification metrics (#2725)
- Changed minimum supported PyTorch version to 2.0 (#2671)
- Dropped support for Python 3.8 (#2827)
- Removed `num_outputs` in `R2Score` (#2800)
- Fixed segmentation `Dice` + `GeneralizedDice` for 2d index tensors (#2832)
- Fixed mixed results of `rouge_score` with `accumulate='best'` (#2830)
- Re-adding `numpy` 2+ support (#2804)
- Fixed iou scores in detection for either empty predictions/targets leading to wrong scores (#2805)
- Fixed `MetricCollection` compatibility with `torch.jit.script` (#2813)
- Fixed assert in PIT (#2811)
- Patched `np.Inf` for `numpy` 2.0+ (#2826)
- Changed `_modules` dict type to prevent collection metrics from failing with PyTorch 2.5 (#2793)
- Added segmentation metric `HausdorffDistance` (#2122)
- Added audio metric `DNSMOS` (#2525)
- Added shape metric `ProcrustesDistance` (#2723)
- Added `MetricInputTransformer` wrapper (#2392)
- Added `input_format` argument to segmentation metrics (#2572)
- Added `multi-output` support for MAE metric (#2605)
- Added `truncation` argument to `BERTScore` (#2776)
- Tracker higher is better integration (#2649)
- Updated `InfoLM` class to dynamically set `higher_is_better` (#2674)
- Deprecated `num_outputs` in `R2Score` (#2705)
- Fixed corner case in `IoU` metric for single empty prediction tensors (#2780)
- Fixed `PSNR` calculation for integer type input images (#2788)
- Fixed Pearson metric changing its inputs (#2765)
- Fixed bug in `PESQ` metric where `NoUtterancesError` prevented calculating on a batch of data (#2753)
- Fixed corner case in `MatthewsCorrCoef` (#2743)
- Re-adding `Chrf` implementation (#2701)
- Fixed wrong aggregation in `segmentation.MeanIoU` (#2698)
- Fixed handling zero division error in binary IoU (Jaccard index) calculation (#2726)
- Corrected the padding related calculation errors in SSIM (#2721)
- Fixed compatibility of audio domain with new `scipy` (#2733)
- Fixed how `prefix`/`postfix` works in `MultitaskWrapper` (#2722)
- Fixed flakiness in tests related to `torch.unique` with `dim=None` (#2650)
- Calculate text color of `ConfusionMatrix` plot based on luminance (#2590)
- Updated `_safe_divide` to allow `Accuracy` to run on the GPU (#2640)
- Improved error messages for intersection detection metrics for wrong user input (#2577)
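
The luminance-based text color entry above can be illustrated with a minimal sketch. This is not the library's implementation: the function names, the Rec. 709 weights, and the 0.5 threshold are assumptions chosen for illustration only.

```python
def relative_luminance(rgb):
    """Approximate brightness of an RGB color with channels in [0, 1].

    Assumption: Rec. 709 luma weights, with gamma correction ignored.
    """
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def text_color_for(background_rgb, threshold=0.5):
    """Pick a readable annotation color for a cell with the given background.

    The threshold is a hypothetical cut-off: bright cells get black text,
    dark cells get white text.
    """
    return "black" if relative_luminance(background_rgb) > threshold else "white"


if __name__ == "__main__":
    for name, color in {"white": (1, 1, 1), "navy": (0, 0, 0.5)}.items():
        print(name, "->", text_color_for(color))
```

Choosing the text color from the cell brightness keeps annotations legible at both ends of a colormap, which a single fixed text color cannot do.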
- Dropped `Chrf` implementation due to licensing issues with the upstream package (#2668)
- Fixed bug in `MetricCollection` when using compute groups and `compute` is called more than once (#2571)
- Fixed class order of `panoptic_quality(..., return_per_class=True)` output (#2548)
- Fixed `BootstrapWrapper` not being reset correctly (#2574)
- Fixed integration between `ClasswiseWrapper` and `MetricCollection` with custom `_filter_kwargs` method (#2575)
- Fixed BertScore calculation: pred target misalignment (#2347)
- Fixed `_cumsum` helper function in multi-gpu (#2636)
- Fixed bug in `MeanAveragePrecision.coco_to_tm` (#2588)
- Fixed missed f-strings in exceptions/warnings (#2667)
- Added `SensitivityAtSpecificity` metric to classification subpackage (#2217)
- Added `QualityWithNoReference` metric to image subpackage (#2288)
- Added a new segmentation metric:
- Added support for calculating segmentation quality and recognition quality in `PanopticQuality` metric (#2381)
- Added `pretty-errors` for improving error prints (#2431)
- Added support for `torch.float` weighted networks for FID and KID calculations (#2483)
- Added `zero_division` argument to selected classification metrics (#2198)
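
As a rough illustration of what a `zero_division` argument controls, the sketch below computes per-class precision in plain Python. The helper name `safe_divide` and its signature are hypothetical and only mirror the idea; they are not the library's API.

```python
def safe_divide(numerators, denominators, zero_division=0.0):
    """Element-wise division that returns `zero_division` where the denominator is 0.

    Hypothetical helper for illustration; plain lists are used instead of tensors.
    """
    return [
        n / d if d != 0 else zero_division
        for n, d in zip(numerators, denominators)
    ]


if __name__ == "__main__":
    # Per-class true positives and false positives; class 1 was never predicted.
    tp = [3, 0]
    fp = [1, 0]
    predicted = [t + f for t, f in zip(tp, fp)]
    print(safe_divide(tp, predicted))                     # precision, undefined case -> 0.0
    print(safe_divide(tp, predicted, zero_division=1.0))  # precision, undefined case -> 1.0
```

The score for a class with no predictions is mathematically undefined, so the caller decides which fallback value fits the evaluation protocol.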
- Made `__getattr__` and `__setattr__` of `ClasswiseWrapper` more general (#2424)
- Fixed getitem for metric collection when prefix/postfix is set (#2430)
- Fixed axis names with Precision-Recall curve (#2462)
- Fixed list synchronization with partly empty lists (#2468)
- Fixed memory leak in metrics using list states (#2492)
- Fixed bug in computation of `ERGAS` metric (#2498)
- Fixed `BootStrapper` wrapper not working with `kwargs` provided argument (#2503)
- Fixed warnings being suppressed in `MeanAveragePrecision` when requested (#2501)
- Fixed corner-case in `binary_average_precision` when only negative samples are provided (#2507)
- Fixed negative variance estimates in certain image metrics (#2378)
- Fixed dtype being changed by deepspeed for certain regression metrics (#2379)
- Fixed plotting of metric collection when prefix/postfix is set (#2429)
- Fixed bug when `top_k>1` and `average="macro"` for classification metrics (#2423)
- Fixed case where label prediction tensors in classification metrics were not validated correctly (#2427)
- Fixed how auc scores are calculated in `PrecisionRecallCurve.plot` methods (#2437)
- Fixed how backprop is handled in `LPIPS` metric (#2326)
- Fixed `MultitaskWrapper` not being able to be logged in lightning when using metric collections (#2349)
- Fixed high memory consumption in `Perplexity` metric (#2346)
- Fixed cached network in `FeatureShare` not being moved to the correct device (#2348)
- Fixed naming of statistics in `MeanAveragePrecision` with custom max det thresholds (#2367)
- Fixed custom aggregation in retrieval metrics (#2364)
- Fixed initialization of aggregation metrics with default floating type (#2366)
- Fixed plotting of confusion matrices (#2358)
- Added more tokenizers for `SacreBLEU` metric (#2068)
- Added support for logging `MultitaskWrapper` directly with lightning's `log_dict` method (#2213)
- Added `FeatureShare` wrapper to share submodules containing feature extractors between metrics (#2120)
- Added new metrics to image domain:
- Added `average` argument to multiclass versions of `PrecisionRecallCurve` and `ROC` (#2084)
- Added confidence scores when `extended_summary=True` in `MeanAveragePrecision` (#2212)
- Added `RetrievalAUROC` metric (#2251)
- Added `aggregate` argument to retrieval metrics (#2220)
- Added utility functions in `segmentation.utils` for future segmentation metrics (#2105)
- Changed minimum supported PyTorch version from 1.8 to 1.10 (#2145)
- Changed x-/y-axis order for `PrecisionRecallCurve` to be consistent with scikit-learn (#2183)
- Deprecated `metric._update_called` (#2141)
- Deprecated `specicity_at_sensitivity` in favour of `specificity_at_sensitivity` (#2199)
- Fixed support for half precision + CPU in metrics requiring topk operator (#2252)
- Fixed warning incorrectly being raised in `Running` metrics (#2256)
- Fixed integration with custom feature extractor in `FID` metric (#2277)
- Added error if `NoTrainInceptionV3` is being initialized without `torch-fidelity` being installed (#2143)
- Added support for PyTorch v2.1 (#2142)
- Changed default state of `SpectralAngleMapper` and `UniversalImageQualityIndex` to be tensors (#2089)
- Use `torch` range func and repeat for deterministic bincount (#2184)
- Removed unused `lpips` third-party package as dependency of `LearnedPerceptualImagePatchSimilarity` metric (#2230)
- Fixed numerical stability bug in `LearnedPerceptualImagePatchSimilarity` metric (#2144)
- Fixed numerical stability issue in `UniversalImageQualityIndex` metric (#2222)
- Fixed incompatibility for `MeanAveragePrecision` with `pycocotools` backend when too few `max_detection_thresholds` are provided (#2219)
- Fixed support for half precision in Perplexity metric (#2235)
- Fixed device and dtype for `LearnedPerceptualImagePatchSimilarity` functional metric (#2234)
- Fixed bug in `Metric._reduce_states(...)` when using `dist_sync_fn="cat"` (#2226)
- Fixed bug in `CosineSimilarity` where 2d is expected but 1d input was given (#2241)
- Fixed bug in `MetricCollection` when using compute groups and `compute` is called more than once (#2211)
- Added metric to cluster package:
  - `MutualInformationScore` (#2008)
  - `RandScore` (#2025)
  - `NormalizedMutualInfoScore` (#2029)
  - `AdjustedRandScore` (#2032)
  - `CalinskiHarabaszScore` (#2036)
  - `DunnIndex` (#2049)
  - `HomogeneityScore` (#2053)
  - `CompletenessScore` (#2053)
  - `VMeasureScore` (#2053)
  - `FowlkesMallowsIndex` (#2066)
  - `AdjustedMutualInfoScore` (#2058)
  - `DaviesBouldinScore` (#2071)
- Added `backend` argument to `MeanAveragePrecision` (#2034)
- Fixed tie breaking in ndcg metric (#2031)
- Fixed bug in `BootStrapper` when very few samples were evaluated that could lead to crash (#2052)
- Fixed bug when creating multiple plots that led to not all plots being shown (#2060)
- Fixed performance issues in `RecallAtFixedPrecision` for large batch sizes (#2042)
- Fixed bug related to `MetricCollection` used with custom metrics that have `prefix`/`postfix` attributes (#2070)
- Added `average` argument to `MeanAveragePrecision` (#2018)
- Fixed bug in `PearsonCorrCoef` when it is updated on single samples at a time (#2019)
- Fixed support for pixel-wise MSE (#2017)
- Fixed bug in `MetricCollection` when used with multiple metrics that return dicts with same keys (#2027)
- Fixed bug in detection intersection metrics when `class_metrics=True` resulting in wrong values (#1924)
- Fixed missing attributes `higher_is_better`, `is_differentiable` for some metrics (#2028)
- Added source aggregated signal-to-distortion ratio (SA-SDR) metric (#1882)
- Added `VisualInformationFidelity` to image package (#1830)
- Added `EditDistance` to text package (#1906)
- Added `top_k` argument to `RetrievalMRR` in retrieval package (#1961)
- Added support for evaluating `"segm"` and `"bbox"` detection in `MeanAveragePrecision` at the same time (#1928)
- Added `PerceptualPathLength` to image package (#1939)
- Added support for multioutput evaluation in `MeanSquaredError` (#1937)
- Added argument `extended_summary` to `MeanAveragePrecision` such that precision, recall, iou can be easily returned (#1983)
- Added warning to `CLIPScore` if long captions are detected and truncated (#2001)
- Added `CLIPImageQualityAssessment` to multimodal package (#1931)
- Added new property `metric_state` to all metrics for users to investigate currently stored tensors in memory (#2006)
- Added warning to `MeanAveragePrecision` if too many detections are observed (#1978)
- Fixed support for int input when `multidim_average="samplewise"` in classification metrics (#1977)
- Fixed x/y labels when plotting confusion matrices (#1976)
- Fixed IOU compute in cuda (#1982)
- Added warning to `PearsonCorrCoef` if input has a very small variance for its given dtype (#1926)
- Changed all non-task specific classification metrics to be true subtypes of `Metric` (#1963)
- Fixed bug in `CalibrationError` where calculations for double precision input were performed in float precision (#1919)
- Fixed bug related to the `prefix/postfix` arguments in `MetricCollection` and `ClasswiseWrapper` being duplicated (#1918)
- Fixed missing AUC score when plotting classification metrics that support the `score` argument (#1948)
- Fixed corner case when using `MetricCollection` together with aggregation metrics (#1896)
- Fixed the use of `max_fpr` in `AUROC` metric when only one class is present (#1895)
- Fixed bug related to empty predictions for `IntersectionOverUnion` metric (#1892)
- Fixed bug related to `MeanMetric` and broadcasting of weights when Nans are present (#1898)
- Fixed bug related to expected input format of pycoco in `MeanAveragePrecision` (#1913)
- Added `prefix` and `postfix` arguments to `ClasswiseWrapper` (#1866)
- Added speech-to-reverberation modulation energy ratio (SRMR) metric (#1792, #1872)
- Added new global arg `compute_with_cache` to control caching behaviour after `compute` method (#1754)
- Added `ComplexScaleInvariantSignalNoiseRatio` for audio package (#1785)
- Added `Running` wrapper for calculating running statistics (#1752)
- Added `RelativeAverageSpectralError` and `RootMeanSquaredErrorUsingSlidingWindow` to image package (#816)
- Added support for `SpecificityAtSensitivity` Metric (#1432)
- Added support for plotting of metrics through `.plot()` method (#1328, #1481, #1480, #1490, #1581, #1585, #1593, #1600, #1605, #1610, #1609, #1621, #1624, #1623, #1638, #1631, #1650, #1639, #1660, #1682, #1786)
- Added support for plotting of audio metrics through `.plot()` method (#1434)
- Added `classes` to output from `MAP` metric (#1419)
- Added Binary group fairness metrics to classification package (#1404)
- Added `MinkowskiDistance` to regression package (#1362)
- Added `pairwise_minkowski_distance` to pairwise package (#1362)
- Added new detection metric `PanopticQuality` (#929, #1527)
- Added `PSNRB` metric (#1421)
- Added `ClassificationTask` Enum and use in metrics (#1479)
- Added `ignore_index` option to `exact_match` metric (#1540)
- Added parameter `top_k` to `RetrievalMAP` (#1501)
- Added support for deterministic evaluation on GPU for metrics that use `torch.cumsum` operator (#1499)
- Added support for plotting of aggregation metrics through `.plot()` method (#1485)
- Added support for python 3.11 (#1612)
- Added support for auto clamping of input for metrics that use the `data_range` argument (#1606)
- Added `ModifiedPanopticQuality` metric to detection package (#1627)
- Added `PrecisionAtFixedRecall` metric to classification package (#1683)
- Added multiple metrics to detection package (#1284)
  - `IntersectionOverUnion`
  - `GeneralizedIntersectionOverUnion`
  - `CompleteIntersectionOverUnion`
  - `DistanceIntersectionOverUnion`
- Added `MultitaskWrapper` to wrapper package (#1762)
- Added `RelativeSquaredError` metric to regression package (#1765)
- Added `MemorizationInformedFrechetInceptionDistance` metric to image package (#1580)
- Changed `permutation_invariant_training` to allow using a `'permutation-wise'` metric function (#1794)
- Changed `update_count` and `update_called` from private to public methods (#1370)
- Raise exception for invalid kwargs in Metric base class (#1427)
- Extend `EnumStr` raising `ValueError` for invalid value (#1479)
- Improve speed and memory consumption of binned `PrecisionRecallCurve` with large number of samples (#1493)
- Changed `__iter__` method from raising `NotImplementedError` to `TypeError` by setting to `None` (#1538)
- `FID` metric will now raise an error if too few samples are provided (#1655)
- Allowed FID with `torch.float64` (#1628)
- Changed `LPIPS` implementation to no longer rely on third-party package (#1575)
- Changed FID matrix square root calculation from `scipy` to `torch` (#1708)
- Changed calculation in `PearsonCorrCoef` to be more robust in certain cases (#1729)
- Changed `MeanAveragePrecision` to `pycocotools` backend (#1832)
- Removed support for python 3.7 (#1640)
- Fixed support in `MetricTracker` for `MultioutputWrapper` and nested structures (#1608)
- Fixed restrictive check in `PearsonCorrCoef` (#1649)
- Fixed integration with `jsonargparse` and `LightningCLI` (#1651)
- Fixed corner case in calibration error for zero confidence input (#1648)
- Fixed precision-recall curve based computations for float target (#1642)
- Fixed missing kwarg squeeze in `MultioutputWrapper` (#1675)
- Fixed padding removal for 3d input in `MSSSIM` (#1674)
- Fixed `max_det_threshold` in MAP detection (#1712)
- Fixed states being saved in metrics that use `register_buffer` (#1728)
- Fixed states not being correctly synced and device transferred in `MeanAveragePrecision` for `iou_type="segm"` (#1763)
- Fixed use of `prefix` and `postfix` in nested `MetricCollection` (#1773)
- Fixed `ax` plotting logging in `MetricCollection` (#1783)
- Fixed lookup for punkt sources being downloaded in `RougeScore` (#1789)
- Fixed integration with lightning for `CompositionalMetric` (#1761)
- Fixed several bugs in `SpectralDistortionIndex` metric (#1808)
- Fixed bug for corner cases in `MatthewsCorrCoef` (#1812, #1863)
- Fixed support for half precision in `PearsonCorrCoef` (#1819)
- Fixed number of bugs related to `average="macro"` in classification metrics (#1821)
- Fixed off-by-one issue when `ignore_index = num_classes + 1` in Multiclass-jaccard (#1860)
- Fixed evaluation of `R2Score` with near constant target (#1576)
- Fixed dtype conversion when metric is submodule (#1583)
- Fixed bug related to `top_k>1` and `ignore_index!=None` in `StatScores` based metrics (#1589)
- Fixed corner case for `PearsonCorrCoef` when running in ddp mode but only on single device (#1587)
- Fixed overflow error for specific cases in `MAP` when big areas are calculated (#1607)
- Fixed classification metrics for `byte` input (#1521)
- Fixed the use of `ignore_index` in `MulticlassJaccardIndex` (#1386)
- Fixed compatibility between XLA in `_bincount` function (#1471)
- Fixed type hints in methods belonging to `MetricTracker` wrapper (#1472)
- Fixed `multilabel` in `ExactMatch` (#1474)
- Fixed type checking on the `maximize` parameter at the initialization of `MetricTracker` (#1428)
- Fixed mixed precision autocast for `SSIM` metric (#1454)
- Fixed checking for `nltk.punkt` in `RougeScore` if a machine is not online (#1456)
- Fixed wrongly reset method in `MultioutputWrapper` (#1460)
- Fixed dtype checking in `PrecisionRecallCurve` for `target` tensor (#1457)
- Added `MulticlassExactMatch` to classification metrics (#1343)
- Added `TotalVariation` to image package (#978)
- Added `CLIPScore` to new multimodal package (#1314)
- Added regression metrics:
- Added new nominal metrics:
- Added option to pass `distributed_available_fn` to metrics to allow checks for custom communication backend for making `dist_sync_fn` actually useful (#1301)
- Added `normalize` argument to `Inception`, `FID`, `KID` metrics (#1246)
- Changed minimum PyTorch version to be 1.8 (#1263)
- Changed interface for all functional and modular classification metrics after refactor (#1252)
- Removed deprecated `BinnedAveragePrecision`, `BinnedPrecisionRecallCurve`, `BinnedRecallAtFixedPrecision` (#1251)
- Removed deprecated `LabelRankingAveragePrecision`, `LabelRankingLoss` and `CoverageError` (#1251)
- Removed deprecated `KLDivergence` and `AUC` (#1251)
- Fixed precision bug in `pairwise_euclidean_distance` (#1352)
- Fixed bug in `MetricTracker.best_metric` when `return_step=False` (#1306)
- Fixed bug to prevent users from going into an infinite loop if trying to iterate over a single metric (#1320)
- Changed in-place operation to out-of-place operation in `pairwise_cosine_similarity` (#1288)
- Fixed high memory usage for certain classification metrics when `average='micro'` (#1286)
- Fixed precision problems when `structural_similarity_index_measure` was used with autocast (#1291)
- Fixed slow performance for confusion matrix based metrics (#1302)
- Fixed restrictive dtype checking in `spearman_corrcoef` when used with autocast (#1303)
- Fixed broken clone method for classification metrics (#1250)
- Fixed unintentional downloading of `nltk.punkt` when `lsum` not in `rouge_keys` (#1258)
- Fixed type casting in `MAP` metric between `bool` and `float32` (#1150)
- Added a new NLP metric `InfoLM` (#915)
- Added `Perplexity` metric (#922)
- Added `ConcordanceCorrCoef` metric to regression package (#1201)
- Added argument `normalize` to `LPIPS` metric (#1216)
- Added support for multiprocessing of batches in `PESQ` metric (#1227)
- Added support for multioutput in `PearsonCorrCoef` and `SpearmanCorrCoef` (#1200)
- Classification refactor (#1054, #1143, #1145, #1151, #1159, #1163, #1167, #1175, #1189, #1197, #1215, #1195)
- Changed update in `FID` metric to be done in online fashion to save memory (#1199)
- Improved performance of retrieval metrics (#1242)
- Changed `SSIM` and `MSSSIM` update to be online to reduce memory usage (#1231)
- Deprecated `BinnedAveragePrecision`, `BinnedPrecisionRecallCurve`, `BinnedRecallAtFixedPrecision` (#1163)
  - `BinnedAveragePrecision` -> use `AveragePrecision` with `thresholds` arg
  - `BinnedPrecisionRecallCurve` -> use `PrecisionRecallCurve` with `thresholds` arg
  - `BinnedRecallAtFixedPrecision` -> use `RecallAtFixedPrecision` with `thresholds` arg
- Renamed and refactored `LabelRankingAveragePrecision`, `LabelRankingLoss` and `CoverageError` (#1167)
  - `LabelRankingAveragePrecision` -> `MultilabelRankingAveragePrecision`
  - `LabelRankingLoss` -> `MultilabelRankingLoss`
  - `CoverageError` -> `MultilabelCoverageError`
- Deprecated `KLDivergence` and `AUC` from classification package (#1189)
  - `KLDivergence` moved to `regression` package
  - Instead of `AUC` use `torchmetrics.utils.compute.auc`
- Fixed a bug in `ssim` when `return_full_image=True` where the score was still reduced (#1204)
- Fixed MPS support for:
- Fixed bug in `ClasswiseWrapper` such that `compute` gave wrong result (#1225)
- Fixed synchronization of empty list states (#1219)
- Added global option `sync_on_compute` to disable automatic synchronization when `compute` is called (#1107)
- Fixed missing reset in `ClasswiseWrapper` (#1129)
- Fixed `JaccardIndex` multi-label compute (#1125)
- Fixed SSIM to propagate device if `gaussian_kernel` is False, and added a test (#1149)
- Fixed mAP calculation for areas with 0 predictions (#1080)
- Fixed bug where avg precision state and auroc state were not merged when using MetricCollections (#1086)
- Skip box conversion if no boxes are present in `MeanAveragePrecision` (#1097)
- Fixed inconsistency in docs and code when setting `average="none"` in `AveragePrecision` metric (#1116)
- Added specific `RuntimeError` when metric object is on the wrong device (#1056)
- Added an option to specify own n-gram weights for `BLEUScore` and `SacreBLEUScore` instead of using uniform weights only (#1075)
- Fixed aggregation metrics when input only contains zero (#1070)
- Fixed `TypeError` when providing superclass arguments as `kwargs` (#1069)
- Fixed bug related to state reference in metric collection when using compute groups (#1076)
- Added `RetrievalPrecisionRecallCurve` and `RetrievalRecallAtFixedPrecision` to retrieval package (#951)
- Added class property `full_state_update` that determines whether `forward` should call `update` once or twice (#984, #1033)
- Added support for nested metric collections (#1003)
- Added `Dice` to classification package (#1021)
- Added support to segmentation type `segm` as IOU for mean average precision (#822)
- Renamed `reduction` argument to `average` in Jaccard score and added additional options (#874)
- Removed deprecated `compute_on_step` argument (#962, #967, #979, #990, #991, #993, #1005, #1004, #1007)
- Fixed non-empty state dict for a few metrics (#1012)
- Fixed bug when comparing states while finding compute groups (#1022)
- Fixed `torch.double` support in stat score metrics (#1023)
- Fixed `FID` calculation for non-equal size real and fake input (#1028)
- Fixed case where `KLDivergence` could output `Nan` (#1030)
- Fixed deterministic behaviour for PyTorch<1.8 (#1035)
- Fixed default value for `mdmc_average` in `Accuracy` (#1036)
- Fixed missing copy of property when using compute groups in `MetricCollection` (#1052)
- Fixed multi device aggregation in `PearsonCorrCoef` (#998)
- Fixed MAP metric when using custom list of thresholds (#995)
- Fixed compatibility between compute groups in `MetricCollection` and prefix/postfix arg (#1007)
- Fixed compatibility with future PyTorch 1.12 in `safe_matmul` (#1011, #1014)
- Reimplemented the `signal_distortion_ratio` metric, which removed the absolute requirement of `fast-bss-eval` (#964)
- Fixed "Sort currently does not support bool dtype on CUDA" error in MAP for empty preds (#983)
- Fixed `BinnedPrecisionRecallCurve` when `thresholds` argument is not provided (#968)
- Fixed `CalibrationError` to work on logit input (#985)
- Added `WeightedMeanAbsolutePercentageError` to regression package (#948)
- Added new classification metrics:
- Added new image metric:
- Added support for `MetricCollection` in `MetricTracker` (#718)
- Added support for 3D image and uniform kernel in `StructuralSimilarityIndexMeasure` (#818)
- Added smart update of `MetricCollection` (#709)
- Added `ClasswiseWrapper` for better logging of classification metrics with multiple output values (#832)
- Added `**kwargs` argument for passing additional arguments to base class (#833)
- Added negative `ignore_index` for the Accuracy metric (#362)
- Added `adaptive_k` for the `RetrievalPrecision` metric (#910)
- Added `reset_real_features` argument to image quality assessment metrics (#722)
- Added new keyword argument `compute_on_cpu` to all metrics (#867)
- Made `num_classes` in `jaccard_index` a required argument (#853, #914)
- Added normalizer, tokenizer to ROUGE metric (#838)
- Improved shape checking of `permutation_invariant_training` (#864)
- Allowed reduction `None` (#891)
- `MetricTracker.best_metric` will now give a warning when computing on a metric that does not have a best (#913)
- Deprecated argument `compute_on_step` (#792)
- Deprecated passing in `dist_sync_on_step`, `process_group`, `dist_sync_fn` as direct arguments (#833)
- Removed support for versions of Pytorch-Lightning lower than v1.5 (#788)
- Removed deprecated functions, and warnings in Text (#773)
  - `WER` and `functional.wer`
- Removed deprecated functions and warnings in Image (#796)
  - `SSIM` and `functional.ssim`
  - `PSNR` and `functional.psnr`
- Removed deprecated functions, and warnings in classification and regression (#806)
  - `FBeta` and `functional.fbeta`
  - `F1` and `functional.f1`
  - `Hinge` and `functional.hinge`
  - `IoU` and `functional.iou`
  - `MatthewsCorrcoef`
  - `PearsonCorrcoef`
  - `SpearmanCorrcoef`
- Removed deprecated functions, and warnings in detection and pairwise (#804)
  - `MAP` and `functional.pairwise.manhatten`
- Removed deprecated functions, and warnings in Audio (#805)
  - `PESQ` and `functional.audio.pesq`
  - `PIT` and `functional.audio.pit`
  - `SDR` and `functional.audio.sdr` and `functional.audio.si_sdr`
  - `SNR` and `functional.audio.snr` and `functional.audio.si_snr`
  - `STOI` and `functional.audio.stoi`
- Removed unused `get_num_classes` from `torchmetrics.utilities.data` (#914)
- Fixed device mismatch for `MAP` metric in specific cases (#950)
- Improved testing speed (#820)
- Fixed compatibility of `ClasswiseWrapper` with the `prefix` argument of `MetricCollection` (#843)
- Fixed `BestScore` on GPU (#912)
- Fixed Lsum computation for `ROUGEScore` (#944)
- Fixed unsafe log operation in `TweedieDevianceScore` for power=1 (#847)
- Fixed bug in MAP metric related to either no ground truth or no predictions (#884)
- Fixed `ConfusionMatrix`, `AUROC` and `AveragePrecision` on GPU when running in deterministic mode (#900)
- Fixed NaN or Inf results returned by `signal_distortion_ratio` (#899)
- Fixed memory leak when using `update` method with tensor where `requires_grad=True` (#902)
- Minor patches in JOSS paper.
- Used `torch.bucketize` in calibration error when `torch>1.8` for faster computations (#769)
- Improved mAP performance (#742)
- Fixed check for available modules (#772)
- Fixed Matthews correlation coefficient when the denominator is 0 (#781)
- Added NLP metrics:
- Added `MultiScaleSSIM` into image metrics (#679)
- Added Signal to Distortion Ratio (`SDR`) to audio package (#565)
- Added `MinMaxMetric` to wrappers (#556)
- Added `ignore_index` to retrieval metrics (#676)
- Added support for multi references in `ROUGEScore` (#680)
- Added a default VSCode devcontainer configuration (#621)
- Scalar metrics will now consistently have additional dimensions squeezed (#622)
- Metrics having third party dependencies removed from global import (#463)
- Changed `BLEUScore` to take untokenized input, consistent with all the other text metrics (#640)
- Reordered arguments for `TER`, `BLEUScore`, `SacreBLEUScore`, `CHRFScore`: they now expect predictions first and target second (#696)
- Changed dtype of metric state from `torch.float` to `torch.long` in `ConfusionMatrix` to accommodate larger values (#715)
- Unified `preds`, `target` input argument naming across all text metrics (#723, #727)
  - `bert`, `bleu`, `chrf`, `sacre_bleu`, `wip`, `wil`, `cer`, `ter`, `wer`, `mer`, `rouge`, `squad`
- Renamed IoU -> Jaccard Index (#662)
- Renamed text WER metric (#714)
  - `functional.wer` -> `functional.word_error_rate`
  - `WER` -> `WordErrorRate`
- Renamed correlation coefficient classes: (#710)
  - `MatthewsCorrcoef` -> `MatthewsCorrCoef`
  - `PearsonCorrcoef` -> `PearsonCorrCoef`
  - `SpearmanCorrcoef` -> `SpearmanCorrCoef`
- Renamed audio STOI metric: (#753, #758)
  - `audio.STOI` to `audio.ShortTimeObjectiveIntelligibility`
  - `functional.audio.stoi` to `functional.audio.short_time_objective_intelligibility`
- Renamed audio PESQ metrics: (#751)
  - `functional.audio.pesq` -> `functional.audio.perceptual_evaluation_speech_quality`
  - `audio.PESQ` -> `audio.PerceptualEvaluationSpeechQuality`
- Renamed audio SDR metrics: (#711)
  - `functional.sdr` -> `functional.signal_distortion_ratio`
  - `functional.si_sdr` -> `functional.scale_invariant_signal_distortion_ratio`
  - `SDR` -> `SignalDistortionRatio`
  - `SI_SDR` -> `ScaleInvariantSignalDistortionRatio`
- Renamed audio SNR metrics: (#712)
  - `functional.snr` -> `functional.signal_noise_ratio`
  - `functional.si_snr` -> `functional.scale_invariant_signal_noise_ratio`
  - `SNR` -> `SignalNoiseRatio`
  - `SI_SNR` -> `ScaleInvariantSignalNoiseRatio`
- Renamed F-score metrics: (#731, #740)
  - `functional.f1` -> `functional.f1_score`
  - `F1` -> `F1Score`
  - `functional.fbeta` -> `functional.fbeta_score`
  - `FBeta` -> `FBetaScore`
- Renamed Hinge metric: (#734)
  - `functional.hinge` -> `functional.hinge_loss`
  - `Hinge` -> `HingeLoss`
- Renamed image PSNR metrics (#732)
  - `functional.psnr` -> `functional.peak_signal_noise_ratio`
  - `PSNR` -> `PeakSignalNoiseRatio`
- Renamed audio PIT metric: (#737)
  - `functional.pit` -> `functional.permutation_invariant_training`
  - `PIT` -> `PermutationInvariantTraining`
- Renamed image SSIM metric: (#747)
  - `functional.ssim` -> `functional.structural_similarity_index_measure`
  - `SSIM` -> `StructuralSimilarityIndexMeasure`
- Renamed detection `MAP` to `MeanAveragePrecision` metric (#754)
- Renamed Fidelity & LPIPS image metric: (#752)
  - `image.FID` -> `image.FrechetInceptionDistance`
  - `image.KID` -> `image.KernelInceptionDistance`
  - `image.LPIPS` -> `image.LearnedPerceptualImagePatchSimilarity`
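
For code bases that still use the old class names, the renames listed above could be applied with a small script. The following is a naive sketch, not a tool shipped with the library: the helper `rename_classes` is hypothetical, the mapping covers only a handful of the class renames listed above, and whole-word replacement will also touch comments and strings, so review the result by hand.

```python
import re

# Old -> new class names, copied from the rename entries above (subset only).
CLASS_RENAMES = {
    "WER": "WordErrorRate",
    "F1": "F1Score",
    "FBeta": "FBetaScore",
    "Hinge": "HingeLoss",
    "PSNR": "PeakSignalNoiseRatio",
    "SSIM": "StructuralSimilarityIndexMeasure",
    "MAP": "MeanAveragePrecision",
    "FID": "FrechetInceptionDistance",
}

# Match old names only as whole words so that e.g. `F1Score` is left alone.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, CLASS_RENAMES)) + r")\b")


def rename_classes(source: str) -> str:
    """Return `source` with every whole-word old class name replaced."""
    return _PATTERN.sub(lambda match: CLASS_RENAMES[match.group(1)], source)


if __name__ == "__main__":
    print(rename_classes("from torchmetrics import F1, PSNR"))
```

Because the pattern uses word boundaries, running the script twice does not rename already migrated names a second time.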
- Removed `embedding_similarity` metric (#638)
- Removed argument `concatenate_texts` from `wer` metric (#638)
- Removed arguments `newline_sep` and `decimal_places` from `rouge` metric (#638)
- Fixed MetricCollection kwargs filtering when no `kwargs` are present in update signature (#707)
- Fixed error that `torch.sort` currently does not support bool `dtype` on CUDA (#665)
- Fixed mAP to properly check if ground truths are empty (#684)
- Fixed initialization of tensors to be on correct device for `MAP` metric (#673)
- Migrate MAP metrics from pycocotools to PyTorch (#632)
- Use `torch.topk` instead of `torch.argsort` in retrieval precision for speedup (#627)
- Fix empty predictions in MAP metric (#594, #610, #624)
- Fix edge case of AUROC with `average=weighted` on GPU (#606)
- Fixed `forward` in compositional metrics (#645)
- Added audio metrics:
- Added Information retrieval metrics:
- Added NLP metrics:
- Added other metrics:
- Added `MAP` (mean average precision) metric to new detection package (#467)
- Added support for float targets in `nDCG` metric (#437)
- Added `average` argument to `AveragePrecision` metric for reducing multi-label and multi-class problems (#477)
- Added `MultioutputWrapper` (#510)
- Added metric sweeping:
- Added simple aggregation metrics: `SumMetric`, `MeanMetric`, `CatMetric`, `MinMetric`, `MaxMetric` (#506)
- Added pairwise submodule with metrics (#553)
  - `pairwise_cosine_similarity`
  - `pairwise_euclidean_distance`
  - `pairwise_linear_similarity`
  - `pairwise_manhatten_distance`
- `AveragePrecision` will now as default output the `macro` average for multilabel and multiclass problems (#477)
- `half`, `double`, `float` will no longer change the dtype of the metric states. Use `metric.set_dtype` instead (#493)
- Renamed `AverageMeter` to `MeanMetric` (#506)
- Changed `is_differentiable` from property to a constant attribute (#551)
- `ROC` and `AUROC` will no longer throw an error when either the positive or negative class is missing. Instead they return a score of 0 and give a warning
- Deprecated `functional.self_supervised.embedding_similarity` in favour of new pairwise submodule
- Removed `dtype` property (#493)
- Fixed bug in `F1` with `average='macro'` and `ignore_index!=None` (#495)
- Fixed bug in `pit` by using the returned first result to initialize device and type (#533)
- Fixed `SSIM` metric using too much memory (#539)
- Fixed bug where `device` property was not properly updated when metric was a child of a module (#542)
- Added `device` and `dtype` properties (#462)
- Added `TextTester` class for robustly testing text metrics (#450)
- Added support for float targets in `nDCG` metric (#437)
- Removed `rouge-score` as dependency for text package (#443)
- Removed `jiwer` as dependency for text package (#446)
- Removed `bert-score` as dependency for text package (#473)
- Fixed ranking of samples in `SpearmanCorrCoef` metric (#448)
- Fixed bug where compositional metrics were unable to sync because of type mismatch (#454)
- Fixed metric hashing (#478)
- Fixed `BootStrapper` metrics not working on GPU (#462)
- Fixed the semantic ordering of kernel height and width in `SSIM` metric (#474)
- Added Text-related (NLP) metrics:
- Added `MetricTracker` wrapper metric for keeping track of the same metric over multiple epochs (#238)
- Added other metrics:
- Added support in `nDCG` metric for target with values larger than 1 (#349)
- Added support for negative targets in `nDCG` metric (#378)
- Added `None` as reduction option in `CosineSimilarity` metric (#400)
- Allowed passing labels in (n_samples, n_classes) to `AveragePrecision` (#386)
- Moved `psnr` and `ssim` from `functional.regression.*` to `functional.image.*` (#382)
- Moved `image_gradient` from `functional.image_gradients` to `functional.image.gradients` (#381)
- Moved `R2Score` from `regression.r2score` to `regression.r2` (#371)
- Pearson metric now only stores 6 statistics instead of all predictions and targets (#380)
- Use `torch.argmax` instead of `torch.topk` when `k=1` for better performance (#419)
- Moved check for number of samples in R2 score to support single sample updating (#426)
- Renamed `r2score` >> `r2_score` and `kldivergence` >> `kl_divergence` in `functional` (#371)
- Moved `bleu_score` from `functional.nlp` to `functional.text.bleu` (#360)
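
The Pearson entry above (#380) replaces stored predictions and targets with a fixed number of running statistics. A minimal sketch of that idea follows, assuming six raw sums as the state. The class name `StreamingPearsonSketch` and the choice of statistics are illustrative assumptions; the library may track different quantities (for example running means and variances, which are numerically more stable than raw sums).

```python
import math


class StreamingPearsonSketch:
    """Pearson correlation from six running statistics (illustrative only)."""

    def __init__(self):
        # Six statistics: count, two sums, two sums of squares, one cross sum.
        self.n = 0
        self.sum_x = 0.0
        self.sum_y = 0.0
        self.sum_xx = 0.0
        self.sum_yy = 0.0
        self.sum_xy = 0.0

    def update(self, preds, target):
        # Fold a batch into the running sums; the batch itself is not stored.
        for x, y in zip(preds, target):
            self.n += 1
            self.sum_x += x
            self.sum_y += y
            self.sum_xx += x * x
            self.sum_yy += y * y
            self.sum_xy += x * y

    def compute(self):
        cov = self.sum_xy - self.sum_x * self.sum_y / self.n
        var_x = self.sum_xx - self.sum_x ** 2 / self.n
        var_y = self.sum_yy - self.sum_y ** 2 / self.n
        return cov / math.sqrt(var_x * var_y)


metric = StreamingPearsonSketch()
metric.update([1.0, 2.0], [2.0, 4.0])  # first batch
metric.update([3.0, 4.0], [6.0, 8.0])  # second batch
pearson = metric.compute()
```

Memory use stays constant regardless of how many batches are passed to `update`.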
- Removed restriction that `threshold` has to be in (0,1) range to support logit input (#351, #401)
- Removed restriction that `preds` could not be bigger than `num_classes` to support logit input (#357)
- Removed module `regression.psnr` and `regression.ssim` (#382)
- Removed (#379):
  - function `functional.mean_relative_error`
  - `num_thresholds` argument in `BinnedPrecisionRecallCurve`
- Fixed bug where classification metrics with `average='macro'` would lead to wrong result if a class was missing (#303)
- Fixed `weighted`, `multi-class` AUROC computation to allow for 0 observations of some class, as contribution to final AUROC is 0 (#376)
- Fixed that `_forward_cache` and `_computed` attributes are also moved to the correct device if metric is moved (#413)
- Fixed calculation in `IoU` metric when using `ignore_index` argument (#328)
- Fixed DDP by adding `is_sync` logic to `Metric` (#339)
- Added Image-related metrics:
- Added Audio metrics: SNR, SI_SDR, SI_SNR (#292)
- Added other metrics:
- Added `add_metrics` method to `MetricCollection` for adding additional metrics after initialization (#221)
- Added pre-gather reduction in the case of `dist_reduce_fx="cat"` to reduce communication cost (#217)
- Added better error message for `AUROC` when `num_classes` is not provided for multiclass input (#244)
- Added support for unnormalized scores (e.g. logits) in `Accuracy`, `Precision`, `Recall`, `FBeta`, `F1`, `StatScore`, `Hamming`, `ConfusionMatrix` metrics (#200)
- Added `squared` argument to `MeanSquaredError` for computing `RMSE` (#249)
- Added `is_differentiable` property to `ConfusionMatrix`, `F1`, `FBeta`, `Hamming`, `Hinge`, `IOU`, `MatthewsCorrcoef`, `Precision`, `Recall`, `PrecisionRecallCurve`, `ROC`, `StatScores` (#253)
- Added `sync` and `sync_context` methods for manually controlling when metric states are synced (#302)
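
The `squared` entry above (#249) lets one metric return either MSE or RMSE. A plain-Python sketch of that switch, assuming `squared=True` returns the mean squared error and `squared=False` returns its square root; the function name `mean_squared_error_sketch` is hypothetical and this is not the library code.

```python
import math


def mean_squared_error_sketch(preds, target, squared=True):
    """Return MSE when squared is True, otherwise RMSE."""
    mse = sum((p - t) ** 2 for p, t in zip(preds, target)) / len(preds)
    return mse if squared else math.sqrt(mse)


preds = [2.0, 4.0]
target = [0.0, 0.0]
mse = mean_squared_error_sketch(preds, target)                  # squared errors 4 and 16
rmse = mean_squared_error_sketch(preds, target, squared=False)  # square root of the MSE
```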
- Forward cache is reset when `reset` method is called (#260)
- Improved per-class metric handling for imbalanced datasets for `precision`, `recall`, `precision_recall`, `fbeta`, `f1`, `accuracy`, and `specificity` (#204)
- Decorated `torch.jit.unused` to `MetricCollection` forward (#307)
- Renamed `thresholds` argument to binned metrics for manually controlling the thresholds (#322)
- Extend typing (#324, #326, #327)
- Deprecated `functional.mean_relative_error`, use `functional.mean_absolute_percentage_error` (#248)
- Deprecated `num_thresholds` argument in `BinnedPrecisionRecallCurve` (#322)
- Removed argument `is_multiclass` (#319)
- AUC can also support more dimensional inputs when all but one dimension are of size 1 (#242)
- Fixed `dtype` of modular metrics after reset has been called (#243)
- Fixed calculation in `matthews_corrcoef` to correctly match formula (#321)
- Added `is_differentiable` property:
- `MetricCollection` should return metrics with prefix on `items()`, `keys()` (#209)
- Calling `compute` before `update` will now give warning (#164)
- Removed `numpy` as direct dependency (#212)
- Fixed auc calculation and add tests (#197)
- Fixed loading persisted metric states using `load_state_dict()` (#202)
- Fixed `PSNR` not working with `DDP` (#214)
- Fixed metric calculation with unequal batch sizes (#220)
- Fixed metric concatenation for list states for zero-dim input (#229)
- Fixed numerical instability in `AUROC` metric for large input (#230)
- Added `BootStrapper` to easily calculate confidence intervals for metrics (#101)
- Added Binned metrics (#128)
- Added metrics for Information Retrieval (PL^5032):
- Added other metrics:
- Added `average='micro'` as an option in AUROC for multilabel problems (#110)
- Added multilabel support to `ROC` metric (#114)
- Added testing for `half` precision (#77, #135)
- Added `AverageMeter` for ad-hoc averages of values (#138)
- Added `prefix` argument to `MetricCollection` (#70)
- Added `__getitem__` as metric arithmetic operation (#142)
- Added property `is_differentiable` to metrics and test for differentiability (#154)
- Added support for `average`, `ignore_index` and `mdmc_average` in `Accuracy` metric (#166)
- Added `postfix` arg to `MetricCollection` (#188)
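
The `BootStrapper` entry above (#101) refers to estimating the spread of a metric by resampling. A minimal sketch of the general bootstrap technique in plain Python follows. The helper names, the sampling strategy (uniform resampling with replacement) and the default of 200 resamples are assumptions for illustration and may differ from what the library does.

```python
import random
import statistics


def bootstrap_metric_sketch(metric_fn, preds, target, num_bootstraps=200, seed=0):
    """Return (mean, standard deviation) of metric_fn over resampled data."""
    rng = random.Random(seed)
    n = len(preds)
    scores = []
    for _ in range(num_bootstraps):
        # Draw n indices with replacement and evaluate the metric on that resample.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric_fn([preds[i] for i in idx], [target[i] for i in idx]))
    return statistics.mean(scores), statistics.stdev(scores)


def accuracy(preds, target):
    return sum(p == t for p, t in zip(preds, target)) / len(preds)


mean_acc, std_acc = bootstrap_metric_sketch(
    accuracy, [1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 0, 1]
)
```

The standard deviation across resamples gives a rough measure of uncertainty around the metric value.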
- Changed `ExplainedVariance` from storing all preds/targets to tracking 5 statistics (#68)
- Changed behaviour of `confusionmatrix` for multilabel data to better match `multilabel_confusion_matrix` from sklearn (#134)
- Updated FBeta arguments (#111)
- Changed `reset` method to use `detach.clone()` instead of `deepcopy` when resetting to default (#163)
- Metrics passed as dict to `MetricCollection` will now always be in deterministic order (#173)
- Allowed passing metrics to `MetricCollection` as arguments (#176)
- Renamed argument `is_multiclass` -> `multiclass` (#162)
- Prune remaining deprecated (#92)
- Fixed `_stable_1d_sort` to work when `n>=N` (PL^6177)
- Fixed `_computed` attribute not being correctly reset (#147)
- Fixed BLEU score (#165)
- Fixed backwards compatibility for logging with older version of pytorch-lightning (#182)
- Decoupled PL dependency (#13)
- Refactored functional - mimic the module-like structure: classification, regression, etc. (#16)
- Refactored utilities - split to topics/submodules (#14)
- Refactored `MetricCollection` (#19)
- `Accuracy` metric now generalizes to Top-k accuracy for (multi-dimensional) multi-class inputs using the `top_k` parameter (PL^4838)
- `Accuracy` metric now enables the computation of subset accuracy for multi-label or multi-dimensional multi-class inputs with the `subset_accuracy` parameter (PL^4838)
- Added `HammingDistance` metric to compute the hamming distance (loss) (PL^4838)
- Added `StatScores` metric to compute the number of true positives, false positives, true negatives and false negatives (PL^4839)
- Added `R2Score` metric (PL^5241)
- Added `MetricCollection` (PL^4318)
- Added `.clone()` method to metrics (PL^4318)
- Added `IoU` class interface (PL^4704)
- The `Recall` and `Precision` metrics (and their functional counterparts `recall` and `precision`) can now be generalized to Recall@K and Precision@K with the use of `top_k` parameter (PL^4842)
- Added compositional metrics (PL^5464)
- Added AUC/AUROC class interface (PL^5479)
- Added `QuantizationAwareTraining` callback (PL^5706)
- Added `ConfusionMatrix` class interface (PL^4348)
- Added multiclass AUROC metric (PL^4236)
- Added `PrecisionRecallCurve, ROC, AveragePrecision` class metric (PL^4549)
- Classification metrics overhaul (PL^4837)
- Added `F1` class metric (PL^4656)
- Added metrics aggregation in Horovod and fixed early stopping (PL^3775)
- Added `persistent(mode)` method to metrics, to enable and disable metric states being added to `state_dict` (PL^4482)
- Added unification of regression metrics (PL^4166)
- Added persistent flag to `Metric.add_state` (PL^4195)
- Added classification metrics (PL^4043)
- Added new Metrics API. (PL^3868, PL^3921)
- Added EMB similarity (PL^3349)
- Added SSIM metrics (PL^2671)
- Added BLEU metrics (PL^2535)
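
The `top_k` entries in the list above (PL^4838, PL^4842) count a prediction as correct when the true class is among the k highest-scoring classes. A plain-Python sketch of Top-k accuracy under that definition; the function name `top_k_accuracy_sketch` and the tie-breaking behaviour (Python's stable sort) are illustrative assumptions, not the library's implementation.

```python
def top_k_accuracy_sketch(scores, target, top_k=1):
    """Fraction of samples whose true label is among the top_k scored classes."""
    correct = 0
    for row, label in zip(scores, target):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        correct += label in ranked[:top_k]
    return correct / len(target)


scores = [
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
    [0.2, 0.3, 0.5],
]
target = [2, 0, 1]
top1 = top_k_accuracy_sketch(scores, target, top_k=1)
top2 = top_k_accuracy_sketch(scores, target, top_k=2)
```

With `top_k=1` this reduces to ordinary accuracy; larger values of `top_k` can only keep the score the same or raise it.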