The current code optimizes the temperature parameter with respect to nll_criterion. Would it be more reasonable to optimize with respect to ece_criterion in order to obtain a well-calibrated model?
Thanks!
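For context, here is a minimal sketch of what "optimize T against NLL" means — a pure-Python grid search on made-up toy logits. The repo itself fits T with LBFGS in PyTorch; everything below (the data, the grid) is invented purely for illustration:

```python
import math

def softmax(logits, T):
    # temperature-scaled softmax: softmax(z / T)
    z = [l / T for l in logits]
    m = max(z)  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def nll(logit_list, labels, T):
    # average negative log-likelihood of the true class
    total = 0.0
    for logits, y in zip(logit_list, labels):
        total -= math.log(softmax(logits, T)[y])
    return total / len(labels)

# toy "validation" logits; the last sample is confidently wrong,
# so some temperature T > 1 should lower the NLL
logit_list = [[4.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 0.0]]
labels = [0, 0, 1, 1]

# grid search over T (a stand-in for the repo's LBFGS step)
grid = [t / 10 for t in range(5, 50)]
best_T = min(grid, key=lambda t: nll(logit_list, labels, t))
print(best_T, nll(logit_list, labels, best_T))
```

Note that T only rescales the logits, so it cannot change the argmax prediction — it affects confidence (and hence NLL/ECE), never accuracy.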
NLL and Brier scores are both "proper scoring rules" for calibration, while ECE is not. This means that these losses (NLL and Brier) are minimized if and only if the model recovers the true ground-truth probability distribution (this is mentioned in the temperature scaling paper and discussed in more depth in "The Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman).
ECE is not a proper scoring rule because, as explained in "Can You Trust Your Model's Uncertainty?", "there exist trivial solutions which yield optimal scores; for example, returning the marginal probability p(y) for every instance will yield perfectly calibrated but uninformative predictions."
Bottom line: ECE is useful because it is easy to interpret, but it generally shouldn't be used to train your network.
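The trivial solution mentioned in the quote is easy to demonstrate. Below is a hypothetical toy example (not code from this repo) on binary data with marginal P(y=1) = 0.7, using a simplified reliability-curve ECE where the "confidence" is the predicted P(y=1); predicting the marginal for every instance gets ECE = 0, yet a more informative per-instance model wins on NLL:

```python
import math

def ece(confidences, labels, n_bins=10):
    # simplified binary ECE: bin P(y=1), compare avg confidence
    # vs. empirical frequency of y=1 in each bin
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, labels):
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))
    total = len(labels)
    err = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            freq = sum(y for _, y in b) / len(b)
            err += len(b) / total * abs(avg_conf - freq)
    return err

def nll(probs, labels):
    # average negative log-likelihood of predicted P(y=1)
    return -sum(math.log(p if y == 1 else 1 - p)
                for p, y in zip(probs, labels)) / len(labels)

labels = [1] * 7 + [0] * 3  # marginal P(y=1) = 0.7

# trivial model: predict the marginal 0.7 everywhere -> ECE = 0
marginal = [0.7] * len(labels)
# per-instance model: informative, though imperfectly calibrated
informative = [0.95] * 5 + [0.6] * 2 + [0.2] * 2 + [0.05]

print(ece(marginal, labels), ece(informative, labels))
print(nll(marginal, labels), nll(informative, labels))
```

The marginal predictor "wins" on ECE while carrying no information about individual instances; NLL, being a proper scoring rule, correctly prefers the informative model.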