A loss term for machine learning / neural networks.
TensorFlow asks for from_logits
on Binary Crossentropy. The documentation states:
Whether to interpret y_pred as a tensor of logit values. By default, we assume that y_pred contains probabilities (i.e., values in [0, 1]). Note: Using from_logits=True may be more numerically stable.
If the input to Binary Crossentropy has not been passed through e.g. a Sigmoid or Softmax, it is "from logits", i.e. the "raw" output values of the model.
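A minimal sketch of the two call styles, assuming TensorFlow 2.x; the tensor values are made up for illustration:

```python
import tensorflow as tf

y_true = tf.constant([[0.0], [1.0], [1.0]])
logits = tf.constant([[-2.0], [1.5], [0.3]])   # "raw" model outputs (logits)
probs = tf.sigmoid(logits)                     # the same outputs after Sigmoid

bce_logits = tf.keras.losses.BinaryCrossentropy(from_logits=True)
bce_probs = tf.keras.losses.BinaryCrossentropy(from_logits=False)

# Both calls give (numerically almost) the same loss,
# as long as the input matches the from_logits setting.
print(bce_logits(y_true, logits).numpy())
print(bce_probs(y_true, probs).numpy())
```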
In contrast, after a Sigmoid or Softmax activation (and possibly some others), the values represent probabilities: Sigmoid maps each output independently into [0, 1], while Softmax additionally normalizes the outputs so that they sum up to 1.
This can be a bit confusing when the Sigmoid activation is seen as a functional part of the last layer rather than as a scaling / normalizing function whose purpose is to turn raw outputs into probabilities.
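To make the distinction concrete, a small sketch (again assuming TensorFlow 2.x) of how the two activations normalize the same raw outputs:

```python
import tensorflow as tf

raw = tf.constant([[2.0, -1.0, 0.5]])    # "raw" last-layer outputs, any real values

sig = tf.sigmoid(raw)                     # each value squashed into [0, 1] independently
soft = tf.nn.softmax(raw, axis=-1)        # a distribution over classes

print(sig.numpy(), tf.reduce_sum(sig).numpy())    # sum is generally not 1
print(soft.numpy(), tf.reduce_sum(soft).numpy())  # sum is 1
```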
In deep learning, "logits" usually (and somewhat unfortunately) means the "raw" outputs of the last layer of a classification network, that is, the output of that layer before it is passed to an activation/normalization function such as the Sigmoid. Raw outputs may take on any value.
Because of the underlying calculations, computing Sigmoid and Binary Crossentropy as two separate steps can be numerically unstable if done wrong. TensorFlow will therefore, where it can, effectively bypass a Sigmoid activation in the last layer and compute Sigmoid and Binary Crossentropy together internally in a single, numerically stable operation.
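A sketch of the numerically stable setup, assuming TensorFlow 2.x; layer sizes, optimizer and metric here are illustrative assumptions. The last layer has no activation, and the Sigmoid is folded into the loss via from_logits=True.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1),   # no Sigmoid here: the model outputs logits
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    # BinaryAccuracy with threshold 0.0 on logits corresponds to 0.5 on probabilities.
    metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.0)],
)
```

If probabilities are needed at inference time, tf.sigmoid can simply be applied to the model's outputs afterwards.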
https://github.com/tensorflow/tensorflow/blob/v2.0.0/tensorflow/python/keras/losses.py#L348-L406