The paper mentions that the loss layer is combined with the sigmoid computation, not softmax. More specifically, this line:
"Finally, we note that the implementation of the loss layer combines the sigmoid operation for computing p with the loss computation, resulting in greater numerical stability."
So isn't the author saying that we should use a sigmoid activation on the last layer instead? Using softmax might lead to lower accuracy.
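To make the question concrete, here is a rough sketch of what "combining the sigmoid with the loss computation" could look like: the loss takes the raw logit and never forms p explicitly, using log(sigmoid(z)) = -softplus(-z). This is only my reading of the quoted sentence, not the authors' code. The function names are hypothetical, and the defaults gamma=2.0 and alpha=0.25 are assumed values for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    # exp is only ever called on a non-positive argument, so it cannot overflow.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid_focal_loss(logit: float, target: int,
                       gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one binary (per-class) prediction, computed from the logit.

    Hypothetical helper: gamma and alpha defaults are assumptions.
    """
    # Flip the sign for negatives so that p_t = sigmoid(z) in both cases.
    z = logit if target == 1 else -logit
    log_pt = -softplus(-z)            # log(p_t)
    log_one_minus_pt = -softplus(z)   # log(1 - p_t)
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # -alpha_t * (1 - p_t)^gamma * log(p_t), with the power taken in log space.
    return -alpha_t * math.exp(gamma * log_one_minus_pt) * log_pt


if __name__ == "__main__":
    # A very confident wrong prediction stays finite instead of producing inf/nan.
    print(sigmoid_focal_loss(-1000.0, 1))
    # A very confident correct prediction contributes (almost) nothing.
    print(sigmoid_focal_loss(1000.0, 1))
```

Note that with a sigmoid each class is scored as an independent binary problem, so this would be applied per class logit, whereas a softmax normalizes across classes before the loss.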