I'm using Keras with the TensorFlow backend, with cross-entropy as the loss function. I saw that TensorFlow's loss expects the pre-softmax values (the logits) as input, but in Keras I pass it the output of the softmax.
What is the right way to use it?

1 Answer

I usually create the inverse function of the output activation, so I can design my network with a softmax or sigmoid output but apply the inverse to get the logits back when I need to pass them to the loss function.
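As a rough sketch of what such an inverse could look like, here is a plain-Python version with no Keras dependency. The function names and the `eps` clipping value are my own choices for illustration, not part of any library. Note that softmax is only invertible up to an additive constant, so taking `log(p)` gives logits that are shifted but produce the same softmax.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating to avoid overflow.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def inverse_softmax(probs, eps=1e-12):
    # log(p) recovers the logits up to an additive constant.
    # eps (an assumed value) guards against log(0).
    return [math.log(max(p, eps)) for p in probs]

def sigmoid(z):
    # Branch on the sign so exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def inverse_sigmoid(p, eps=1e-12):
    # The logit function log(p / (1 - p)), with clipping to stay finite.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
recovered = inverse_softmax(probs)  # shifted copy of the original logits
```

Once a probability has been rounded to exactly 0 or 1, the original logit cannot be recovered, which is the main weakness of this approach.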

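In general, cross-entropy computed from logits is numerically more stable than cross-entropy computed from the softmax output, because the softmax and the logarithm can be combined into a single step. Also, mixed-precision training almost always requires you to use the logits version of the loss.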
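To illustrate the stability point, here is a small plain-Python sketch (the helper names are mine, chosen for this example). With an extreme pair of logits the softmax probability of the target class underflows to 0, so the probability-based loss fails, while the logits-based loss uses the log-sum-exp trick and stays finite. In Keras, as far as I know, the equivalent switch is the `from_logits=True` argument on losses such as `tf.keras.losses.CategoricalCrossentropy`, used together with a final layer that has no softmax activation.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_probs(probs, target_index):
    # Needs log(p); raises ValueError if p has underflowed to 0.
    return -math.log(probs[target_index])

def cross_entropy_from_logits(logits, target_index):
    # log-sum-exp trick: -log softmax(z)[t] = logsumexp(z) - z[t]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

# Moderate logits: both versions agree.
mild = [2.0, 1.0, 0.1]
loss_a = cross_entropy_from_probs(softmax(mild), 0)
loss_b = cross_entropy_from_logits(mild, 0)

# Extreme logits: the target probability underflows to exactly 0.
extreme = [1000.0, 0.0]
stable_loss = cross_entropy_from_logits(extreme, 1)
try:
    cross_entropy_from_probs(softmax(extreme), 1)
    probs_version_failed = False
except ValueError:
    probs_version_failed = True
```

Real implementations usually clip the probabilities instead of raising an error, but the clipped loss and its gradient are then no longer exact.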

