What is the difference between Sigmoid and Softmax?
I'm trying to train a neural network for a binary classification problem. What is the difference between having a single output with a sigmoid and two outputs with a softmax?
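For binary classification it does not really matter; you can use either. A softmax over two outputs z0 and z1 assigns class 1 the probability exp(z1) / (exp(z0) + exp(z1)), which is exactly sigmoid(z1 - z0), so both setups can represent the same predictions.
In general, people use a single sigmoid output when they have 2 classes to predict, so putting a sigmoid on top of your network is a good idea.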
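The equivalence between the two setups can be checked numerically. Below is a minimal sketch in plain Python (standard library only); the helper names `class1_prob_softmax` and `class1_prob_sigmoid` and the sample logits are made up for illustration and do not come from any particular framework.

```python
import math

def sigmoid(z):
    # Logistic function; fine for moderate z (very negative z would overflow exp).
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def class1_prob_softmax(z0, z1):
    # Two-output network: probability of class 1 from a softmax over both logits.
    return softmax([z0, z1])[1]

def class1_prob_sigmoid(z0, z1):
    # Single-output view: a sigmoid applied to the difference of the two logits.
    return sigmoid(z1 - z0)

# Hypothetical logit pairs, chosen only to exercise the comparison.
pairs = [(0.3, 1.7), (2.0, -1.0), (0.0, 0.0), (-4.5, 3.2)]
results = [(class1_prob_softmax(a, b), class1_prob_sigmoid(a, b)) for a, b in pairs]
all_match = all(math.isclose(p, q, rel_tol=1e-12) for p, q in results)
```

Under these assumptions `all_match` should come out true: the two-output softmax has one redundant degree of freedom, since only the difference between the logits affects the result. That is one reason the single sigmoid output is the more common choice for two classes.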