Questions tagged [sigmoid]
For questions about sigmoid functions (in particular, the logistic function) and the consequences of using them as activation functions in neural networks.
38 questions
10 votes · 3 answers
Are ReLUs incapable of solving certain problems?
Background
I've been interested in and reading about neural networks for several years, but I haven't gotten around to testing them out until recently.
Both for fun and to increase my understanding, I tried to write a class library from scratch in…
Benjamin Chambers · 221
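For context on the question above: a ReLU outputs zero for every negative input, so its gradient is identically zero there, which is where most "can ReLU solve X" discussions start. A minimal NumPy sketch (function names are mine):

import numpy as np

def relu(x):
    # Identity for positive inputs, zero otherwise.
    return np.maximum(0.0, x)

def relu_grad(x):
    # The gradient is exactly 0 for x < 0: a unit stuck in that
    # region receives no learning signal (the "dying ReLU" problem).
    return (x > 0).astype(float)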
9 votes · 1 answer
What happens when I mix activation functions?
There are several activation functions, such as ReLU, sigmoid or $\tanh$. What happens when I mix activation functions?
I recently found that Google has developed the Swish activation function, which is x*sigmoid(x). By altering the activation function can it…
JSChang · 93
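For reference, the Swish function mentioned in the excerpt is simple to write out; a minimal sketch (names are mine):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    # Swish as described in the question: x * sigmoid(x).
    return x * sigmoid(x)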
7 votes · 1 answer
How is division by zero avoided when implementing back-propagation for a neural network with sigmoid at the output neuron?
I am building a neural network for which I am using the sigmoid function as the activation function for the single output neuron at the end. Since the sigmoid function is known to take any number and return a value between 0 and 1, this is causing…
Dimitry · 73
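One standard way this particular division by zero is avoided (a common approach, not necessarily the accepted answer): with a sigmoid output and a cross-entropy loss, the naive chain rule divides by the activation, but the combined gradient with respect to the pre-activation simplifies to $a - y$ and needs no division at all. A sketch:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Naive chain rule: dL/da = -y/a + (1 - y)/(1 - a), which blows up
# when a hits exactly 0 or 1. Combining sigmoid with cross-entropy
# cancels those divisions:
def grad_wrt_preactivation(z, y):
    a = sigmoid(z)
    return a - y  # dL/dz, with no division by a or (1 - a)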
7 votes · 4 answers
What does "e" do in the Sigmoid Activation Function?
Within the Sigmoid Squishification function
$$f(x) = \frac{1}{1 + e^{-x}},$$
"e" is unnecessary, as it can be replaced by any other value that is not 0 or 1. Why is "e" used here?
As shown below, the function is working well without that, and in replacement,…
Jake · 181
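A fact worth noting alongside this question: replacing $e$ with another base $b > 1$ only rescales the input, because $b^{-x} = e^{-x \ln b}$. A quick numerical check (variable names are mine):

import numpy as np

x = np.linspace(-5.0, 5.0, 11)
base2 = 1.0 / (1.0 + 2.0 ** (-x))                # "sigmoid" with base 2
via_e = 1.0 / (1.0 + np.exp(-x * np.log(2.0)))   # same curve through e

print(np.allclose(base2, via_e))  # True: changing base = rescaling x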
4 votes · 1 answer
Why is it a problem if the outputs of an activation function are not zero-centered?
In this lecture, the professor says that one problem with the sigmoid function is that its outputs aren't zero-centered. The explanation provided by the professor for why this is bad is that the gradient of our loss w.r.t. the weights…
Daviiid · 585
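The standard argument behind that lecture point: sigmoid outputs are always positive, so every weight of a downstream neuron receives a gradient with the same sign as the single upstream error term. A small illustration (the numbers are made up):

import numpy as np

a = np.array([0.2, 0.7, 0.9])  # sigmoid outputs feeding a neuron: all positive
delta = -1.3                   # upstream error term for that neuron

grad_w = delta * a             # dL/dw_i = delta * a_i
print(grad_w)                  # every entry shares delta's sign, so updates
                               # must zig-zag toward mixed-sign optima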
4 votes · 0 answers
Why does sigmoid saturation prevent signal flow through the neuron?
As per these slides on page 35:
Sigmoids saturate and kill gradients.
When the neuron's activation saturates at either tail of 0 or 1, the gradient at these regions is almost zero, so it effectively "kills" the gradient and almost no signal will flow through the neuron…
EEAH · 193
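The derivative behind that claim is $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$, which peaks at $0.25$ and collapses toward zero in both tails. A quick check:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for x in (-10.0, -5.0, 0.0, 5.0, 10.0):
    s = sigmoid(x)
    print(x, s * (1.0 - s))  # 0.25 at x = 0, ~4.5e-5 at |x| = 10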
4 votes · 1 answer
Neural network doesn't seem to converge with ReLU but it does with Sigmoid?
I'm not really sure if this is the sort of question to ask on here, since it is less of a general question about AI and more about the coding of it; however, I thought it wouldn't fit on Stack Overflow.
I have been programming a multilayer perceptron…
finlay morrison · 151
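Two remedies that often come up for this exact symptom (whether either applies to the asker's code is unknown): He initialization, which scales initial weights for ReLU layers, and leaky ReLU, which keeps a small gradient on the negative side. A hedged sketch:

import numpy as np

fan_in, fan_out = 128, 64
# He initialization: variance 2/fan_in suits ReLU layers.
W = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)

def leaky_relu(x, alpha=0.01):
    # A small negative slope keeps gradients from dying for x < 0.
    return np.where(x > 0, x, alpha * x)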
4 votes · 1 answer
Can neural networks with a sigmoid as the activation function of the output layer approximate continuous functions?
Neural networks are commonly used for classification tasks, in fact from this post it seems like that's where they shine brightest.
However, when we want to classify using neural networks, we often have the output layer take values in $[0,1]$;…
AB_IM · 634
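One standard observation relevant to this question (my paraphrase, not the linked post): a $(0,1)$-valued approximator $g$ loses no generality for continuous targets on a compact domain, since a target $f$ with range inside $[m, M]$ is recovered by rescaling:
$$\hat{f}(x) = m + (M - m)\,g(x).$$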
3 votes · 3 answers
Why is there tanh(x)*sigmoid(x) in an LSTM cell?
CONTEXT
I was wondering why there are sigmoid and tanh activation functions in an LSTM cell.
My intuition was based on the flow of tanh(x)*sigmoid(x) and the derivative of tanh(x)*sigmoid(x).
It seems to me that the authors wanted to choose such a…
MASTER OF CODE · 242
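For reference, the product in question appears in the standard LSTM hidden-state update, where a sigmoid output gate multiplies a tanh of the cell state:
$$o_t = \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(c_t).$$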
3 votes · 1 answer
Accuracy dropped when I ran the program the second time
I was following a tutorial about Feed-Forward Networks and wrote this code for a simple FFN:
import numpy as np

class FirstFFNetwork:
    # initialize the parameters
    def __init__(self):
        self.w1 = np.random.randn()
        self.w2 = np.random.randn()
        self.w3 =…
Eeshaan Jain · 31
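The usual explanation for accuracy changing between runs of code like the above is unseeded random initialization; fixing the seed makes runs reproducible. A minimal sketch (not from the tutorial):

import numpy as np

np.random.seed(42)      # fix the RNG so every run draws identical weights
w1 = np.random.randn()
print(w1)               # same value on every run with the same seed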
2 votes · 1 answer
How do I avoid the "math domain error" when the input to the log is zero in the objective function of a neural network?
I am implementing a neural network to train it on handwritten digits.
Here is the cost function that I am implementing.
$$J(\Theta)=-\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}\left[y_{k}^{(i)} \log…
Gokulakannan · 73
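The standard guard for this error is clipping the predictions away from exactly 0 and 1 before the logarithm; a minimal sketch (the epsilon is a common choice, not from the question):

import numpy as np

def safe_log_loss(y, y_hat, eps=1e-12):
    # Clip predictions into [eps, 1 - eps] so log never receives 0.
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))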
2 votes · 2 answers
When should I use a linear unit instead of sigmoid in the output layer?
In which types of learning tasks are linear units more useful than sigmoid activation functions in the output layer of a multi-layer neural network?
DSPinfinity · 1,223
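The short version, sketched in the same tf.keras style another question on this page uses (the layer choices are illustrative): a linear unit suits unbounded regression targets, a sigmoid suits probabilities.

import tensorflow as tf

# Regression: unbounded real-valued target -> linear output unit.
regression_head = tf.keras.layers.Dense(1)

# Binary classification: probability in (0, 1) -> sigmoid output unit.
classification_head = tf.keras.layers.Dense(1, activation="sigmoid")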
2 votes · 2 answers
How would you go from 1 to k hidden layers in Cybenko's result that neural networks are universal approximators?
Cybenko showed that if $\sigma$ is a sigmoidal, continuous function, then for any $\varepsilon > 0$, for any continuous function $f: [0, 1]^d \to \mathbb{R}$, there exists a function of the form $g:x \mapsto \sum\limits_{i = 1}^n a_i\sigma\left(…
JackEight · 123
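For readers skimming past the truncated formula: the standard one-hidden-layer form in Cybenko's theorem is
$$g(x) = \sum_{i=1}^{n} a_i\,\sigma\!\left(w_i^{\top} x + b_i\right),$$
and the question asks how the density argument extends from one hidden layer to $k$.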
2 votes · 1 answer
If the output is 0.09, does this mean that the prediction is class 1 or 0?
I use a Keras EfficientNetB7 and transfer learning to solve a binary classification problem. I use tf.keras.layers.Dense(1, activation="sigmoid")(x) for my final layer.
My labels are encoded as the following for the model.fit():
[[1.]
[1.]
[0.]
…
Doug · 125
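Given the single sigmoid unit and 0/1 labels shown above, the output is read as the estimated $P(y = 1)$, so 0.09 falls on the class-0 side under the conventional 0.5 threshold. A one-line sketch (the threshold is the usual default, not from the question):

p = 0.09              # sigmoid output = estimated P(y = 1)
pred = int(p >= 0.5)  # conventional 0.5 decision threshold
print(pred)           # 0 -> predicted class 0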
2 votes · 1 answer
How do sigmoid functions make it so that the prediction $\hat{y}$ indicates the probability that the observed value, $y$, is $1$?
I am currently studying the textbook Neural Networks and Deep Learning by Charu C. Aggarwal. Chapter 1.2.1.3 Choice of Activation and Loss Functions says the following:
The choice of activation function is a critical part of neural network design.…
The Pointer · 611
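The convention behind this question is the logistic-regression pairing of a sigmoid output with a Bernoulli model of the label, so the activation itself is the modeled probability:
$$\hat{y} = P(y = 1 \mid x) = \sigma\!\left(w^{\top} x\right) = \frac{1}{1 + e^{-w^{\top} x}}.$$
(This is the standard convention, not a quotation from Aggarwal's text.)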