For questions related to the vanishing gradient problem, a numerical issue that arises when training a (deep) neural network with a gradient-based optimization technique: gradients shrink toward zero as they are propagated back through many layers (or time steps), so early layers learn very slowly. There is also the related exploding gradient problem.
Questions tagged [vanishing-gradient-problem]
23 questions
7
votes
1 answer
Why do ResNets avoid the vanishing gradient problem?
I read that, if we use the sigmoid or hyperbolic tangent activation functions in deep neural networks, we can have problems with vanishing gradients, and this is visible from the shapes of these functions' derivatives. ReLU solves…
FraMan
- 199
- 1
- 4
- 11
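The premise in the question above can be made concrete. For a plain feedforward network with pre-activations $z_\ell = W_\ell h_{\ell-1} + b_\ell$ and $h_\ell = \sigma(z_\ell)$ (notation assumed here), backpropagation multiplies one factor per layer, and the sigmoid derivative caps the activation part of each factor at 1/4:

$$\sigma'(x) = \sigma(x)\big(1-\sigma(x)\big) \le \tfrac{1}{4}, \qquad \frac{\partial \mathcal{L}}{\partial h_k} = \left(\prod_{\ell=k+1}^{L} W_\ell^{\top}\,\operatorname{diag}\big(\sigma'(z_\ell)\big)\right)\frac{\partial \mathcal{L}}{\partial h_L}.$$

Unless the weights compensate, repeated factors of at most 1/4 (at most 1 for tanh) shrink the gradient roughly geometrically with depth, which is the "shape of the derivative" argument the question refers to.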
6
votes
1 answer
If vanishing gradients are NOT the problem that ResNets solve, then what is the explanation behind ResNet success?
I often see blog posts or questions on here starting with the premise that ResNets solve the vanishing gradient problem.
The original 2015 paper contains the following passage in section 4.1:
We argue that this optimization difficulty is unlikely…
Alexander Soare
- 1,379
- 3
- 12
- 28
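Whatever the correct explanation for ResNet's success, the mechanism the debate revolves around is easy to state: each block outputs F(x) + x rather than F(x), so a block's Jacobian contains an identity term. A minimal PyTorch-style sketch (simplified; it omits the projection shortcut the paper uses when dimensions change):

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Simplified residual block: output = relu(F(x) + x) with an identity shortcut."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            # The "+ x" gives the block's Jacobian an identity component, so the
            # backward signal has a direct path around whatever F learns.
            return self.relu(out + x)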
5
votes
2 answers
What are the common pitfalls that we could face when training neural networks?
Apart from the vanishing or exploding gradient problems, what are other problems or pitfalls that we could face when training neural networks?
pjoter
- 51
- 1
5
votes
1 answer
What effect does batch norm have on the gradient?
Batch norm is a technique that essentially standardizes the activations at each layer before passing them on to the next layer. Naturally, this will affect the gradient through the network. I have seen the equations that derive the…
information_interchange
- 339
- 1
- 10
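For reference, the transformation whose gradient the question asks about is, per feature over a mini-batch $B$,

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma\,\hat{x}_i + \beta,$$

with $\mu_B$, $\sigma_B^2$ the batch mean and variance and $\gamma$, $\beta$ learned. Because $\mu_B$ and $\sigma_B^2$ themselves depend on every $x_i$ in the batch, $\partial\mathcal{L}/\partial x_i$ picks up extra terms through them, and the normalization keeps the scale of the backpropagated signal roughly independent of the scale of the incoming activations.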
5
votes
1 answer
How to detect vanishing gradients?
Can vanishing gradients be detected by the change in distribution (or lack thereof) of my convolution's kernel weights throughout the training epochs? And if so how?
For example, if only 25% of my kernel's weights ever change throughout the epochs,…
Elegant Code
- 153
- 1
- 7
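Rather than inferring this from how much the kernel weights move, a more direct check (a minimal sketch, assuming a PyTorch model and a standard training loop) is to log per-layer gradient norms right after loss.backward():

    import torch

    def log_gradient_norms(model: torch.nn.Module) -> None:
        # Call after loss.backward() and before optimizer.zero_grad().
        for name, param in model.named_parameters():
            if param.grad is not None:
                print(f"{name}: grad L2 norm = {param.grad.norm().item():.3e}")

Consistently tiny norms in the earliest layers, relative to later layers and sustained across many steps, are the usual symptom; an "only 25% of the weights ever change" statistic on its own could also just mean the learning rate is too small.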
4
votes
0 answers
Why does sigmoid saturation prevent signal flow through the neuron?
As per these slides on page 35:
Sigmoids saturate and kill gradients.
when the neuron's activation saturates at either tail of 0 or 1, the gradient at these regions is almost zero.
[…] it will effectively "kill" the gradient and almost no signal will flow through the neuron…
EEAH
- 193
- 1
- 5
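A small numeric illustration of the quoted claim, using the identity $\sigma'(x) = \sigma(x)(1-\sigma(x))$:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    for x in [0.0, 2.0, 5.0, 10.0]:
        s = sigmoid(x)
        print(f"x = {x:5.1f}   sigmoid'(x) = {s * (1 - s):.2e}")

    # At x = 10 the local derivative is about 4.5e-05. During backprop the upstream
    # gradient is multiplied by this local derivative, so a saturated unit passes
    # almost nothing back to its weights; that is the "killed" signal.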
3
votes
3 answers
Why is the vanishing gradient problem especially relevant for an RNN and not an MLP?
I would like to know why the vanishing gradient problem is especially relevant for an RNN and not an MLP (multi-layer perceptron). In an MLP you also backpropagate errors and multiply different weights. If the weights are small, the resulting update in the…
PeterBe
- 276
- 3
- 14
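The usual way to sharpen the contrast: in backpropagation through time the factor between consecutive hidden states involves the same recurrent matrix at every step. For a vanilla RNN with pre-activation $z_i = W_{hh} h_{i-1} + W_{xh} x_i$ and $h_i = \tanh(z_i)$,

$$\frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} = \prod_{i=k+1}^{t} \operatorname{diag}\big(\tanh'(z_i)\big)\, W_{hh}$$

(an ordered matrix product), so $W_{hh}$ appears $t-k$ times and $t-k$ grows with the sequence length. An MLP also multiplies one factor per layer, but the matrices are all different and the number of factors is fixed by the (usually modest) depth, not by how far back in time the relevant input sits.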
3
votes
1 answer
Why aren't artificial derivatives used more often to solve the vanishing gradient problem?
While looking into the vanishing gradient problem, I came across a paper (https://ieeexplore.ieee.org/abstract/document/9336631) that used artificial derivatives in lieu of the real derivatives. For a visualization, see the attached image:
As you…
postnubilaphoebus
- 356
- 2
- 13
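Without claiming this is the linked paper's construction, the general "artificial derivative" trick is easy to sketch: keep the ordinary forward pass but substitute a hand-chosen function for the true derivative in the backward pass, so saturated units still pass some gradient. A hypothetical PyTorch version (the 0.05 floor is arbitrary, purely for illustration):

    import torch

    class SigmoidWithSurrogateGrad(torch.autograd.Function):
        # Forward: ordinary sigmoid. Backward: the true derivative, but floored
        # away from zero so saturated units still propagate gradient.
        @staticmethod
        def forward(ctx, x):
            y = torch.sigmoid(x)
            ctx.save_for_backward(y)
            return y

        @staticmethod
        def backward(ctx, grad_output):
            (y,) = ctx.saved_tensors
            true_grad = y * (1.0 - y)              # real sigmoid derivative
            surrogate = true_grad.clamp(min=0.05)  # "artificial" derivative
            return grad_output * surrogate

    # usage: y = SigmoidWithSurrogateGrad.apply(x)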
3
votes
0 answers
Would a different learning rate for every neuron and layer mitigate or solve the vanishing gradient problem?
I'm interested in using the sigmoid (or tanh) activation function instead of ReLU. I'm aware of ReLU's advantages of faster computation and no vanishing gradient problem. But regarding the vanishing gradient, the main problem is the backpropagation…
Rogelio Triviño
- 141
- 3
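Per-layer learning rates are easy to try with optimizer parameter groups; a minimal sketch on a toy sigmoid MLP (layer indices and rates are illustrative only):

    import torch
    import torch.nn as nn

    # Toy 3-layer sigmoid MLP; the per-layer rates below are made up for illustration.
    model = nn.Sequential(
        nn.Linear(64, 64), nn.Sigmoid(),
        nn.Linear(64, 64), nn.Sigmoid(),
        nn.Linear(64, 10),
    )

    optimizer = torch.optim.SGD(
        [
            {"params": model[0].parameters(), "lr": 1e-1},  # earliest layer: largest step
            {"params": model[2].parameters(), "lr": 3e-2},
            {"params": model[4].parameters(), "lr": 1e-2},
        ],
        momentum=0.9,
    )

Note that this only rescales each layer's update; the gradient arriving at the early layers is still a product of many small sigmoid derivatives, which is partly why per-parameter adaptive methods (Adam, RMSprop) are the more common answer to this question.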
3
votes
3 answers
How do LSTM and GRU overcome the vanishing gradient problem?
I'm watching the video Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorflow Tutorial | Edureka where the author says that the LSTM and GRU architecture help to reduce the vanishing gradient problem. How do LSTM and GRU…
DRV
- 1,843
- 3
- 15
- 20
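The short version of most answers: the LSTM cell state is updated additively,

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,$$

so, ignoring the gates' own dependence on the previous state (a simplification), $\partial c_t/\partial c_{t-1} \approx \operatorname{diag}(f_t)$. The network can keep the forget gate $f_t$ close to 1 and carry gradient across many steps, instead of multiplying by $\operatorname{diag}(\tanh'(\cdot))\,W_{hh}$ at every step as a vanilla RNN does. GRUs get a similar effect from their update gate.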
2
votes
1 answer
Why are all the gradient values 0 except for the first iteration?
I am fine-tuning a mistral-7b with Hugging Face peft and quantization. In my training loop, I am printing the gradient values for each batch, which seem a bit unusual.
    # Print gradients
    for name, param in model_init.named_parameters():
        if…
kms
- 121
- 3
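One thing worth ruling out before suspecting vanishing gradients: with PEFT plus quantization, only the adapter (e.g. LoRA) parameters have requires_grad=True, the quantized base weights never receive gradients, and a check that runs after optimizer.zero_grad() (or before backward()) prints zeros by construction. A hedged sketch of where such a check usually belongs, reusing the question's model_init name and assuming the rest of its loop:

    # Assumes loss, optimizer and model_init come from the question's training loop.
    loss.backward()
    for name, param in model_init.named_parameters():
        if param.requires_grad and param.grad is not None:
            print(f"{name}: max |grad| = {param.grad.abs().max().item():.3e}")
    optimizer.step()
    optimizer.zero_grad()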
2
votes
1 answer
Can attention models be replaced by non-sigmoid activation functions?
As far as I understand, the attention model in an LLM is used to mitigate the vanishing gradient problem.
When using activation functions like the sigmoid function, deep neural networks may produce gradients that are very close to zero (because you…
A. Darwin
- 163
- 3
2
votes
1 answer
How does the vanishing gradient prevent an RNN from working on long-range dependencies?
I am really trying to understand deep learning models like RNNs, LSTMs, etc. I have gone through many RNN tutorials and have learned that RNNs cannot handle long-range dependencies, like:
Consider trying to predict the last word in the text “I…
Nafees Ahmed
- 41
- 3
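A back-of-the-envelope way to see the restriction (a crude simplification that treats the per-step backprop attenuation as a constant factor g < 1):

    # If every step back through time scales the gradient by roughly g < 1,
    # the contribution of a word far back in the sentence decays geometrically.
    g = 0.9
    for steps in (5, 20, 50):
        print(f"{steps:3d} steps back: attenuation ~ {g ** steps:.1e}")
    # 50 steps back: attenuation ~ 5.2e-03, so the early context barely
    # influences the weight update, which is why plain RNNs struggle here.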
2
votes
0 answers
How to decide if gradients are vanishing?
I am trying to debug a convolutional neural network. I am seeing gradients close to zero.
How can I decide whether these gradients are vanishing or not? Is there some threshold to decide on vanishing gradient by looking at the values?
I am getting…
pramesh
- 121
- 4
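There is no universal absolute cutoff, because raw gradient magnitudes depend on the loss and weight scales. One common rule of thumb (an informal heuristic, not a hard threshold) is to compare the size of the step the optimizer would take with the size of the weights, layer by layer:

    def update_to_weight_ratio(model, lr):
        """Return lr * ||grad|| / ||weight|| for each parameter tensor.
        Ratios around 1e-3 are often considered healthy; early layers sitting
        orders of magnitude below later ones point to vanishing gradients."""
        ratios = {}
        for name, param in model.named_parameters():
            if param.grad is not None and param.norm() > 0:
                ratios[name] = (lr * param.grad.norm() / param.norm()).item()
        return ratios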
1
vote
1 answer
Might the use of rational numbers and rational arithmetic be beneficial for an ANN?
Rational numbers would help alleviate some gradient issues by not losing precision as the weights and the propagated values (signal) reach extremely low and high values.
I'm not aware of any hardware that is optimized for rationals. GPUs are all…
Mark_Sagecy
- 133
- 4
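Python's fractions module makes the precision argument concrete, and also hints at the cost (a toy illustration, not a claim about any real training setup):

    from fractions import Fraction
    import numpy as np

    # Repeatedly attenuating a signal by 0.1: float32 eventually underflows to
    # exactly 0, while the exact rational keeps the value, at the price of
    # unbounded integer growth and no hardware support.
    f = np.float32(1.0)
    r = Fraction(1, 1)
    for _ in range(60):
        f = np.float32(f * np.float32(0.1))
        r *= Fraction(1, 10)

    print(f)                        # 0.0
    print(r.denominator == 10**60)  # True: the exact value 10**-60 survives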