For questions about the concept of momentum in the context of gradient descent.
Questions tagged [momentum]
7 questions
9 votes, 1 answer
What is the formula for the momentum and Adam optimisers?
In the gradient descent algorithm, the update for a weight $w$, where $g$ is the partial derivative of the loss function with respect to $w$, is:
$$w \leftarrow w - r \times g$$
where $r$ is the learning rate.
What should be the formula for momentum…
Dan D
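For reference, here is a minimal sketch of the plain update above next to the commonly cited textbook forms of the momentum and Adam updates, written for a single scalar weight. The names `mu`, `beta1`, `beta2`, `eps` and their default values are assumptions for illustration and do not come from the question; libraries may define these updates slightly differently.

```python
import math

def sgd_step(w, g, r):
    """Plain gradient descent: w <- w - r * g."""
    return w - r * g

def momentum_step(w, v, g, r, mu=0.9):
    """Classical momentum (one common form): the velocity v accumulates past steps."""
    v = mu * v - r * g   # decay the old velocity, add the new scaled gradient step
    return w + v, v

def adam_step(w, m, s, g, t, r=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam in its usual textbook form; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * g        # running mean of gradients
    s = beta2 * s + (1 - beta2) * g * g    # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)           # bias correction for the zero initialisation
    s_hat = s / (1 - beta2 ** t)
    w = w - r * m_hat / (math.sqrt(s_hat) + eps)
    return w, m, s
```

The state (`v` for momentum, `m` and `s` for Adam) starts at zero and is passed back in on every step, for example `w, v = momentum_step(w, v, g, r)`.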
3 votes, 3 answers
Why must the momentum factor be in the range 0-1?
Why is it a bad idea to have a momentum factor greater than 1? What are the mathematical motivations/reasons?
Ameba kupiec
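One way to see the issue, as a rough sketch: if the gradient is zero, a momentum update of the form $v_t = \mu v_{t-1}$ (an assumed, commonly used form; the question itself gives no formula) reduces to $v_t = \mu^t v_0$. The hypothetical helper below just iterates that recurrence.

```python
def velocity_after(mu, steps, v0=1.0):
    """Iterate v <- mu * v with zero gradient and return the final velocity."""
    v = v0
    for _ in range(steps):
        v = mu * v  # no gradient term: only the momentum factor acts
    return v

for mu in (0.9, 1.0, 1.1):
    print(mu, velocity_after(mu, steps=50))
```

With `mu` below 1 the old velocity decays geometrically, with `mu` equal to 1 it never fades, and with `mu` above 1 it grows geometrically even though the gradient contributes nothing, so the iterates run away.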
3 votes, 1 answer
Different Definitions of Momentum -- which one should I work with?
I've seen momentum defined in several different ways, and I'm not sure whether the differences are significant.
As far as I can tell, they do a similar thing mathematically and in practice, but I'm curious to know if there's a significant difference…
vxnuaj
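The excerpt does not say which definitions are meant, so the following is only a sketch under an assumption: that two of them are the "running sum" form $v \leftarrow \beta v + g$ and the "moving average" form $v \leftarrow \beta v + (1-\beta) g$. With zero initial velocity and a constant learning rate, these differ only by a factor of $(1-\beta)$ that can be absorbed into the learning rate. The objective $f(x) = x^2$ and all function names are hypothetical.

```python
def grad(x):
    """Gradient of the illustrative objective f(x) = x**2."""
    return 2.0 * x

def run_sum_form(x, lr, beta, steps):
    """Velocity is a decayed running sum of gradients."""
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(x)
        x = x - lr * v
    return x

def run_ema_form(x, lr, beta, steps):
    """Velocity is an exponential moving average of gradients."""
    v = 0.0
    for _ in range(steps):
        v = beta * v + (1.0 - beta) * grad(x)
        x = x - lr * v
    return x

beta = 0.9
x_sum = run_sum_form(1.0, lr=0.01, beta=beta, steps=20)
x_ema = run_ema_form(1.0, lr=0.01 / (1.0 - beta), beta=beta, steps=20)
print(x_sum, x_ema)  # the two trajectories agree up to rounding
```

So under these assumptions the choice mostly changes how the learning rate should be scaled, not the path the optimiser takes.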
3 votes, 1 answer
How are these equations of SGD with momentum equivalent?
This may be a silly question, but I cannot prove it.
In a Stanford slide (page 17), SGD with momentum is defined like this:
$$
\begin{aligned}
v_{t} &= \rho v_{t-1}+\nabla f(x_{t-1}) \\
x_{t} &= x_{t-1}-\alpha v_{t},
\end{aligned}
$$
where:
$v_{t+1}$ is the…
CuCaRot
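The second formulation is cut off in the excerpt, so this sketch assumes it is the other widely used variant, $u_t = \rho u_{t-1} - \alpha \nabla f(x_{t-1})$, $x_t = x_{t-1} + u_t$. Substituting $u_t = -\alpha v_t$ into the slide's equations gives exactly that variant when $\alpha$ is constant and both velocities start at zero. The check below runs both on the hypothetical objective $f(x) = x^2$.

```python
def grad(x):
    """Gradient of the illustrative objective f(x) = x**2."""
    return 2.0 * x

def slide_form(x, alpha, rho, steps):
    """v_t = rho * v_{t-1} + grad(x_{t-1});  x_t = x_{t-1} - alpha * v_t."""
    v, xs = 0.0, []
    for _ in range(steps):
        v = rho * v + grad(x)
        x = x - alpha * v
        xs.append(x)
    return xs

def lr_in_velocity_form(x, alpha, rho, steps):
    """u_t = rho * u_{t-1} - alpha * grad(x_{t-1});  x_t = x_{t-1} + u_t."""
    u, xs = 0.0, []
    for _ in range(steps):
        u = rho * u - alpha * grad(x)
        x = x + u
        xs.append(x)
    return xs

a = slide_form(1.0, alpha=0.1, rho=0.9, steps=10)
b = lr_in_velocity_form(1.0, alpha=0.1, rho=0.9, steps=10)
print(max(abs(p - q) for p, q in zip(a, b)))  # difference is at rounding level
```

Note that the equivalence relies on a constant $\alpha$; if the learning rate changes between steps, the two forms weight past gradients differently.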
1 vote, 3 answers
At which point does momentum-based GD really help in this figure?
Classical gradient descent algorithms sometimes overshoot and escape minima because they depend only on the gradient. You can see such a problem during the update from point 6.
In the classical GD algorithm, the update equation is
$$\theta_{t+1} =…
hanugm
1 vote, 0 answers
Why do momentum techniques not work well for RNNs?
AFAIK, momentum is quite useful when training CNNs and can speed up training substantially without any drop in validation accuracy.
I've recently learned that it is not as helpful for RNNs, where plain SGD is preferred.
For example, Deep…
SpiderRico
0 votes, 0 answers
Have cyclic learning rates generally paid off for machine learning model performance?
While reading about learning rates, I had the idea that cyclic LRs could be interesting.
One could likely argue either way:
that they'd kick you out of the minimum, and that saddle points are already handled by momentum-based optimisers and make cyclic LRs…
user88930
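For readers unfamiliar with the idea, here is a minimal sketch of one cyclic schedule, the triangular policy as it is usually described: the learning rate climbs linearly from a lower bound to an upper bound over `step_size` iterations and then falls back. The function name and the default values of `base_lr`, `max_lr`, and `step_size` are assumptions chosen for illustration, not values from the question.

```python
import math

def triangular_lr(iteration, base_lr=0.001, max_lr=0.006, step_size=2000):
    """Learning rate at a given iteration under a triangular cyclic schedule."""
    cycle = math.floor(1 + iteration / (2 * step_size))  # which cycle we are in (1-based)
    x = abs(iteration / step_size - 2 * cycle + 1)       # distance from the cycle's peak, in [0, 1]
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

for it in (0, 1000, 2000, 3000, 4000):
    print(it, triangular_lr(it))
```

The schedule only sets the step size; it can be combined with a momentum-based optimiser, which is the interaction the question asks about.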