Highest Voted 'momentum' Questions - Artificial Intelligence Stack Exchange

9

votes

1 answer

What is the formula for the momentum and Adam optimisers?

In the gradient descent algorithm, the formula to update the weight $w$, which has $g$ as the partial gradient of the loss function with respect to it, is: $$w\ -= r \times g$$ where $r$ is the learning rate. What should be the formula for momentum…

asked Jan 13 '20 at 07:04

Dan D

1,318
1
14
39
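
As a rough illustration of the plain gradient-descent update quoted in the question above ($w \mathrel{-}= r \times g$), here is a minimal sketch. The quadratic loss, the `grad` helper, the starting point, and the learning rate are all hypothetical choices made for the example; they do not come from the question.

```python
def gd_step(w, g, r):
    """One plain gradient-descent update: w <- w - r * g."""
    return w - r * g


def grad(w):
    # Hypothetical loss (w - 3)^2, chosen only for illustration;
    # its gradient with respect to w is 2 * (w - 3).
    return 2.0 * (w - 3.0)


w = 0.0   # assumed starting weight
r = 0.1   # assumed learning rate
for _ in range(100):
    w = gd_step(w, grad(w), r)

print(w)  # approaches the minimiser w = 3
```

Momentum and Adam keep this same "subtract a step" structure but replace the raw gradient `g` with a running quantity built from past gradients.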

3

votes

3 answers

Why must the momentum factor be in the range 0-1?

Why is it a bad idea to have a momentum factor greater than 1? What are the mathematical motivations/reasons?

neural-networks gradient-descent hyper-parameters multilayer-perceptrons momentum

asked May 12 '18 at 04:31

Ameba kupiec

61
1
3

3

votes

1 answer

Different Definitions of Momentum -- which one should I work with?

I'm seeing different ways to define momentum, and I'm not sure whether there is a significant difference between them. To my mind, they seem to do a similar thing mathematically and in practice, but I'm curious to know if there's a significant difference…

optimization gradient-descent momentum

asked Jul 03 '24 at 16:22

vxnuaj

125
1
6

3

votes

1 answer

How are these equations of SGD with momentum equivalent?

I know this question may seem silly, but I cannot prove it. In the Stanford slides (page 17), they define the formula of SGD with momentum like this: $$ v_{t}=\rho v_{t-1}+\nabla f(x_{t-1}) \\ x_{t}=x_{t-1}-\alpha v_{t}, $$ where: $v_{t+1}$ is the…

deep-learning comparison optimization stochastic-gradient-descent momentum

asked Dec 13 '20 at 05:56

CuCaRot

932
5
16
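
The two-line update quoted in the question above ($v_t=\rho v_{t-1}+\nabla f(x_{t-1})$, $x_t=x_{t-1}-\alpha v_t$) can be sketched directly in code. This is only a sketch under assumptions: the function name `sgd_momentum`, the zero initial velocity, the quadratic objective, and the values of `alpha`, `rho`, and `steps` are hypothetical and are not taken from the slides or the question.

```python
def sgd_momentum(grad_f, x0, alpha, rho, steps):
    """Iterate v_t = rho * v_{t-1} + grad_f(x_{t-1}); x_t = x_{t-1} - alpha * v_t."""
    x = x0
    v = 0.0  # assumption: the velocity starts at zero
    for _ in range(steps):
        v = rho * v + grad_f(x)  # accumulate a decaying sum of past gradients
        x = x - alpha * v        # step against the accumulated velocity
    return x


def grad_f(x):
    # Hypothetical objective f(x) = (x - 3)^2, so grad f(x) = 2 * (x - 3).
    return 2.0 * (x - 3.0)


x_final = sgd_momentum(grad_f, x0=0.0, alpha=0.05, rho=0.9, steps=500)
print(x_final)  # approaches the minimiser x = 3
```

Setting `rho=0.0` reduces the loop to plain gradient descent with step size `alpha`, which can be a convenient sanity check when comparing different formulations.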

1

vote

3 answers

At which point does momentum-based GD really help in this figure?

Classical gradient descent algorithms sometimes overshoot and escape minima because they depend on the gradient only. You can see such a problem during the update from point 6. In the classical GD algorithm, the update equation is $$\theta_{t+1} =…

gradient-descent momentum

asked May 24 '22 at 23:55

hanugm

4,102
3
29
63

1

vote

0 answers

Why do momentum techniques not work well for RNNs?

AFAIK, momentum is quite useful when training CNNs, and can speed up training substantially without any drop in validation accuracy. I've recently learned that it is not as helpful for RNNs, where plain SGD is preferred. For example, Deep…

recurrent-neural-networks gradient-descent stochastic-gradient-descent momentum

asked Mar 09 '20 at 05:31

SpiderRico

1,040
10
18

0

votes

0 answers

Have cyclic learning rates generally paid off for machine learning model performance?

Reading about learning rates, I had the idea that cyclic LRs could be interesting. One could likely argue either way: that they'd kick you off the minimum, plus that saddle points are solved with momentum-based optimisers and make cyclic LRs…

machine-learning deep-learning optimization stochastic-gradient-descent momentum

asked Feb 01 '25 at 16:22

user88930
