I've come across several different ways to define momentum, and I'm not sure whether there is a significant difference between them.
As far as I can tell, they do similar things, both mathematically and in practice, but I'm curious whether there is a significant difference that I'm not seeing.
1st Definition
Momentum: $v\theta_{t} = \beta * v\theta_{t-1} + (1 - \beta) * ∂\theta_t$
Update: $\theta = \theta - \alpha * v\theta_t$
2nd Definition
Momentum: $v\theta_t = \beta * v\theta_{t-1} - \alpha *∂\theta_t$
Update: $\theta = \theta + v\theta_t$
3rd Definition
Momentum: $v\theta_t = \beta * v\theta_{t-1} + ∂\theta_t$
Update: $\theta = \theta - \alpha * v\theta_t$
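Here is a minimal Python sketch that runs all three rules side by side so their trajectories can be compared directly. It assumes a toy 1-D quadratic loss $f(\theta) = \tfrac{1}{2}\theta^2$ (so $\partial\theta_t = \theta$), $v\theta_0 = 0$, and arbitrary values for $\alpha$, $\beta$, the starting point and the step count. None of these come from a particular library. It also runs the 1st definition against the 3rd with the learning rate scaled by $(1 - \beta)$.

```python
# Minimal sketch: the three momentum rules on an assumed toy loss
# f(theta) = 0.5 * theta**2. Loss, start point and hyperparameters are
# illustrative assumptions, not taken from any particular library.

def grad(theta):
    # Gradient of the assumed toy loss 0.5 * theta**2
    return theta

def run_def1(theta, alpha, beta, steps):
    # 1st definition: EWMA of gradients, then theta -= alpha * v
    v, path = 0.0, []
    for _ in range(steps):
        v = beta * v + (1 - beta) * grad(theta)
        theta = theta - alpha * v
        path.append(theta)
    return path

def run_def2(theta, alpha, beta, steps):
    # 2nd definition: learning rate folded into the velocity, then theta += v
    v, path = 0.0, []
    for _ in range(steps):
        v = beta * v - alpha * grad(theta)
        theta = theta + v
        path.append(theta)
    return path

def run_def3(theta, alpha, beta, steps):
    # 3rd definition: decayed sum of gradients, then theta -= alpha * v
    v, path = 0.0, []
    for _ in range(steps):
        v = beta * v + grad(theta)
        theta = theta - alpha * v
        path.append(theta)
    return path

if __name__ == "__main__":
    alpha, beta, theta0, steps = 0.1, 0.9, 5.0, 20
    p1 = run_def1(theta0, alpha, beta, steps)
    p2 = run_def2(theta0, alpha, beta, steps)
    p3 = run_def3(theta0, alpha, beta, steps)
    # 3rd definition again, with the learning rate scaled by (1 - beta)
    p3_scaled = run_def3(theta0, alpha * (1 - beta), beta, steps)
    print("max |def2 - def3|:", max(abs(a - b) for a, b in zip(p2, p3)))
    print("max |def1 - def3 scaled|:", max(abs(a - b) for a, b in zip(p1, p3_scaled)))
```

In this setup (zero initial velocity, fixed $\alpha$), unrolling the recursions gives $v\theta^{(2)}_t = -\alpha\, v\theta^{(3)}_t$ and $v\theta^{(1)}_t = (1 - \beta)\, v\theta^{(3)}_t$. So both printed differences should be at floating-point rounding level. The sketch does not cover a learning rate that changes over time.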
From my understanding, the first definition explicitly computes an exponentially weighted average of the gradients, controlled by the $\beta$ hyperparameter. I think this can make training more stable when using momentum, and it makes $\beta$ more intuitive to tune.
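One way to make the tuning point concrete: suppose the gradient is held at a constant $g$ and $v\theta_0 = 0$. After $t$ steps the 1st definition's velocity is $g(1 - \beta^t)$, which settles at $g$ itself. The 3rd definition's velocity is $g(1 - \beta^t)/(1 - \beta)$, which settles at $g/(1 - \beta)$. So in the 3rd form, changing $\beta$ also changes the effective step size. A minimal Python sketch follows. The constant gradient, the $\beta$ values and the step count are arbitrary illustrative choices.

```python
# Minimal sketch: feed an assumed constant gradient g into the velocity
# recursions of the 1st and 3rd definitions and compare where they settle.
# g, the beta values and the step count are illustrative assumptions.

def velocity_def1(g, beta, steps):
    # EWMA form: v_t = beta * v_{t-1} + (1 - beta) * g, starting from v_0 = 0
    v = 0.0
    for _ in range(steps):
        v = beta * v + (1 - beta) * g
    return v

def velocity_def3(g, beta, steps):
    # Un-normalised form: v_t = beta * v_{t-1} + g, starting from v_0 = 0
    v = 0.0
    for _ in range(steps):
        v = beta * v + g
    return v

if __name__ == "__main__":
    g, steps = 2.0, 2000
    for beta in (0.5, 0.9, 0.99):
        # def1 should approach g; def3 should approach g / (1 - beta)
        print(beta, velocity_def1(g, beta, steps), velocity_def3(g, beta, steps))
```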
I'm not sure why I'd want to use the 2nd or 3rd definition over the 1st, or whether they differ from it in any significant way.