
I've pondered this for a while without developing an intuition for the math behind it.

So what causes a model to need a low learning rate?

nbro
JohnAllen

1 Answer

Gradient descent is a method for finding the optimal parameters of the hypothesis by minimizing the cost function.

$\theta := \theta - \alpha \, \nabla_\theta J(\theta)$, where $\alpha$ is the learning rate.
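
As a minimal sketch of this update rule, assume a one-dimensional quadratic cost $J(\theta) = (\theta - 3)^2$; the cost, starting point, and learning rate here are hypothetical, chosen only to make the rule concrete:

```python
# Minimal sketch of the update theta := theta - alpha * grad,
# on the toy cost J(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
def gradient_descent(alpha, theta=0.0, steps=50):
    for _ in range(steps):
        grad = 2.0 * (theta - 3.0)    # gradient of the cost at the current theta
        theta = theta - alpha * grad  # step downhill, scaled by the learning rate
    return theta

print(gradient_descent(alpha=0.1))  # ends close to the minimum at theta = 3
```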

If the learning rate is too high, each update can overshoot the minimum, so gradient descent fails to minimize the cost function and the loss can end up higher rather than lower.
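
To see the overshooting concretely, here is a small hypothetical comparison on the same toy cost; the only change between the two runs is the learning rate.

```python
# Same toy cost J(theta) = (theta - 3)^2; only the learning rate differs.
# A moderate rate steadily lowers the loss, while a rate that is too high
# overshoots the minimum on every step and the loss keeps growing.
def loss(theta):
    return (theta - 3.0) ** 2

for alpha in (0.1, 1.1):
    theta = 0.0
    for _ in range(10):
        theta -= alpha * 2.0 * (theta - 3.0)
    print(f"alpha={alpha}: theta={theta:.2f}, loss={loss(theta):.2f}")
# alpha=0.1 ends near the minimum; alpha=1.1 ends far from it with a much higher loss.
```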

Since gradient descent can only find a local minimum, a learning rate that is too low can also lead to poor performance: the optimizer may get stuck in a bad local minimum or take a very long time to converge. Starting from a random value of this hyperparameter and tuning it increases model training time, but there are advanced methods, such as adaptive gradient descent, that can help manage the training time.
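
As a rough illustration of what an adaptive method does, here is an Adagrad-style sketch on the same toy cost (a simplification, not the exact algorithm as used in practice): each step is divided by the square root of the accumulated squared gradients, so the effective learning rate adapts as training proceeds.

```python
import math

# Adagrad-style sketch on the toy cost J(theta) = (theta - 3)^2.
# The accumulated squared gradients shrink the effective step size over time,
# so a relatively large base learning rate still behaves stably.
def adagrad(alpha=1.0, theta=0.0, steps=100, eps=1e-8):
    accum = 0.0
    for _ in range(steps):
        grad = 2.0 * (theta - 3.0)
        accum += grad ** 2                                # running sum of squared gradients
        theta -= alpha * grad / (math.sqrt(accum) + eps)  # adapted update
    return theta

print(adagrad())  # approaches the minimum at theta = 3
```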

There are many optimizers for the same task, but no optimizer is perfect. The right choice of learning rate depends on several factors:

  1. Size of the training data: as the training set grows, model training time increases. If you want a shorter training time you can choose a higher learning rate, but that may result in worse performance (see the sketch after this list).
  2. Gradient descent slows down whenever the gradient is small, so in that regime it is better to go with a higher learning rate.
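
To make the trade-off above concrete, here is a hypothetical sweep over learning rates on the same toy cost, counting how many iterations each needs to push the loss below a small threshold (or reporting failure):

```python
# Hypothetical learning-rate sweep on the toy cost J(theta) = (theta - 3)^2:
# count the iterations needed to bring the loss below a tolerance.
def iterations_to_converge(alpha, theta=0.0, tol=1e-6, max_steps=10_000):
    for step in range(1, max_steps + 1):
        theta -= alpha * 2.0 * (theta - 3.0)
        if (theta - 3.0) ** 2 < tol:
            return step
    return None  # too slow or diverging within the step budget

for alpha in (0.001, 0.01, 0.1, 0.5, 1.1):
    print(alpha, iterations_to_converge(alpha))
# Very small rates converge but need thousands of steps, moderate rates are fast,
# and a rate that is too large never converges on this problem.
```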

P.S. It is always better to try several rounds of gradient descent with different learning rates.

Posi2