
As the title says, my GNN with three GAT (graph attention) layers only moves the metrics when the learning rate is 1. Since the learning rate is generally in (0, 1), should I be worried?

Also, here it says that if the learning rate is larger than 1, the update focuses solely on the gradient instead of the model parameters. I don't know whether that is a good or a bad thing. Why is the learning rate generally below 1?
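
For context, here is a minimal sketch of the kind of model I mean, assuming PyTorch Geometric's `GATConv` (the dimensions and head count are placeholders, not my real values):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class ThreeLayerGAT(torch.nn.Module):
    # Hypothetical dimensions and head count; placeholders only.
    def __init__(self, in_dim, hidden_dim, out_dim, heads=4):
        super().__init__()
        self.gat1 = GATConv(in_dim, hidden_dim, heads=heads)
        # With concat=True (the default), each layer outputs hidden_dim * heads features.
        self.gat2 = GATConv(hidden_dim * heads, hidden_dim, heads=heads)
        self.gat3 = GATConv(hidden_dim * heads, out_dim, heads=1)

    def forward(self, x, edge_index):
        x = F.elu(self.gat1(x, edge_index))
        x = F.elu(self.gat2(x, edge_index))
        return self.gat3(x, edge_index)  # raw logits / regression output
```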

DataDoge

3 Answers


A learning rate of 1 is indeed atypical in many machine learning contexts, as it may cause the optimizer to take steps so large that it overshoots minima instead of converging.

Generally, a smaller learning rate within the range (0, 1) is preferred, allowing the model to learn gradually and avoid overshooting good minima. However, the optimal learning rate depends on the specific architecture and dataset.

If a high learning rate like 1 is working for your GNN with GAT layers, it may indicate something particular about your model or dataset, such as an unusually small loss or gradient scale. However, it's also possible that your model could benefit from a lower learning rate and more training epochs.

You may want to experiment with different learning rates and observe the effects on your model's training and validation performance.
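
For instance, here is a minimal sketch of such a sweep, assuming a generic PyTorch setup; `build_model`, `train_epoch`, and `evaluate` are hypothetical placeholders for your own training code:

```python
import torch

# Hypothetical helpers standing in for your own code:
#   build_model()        -> returns a freshly initialized GNN
#   train_epoch(m, opt)  -> runs one pass over the training data
#   evaluate(m)          -> returns a validation metric (e.g. accuracy)
def lr_sweep(build_model, train_epoch, evaluate,
             lrs=(1e-4, 1e-3, 1e-2, 1e-1, 1.0), epochs=50):
    results = {}
    for lr in lrs:
        model = build_model()  # re-initialize so each lr gets a fair start
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            train_epoch(model, opt)
        results[lr] = evaluate(model)
    return results
```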

Ari Setiawan

From an optimization point of view, there is no problem. Indeed, given a loss $L$ for which a good learning rate is $10^{-4}$, you can build an equivalent loss for which $1$ is a good learning rate, just by multiplying by a scalar: $L' = 10^{-4} L$, so that $\nabla L' = 10^{-4}\nabla L$.

In deep learning it can happen that, for example during regression, the targets and the predictions are very small, so that the "scalar" I was talking about is hidden in the targets.
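
To make that concrete, here is a tiny sketch (plain SGD on a toy quadratic loss, a made-up example of mine) showing that one gradient step on $L$ with learning rate $10^{-4}$ equals one step on $L' = 10^{-4} L$ with learning rate $1$:

```python
import torch

# Two copies of the same parameter, one per loss.
w1 = torch.tensor([2.0], requires_grad=True)
w2 = torch.tensor([2.0], requires_grad=True)

loss = (w1 ** 2).sum()                # L, trained with lr = 1e-4
loss_scaled = 1e-4 * (w2 ** 2).sum()  # L' = 1e-4 * L, trained with lr = 1
loss.backward()
loss_scaled.backward()

with torch.no_grad():
    w1 -= 1e-4 * w1.grad              # SGD step on L  with lr = 1e-4
    w2 -= 1.0 * w2.grad               # SGD step on L' with lr = 1

print(w1.item(), w2.item())           # identical parameters after the step
```

Note that this equivalence holds for plain gradient descent; adaptive optimizers such as Adam rescale updates by the gradient magnitude, so they are much less sensitive to this kind of loss scaling.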

Alberto

The learning rate must be below 1 because, if it were above 1, the output would depend only on the last input of the last training epoch, which can't be correct.
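
One way to read this, assuming the answer has a running-average form of the update in mind, is $\theta_{t+1} = (1 - \eta)\,\theta_t + \eta\,x_t$, where $\theta_t$ is the current estimate, $x_t$ is the newest sample, and $\eta$ is the learning rate. At $\eta = 1$ this reduces to $\theta_{t+1} = x_t$, so the old estimate is discarded and only the most recent sample survives; for $\eta > 1$ the history even receives negative weight.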

Root