The paper states that:
> We take inspiration from Li et al. (2018a); Aghajanyan et al. (2020) which show that the learned over-parametrized models in fact reside on a low intrinsic dimension. We hypothesize that the change in weights during model adaptation also has a low “intrinsic rank”, leading to our proposed Low-Rank Adaptation (LoRA) approach.


Linear algebra tells us that any rectangular *m × n* matrix of rank *k* can be written as the product of two matrices *A* and *B* of dimensions *m × k* and *k × n* respectively. This does not require the matrix to have a low rank. This leads me to the question: why does the paper explicitly mention a low "intrinsic rank"? Yes, a low rank would vastly reduce the number of parameters, but it isn't a requirement for $\Delta W$ to be decomposable in the first place.
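
For concreteness, here is a minimal NumPy sketch (my own illustration, not from the paper) of the point in question: an exact rank-$k$ factorization always exists, but it only *saves* parameters when $k \ll \min(m, n)$. The dimensions below are hypothetical.

```python
import numpy as np

# Hypothetical dimensions for a weight update Delta W (not taken from the paper).
m, n, k = 512, 256, 8

# Build an m x n matrix of exact rank k.
rng = np.random.default_rng(0)
delta_w = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
assert np.linalg.matrix_rank(delta_w) == k

# Any rank-k matrix factors exactly: Delta W = A @ B with A (m x k), B (k x n).
# One way to obtain such a factorization is a truncated SVD.
u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
A = u[:, :k] * s[:k]   # m x k (columns scaled by singular values)
B = vt[:k, :]          # k x n
print("max reconstruction error:", np.abs(delta_w - A @ B).max())

# The factorization always exists, but it only reduces parameter count
# when the rank is small relative to m and n:
print("full matrix params:", m * n)        # 131072
print("factored params   :", k * (m + n))  # 6144
```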
