The paper states that:
> We take inspiration from Li et al. (2018a); Aghajanyan et al. (2020) which show that the learned
> over-parametrized models in fact reside on a low intrinsic dimension. We hypothesize that the
> change in weights during model adaptation also has a low “intrinsic rank”, leading to our proposed
> Low-Rank Adaptation (LoRA) approach.
Linear algebra tells us that any $m \times n$ matrix of rank $k$ can be written as the product of two matrices $A$ and $B$ of dimensions $m \times k$ and $k \times n$ respectively. Nothing in this factorization requires the matrix to have low rank. This leads me to the question: why did the paper explicitly mention a low "intrinsic rank"? Yes, a low rank would vastly reduce the number of parameters, but it isn't a requirement for the construct that $\Delta W$ can be decomposed.
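For concreteness, here is a minimal sketch of that factorization argument (using NumPy; the shapes, the rank $k$, and the random matrix standing in for $\Delta W$ are illustrative assumptions, not anything from the paper). It shows that an exact factorization $A B$ exists for a matrix of any rank $k$, low or not; only the parameter count of the factors depends on $k$.

```python
import numpy as np

m, n, k = 8, 6, 3

# Build a random m x n matrix whose rank is exactly k (stand-in for delta_W).
delta_w = np.random.randn(m, k) @ np.random.randn(k, n)
assert np.linalg.matrix_rank(delta_w) == k

# One explicit factorization comes from the thin SVD: keep the k nonzero
# singular values and split them between the two factors.
U, s, Vt = np.linalg.svd(delta_w, full_matrices=False)
A = U[:, :k] * np.sqrt(s[:k])         # shape (m, k)
B = np.sqrt(s[:k])[:, None] * Vt[:k]  # shape (k, n)

# A @ B reconstructs delta_w exactly (up to floating-point error),
# whether or not k is "low" -- only the number of parameters in A and B
# (m*k + k*n versus m*n) changes with k.
assert np.allclose(A @ B, delta_w)
print(A.shape, B.shape)  # (8, 3) (3, 6)
```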