I am currently reading in depth about positional encodings, and as we know there are two types of positional encodings: Absolute and relative.
My question:
Why is the sinusoidal model classified as absolute positional encoding in some literature, given that in Vaswani's original paper it was said that it captures relative relationships between words, and this has been proven here.
However, while I was reading a research, it was mentioned that projections that occur in the attention layer destroy this:
Indeed, sinusoidal position embeddings exhibit useful properties in theory. Yan et al. (2019) investigate the dot product of sinusoidal position embeddings and prove important properties: (1) The dot product of two sinusoidal position embeddings depends only on their relative distance. That is,
is independent of
. (2)
, which means that sinusoidal position embeddings are unaware of direction. However, in practice the sinusoidal embeddings are projected with two different projection matrices, which destroys these properties.
Is this the reason?
is independent of
.
(2)
, which means that sinusoidal position embeddings are unaware of direction. However, in practice the sinusoidal embeddings are projected with two different projection matrices, which destroys these properties.