Why is the sinusoidal model classified as absolute positional encoding in some literature?

Question

I am currently reading in depth about positional encodings, and as we know, there are two types: absolute and relative.

My question:

Why is the sinusoidal model classified as an absolute positional encoding in some literature, given that Vaswani's original paper says it captures relative relationships between words, and this has been proven here?

However, while reading a paper, I found a claim that the projections applied in the attention layer destroy this:

Indeed, sinusoidal position embeddings exhibit useful properties in theory. Yan et al. (2019) investigate the dot product of sinusoidal position embeddings and prove important properties: (1) The dot product of two sinusoidal position embeddings depends only on their relative distance. That is, $P_t^\top P_{t+r}$ is independent of $t$. (2) $P_t^\top P_{t+r} = P_t^\top P_{t-r}$, which means that sinusoidal position embeddings are unaware of direction. However, in practice the sinusoidal embeddings are projected with two different projection matrices, which destroys these properties.

Is this the reason?

score 1 · Accepted Answer · answered Jan 19 '24 at 19:35

Absolute position embeddings capture the absolute location of a token, e.g., whether it is the 1st, 2nd, or 3rd token. The sinusoidal embeddings in Vaswani's paper capture this absolute position information. If you have absolute position encodings, you can always derive relative positions from them, and sinusoidal embeddings make that especially easy. But because the absolute position is what gets encoded, the scheme is considered an absolute position encoding.
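To make the "derive relative positions" step concrete, here is a short worked identity. The symbols $\omega_i$ (the frequency of the $i$-th sine/cosine pair, $\omega_i = 1/10000^{2i/d}$ in Vaswani's formulation) and $d$ (the embedding dimension) are my notation, not something from the question. For each pair, shifting the position by $r$ is a rotation that depends only on $r$:

$$
\begin{pmatrix} \sin(\omega_i (t+r)) \\ \cos(\omega_i (t+r)) \end{pmatrix}
=
\begin{pmatrix} \cos(\omega_i r) & \sin(\omega_i r) \\ -\sin(\omega_i r) & \cos(\omega_i r) \end{pmatrix}
\begin{pmatrix} \sin(\omega_i t) \\ \cos(\omega_i t) \end{pmatrix}
$$

Taking the dot product of the raw embeddings then gives

$$
P_t^\top P_{t+r} = \sum_{i=0}^{d/2-1} \big( \sin(\omega_i t)\sin(\omega_i (t+r)) + \cos(\omega_i t)\cos(\omega_i (t+r)) \big) = \sum_{i=0}^{d/2-1} \cos(\omega_i r),
$$

which has no $t$ in it and is even in $r$. So the relative offset is recoverable from the embeddings, but each $P_t$ on its own is still a function of the absolute index $t$.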

Contrast that with relative position encoding, where only the relative position between two tokens is used. For example, in the paper, the embeddings are only used during the attention operation, and they only capture information about the distance between two tokens.
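As a rough numerical check of the passage quoted in the question, here is a minimal sketch in plain Python. The helper names (`sinusoidal`, `score`), the dimension `d = 16`, and the random Gaussian matrices `W_q` and `W_k` standing in for learned query/key projections are all assumptions for illustration, not anything taken from the papers.

```python
import math
import random

def sinusoidal(pos, d_model=16, base=10000.0):
    """Sinusoidal encoding of one absolute position (sin/cos pairs)."""
    pe = []
    for i in range(0, d_model, 2):
        freq = 1.0 / (base ** (i / d_model))
        pe.append(math.sin(pos * freq))
        pe.append(math.cos(pos * freq))
    return pe

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, v):
    return [dot(row, v) for row in W]

def random_matrix(n, rng):
    # Hypothetical stand-in for a learned projection matrix.
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]

d = 16
r = 3

# Raw embeddings: same offset r at two different absolute positions,
# plus the opposite direction (-r).
raw_a = dot(sinusoidal(5, d), sinusoidal(5 + r, d))
raw_b = dot(sinusoidal(40, d), sinusoidal(40 + r, d))
raw_back = dot(sinusoidal(5, d), sinusoidal(5 - r, d))

# Projected embeddings: two different matrices, as in attention.
rng = random.Random(0)
W_q = random_matrix(d, rng)
W_k = random_matrix(d, rng)

def score(t, s):
    """Attention-style score between projected position embeddings."""
    return dot(matvec(W_q, sinusoidal(t, d)), matvec(W_k, sinusoidal(s, d)))

proj_a = score(5, 5 + r)
proj_b = score(40, 40 + r)

print("raw:      ", raw_a, raw_b, raw_back)
print("projected:", proj_a, proj_b)
```

The three raw values should agree up to floating-point error, while the two projected scores generally differ even though the offset is the same. That is the sense in which the projections destroy the purely relative property: what the model actually receives is an encoding of the absolute position.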
