
As per "Attention Is All You Need" etc., the positional encoding is added to the embedded word vector at the input. My knee-jerk reaction is that this would muddle the "signal" of the word vector. Since the word vector itself is not preserved after the addition, this additive "noise" could, for instance, make one word look like a different word in a different position: $w_a + p_a = x = w_b + p_b$.
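
For concreteness, here is a minimal NumPy sketch of the additive scheme I mean (the sinusoidal encoding from the paper; the embedding values are random placeholders, not real word vectors):

```python
import numpy as np

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal encoding from 'Attention Is All You Need':
    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    """
    positions = np.arange(max_len)[:, None]                            # (max_len, 1)
    div_terms = np.power(10000.0, np.arange(0, d_model, 2) / d_model)  # (d_model/2,)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(positions / div_terms)
    pe[:, 1::2] = np.cos(positions / div_terms)
    return pe

seq_len, d_model = 10, 512
rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(seq_len, d_model))   # placeholder embeddings

# The model input is simply the element-wise sum w + p, so nothing in x
# itself separates the word contribution from the position contribution.
x = word_vectors + sinusoidal_positional_encoding(seq_len, d_model)
```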

Is efficiency (keeping the input dimension smaller) the main reason to add the positional information rather than concatenate it, or is addition theoretically sound?
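
The concatenation alternative I have in mind looks like this (reusing the `sinusoidal_positional_encoding` helper from the sketch above; the dimension sizes are arbitrary, and this is not what the paper does):

```python
import numpy as np

seq_len, d_model, d_pos = 10, 512, 64
rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(seq_len, d_model))    # placeholder embeddings
pos = sinusoidal_positional_encoding(seq_len, d_pos)  # helper defined above

# Concatenation keeps the word and position signals in disjoint dimensions,
# but the input width grows from d_model to d_model + d_pos, so every
# downstream projection (Q/K/V, feed-forward) becomes larger.
x_concat = np.concatenate([word_vectors, pos], axis=-1)  # shape (10, 576)
```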

Edit: I think a source of confusion in all of this is whether the model adopts a pre-trained word embedding, like Word2Vec, or trains its embedding from scratch. Adding positional encoding to pre-trained embeddings might throw off the attention dot product?
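
What I mean, ignoring the learned query/key projections for simplicity: the raw dot product between two position-encoded inputs expands as

$$(w_a + p_a)\cdot(w_b + p_b) = w_a\cdot w_b + w_a\cdot p_b + p_a\cdot w_b + p_a\cdot p_b,$$

so the score is no longer just the word–word similarity $w_a\cdot w_b$ that a pre-trained space like Word2Vec was trained to encode; the cross terms mix the positional vectors into it.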

SuaveSouris

0 Answers