Questions tagged [positional-encoding]
25 questions
8 votes · 2 answers
What is the difference between the positional encoding techniques of the Transformer and GPT?
I know the original Transformer and GPT (1-3) use two slightly different positional encoding techniques. More specifically, in GPT the positional encoding is said to be learned. What does that mean? OpenAI's papers don't go into much detail.
How…
Leevo · 305 · 2 · 9
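A minimal sketch of what "learned" usually means here (PyTorch-style; the module and parameter names are illustrative, not OpenAI's actual code): the position table is an ordinary trainable embedding, updated by backprop along with the token embeddings, in contrast to the fixed sinusoidal table of the original Transformer.

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    """GPT-style positions: a trainable lookup table with one vector per
    position index, learned jointly with the rest of the model."""
    def __init__(self, max_len: int = 1024, d_model: int = 768):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, d_model)  # trainable parameters

    def forward(self, token_emb: torch.Tensor) -> torch.Tensor:
        # token_emb: [batch, seq_len, d_model]
        seq_len = token_emb.size(1)
        positions = torch.arange(seq_len, device=token_emb.device)  # 0 .. seq_len-1
        return token_emb + self.pos_emb(positions)  # broadcasts over the batch
```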
3 votes · 1 answer
Positional Encoding of Time-Series features
I'm trying to use a Transformer encoder I coded with weather feature vectors: basically 11 weather features with shape [batch_size, n_features].
I have one data point per day, so this is a time series, but there are no…
Ouilliam · 31 · 1 · 2
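One common arrangement (a sketch under my own assumptions about the setup: the daily vectors are stacked into a window of consecutive days, projected to the model width, and a standard sinusoidal table is added along the time axis; all sizes below are made up for illustration):

```python
import math
import torch
import torch.nn as nn

batch_size, seq_len, n_features, d_model = 32, 30, 11, 64   # hypothetical sizes

x = torch.randn(batch_size, seq_len, n_features)   # [batch, days, weather features]
proj = nn.Linear(n_features, d_model)               # lift the 11 features to d_model
h = proj(x)                                          # [batch, days, d_model]

# Fixed sinusoidal table over the day index, broadcast across the batch.
pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)             # [days, 1]
freq = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                 * (-math.log(10000.0) / d_model))                         # [d_model/2]
pe = torch.zeros(seq_len, d_model)
pe[:, 0::2] = torch.sin(pos * freq)
pe[:, 1::2] = torch.cos(pos * freq)
h = h + pe                                           # encoder input with time positions
```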
3 votes · 2 answers
Is there a notion of location in Transformer architecture in subsequent self-attention layers?
The Transformer architecture (without position embeddings) is by construction equivariant to permutations of the tokens. Given queries $Q \in \mathbb{R}^{n \times d}$ and keys $K \in \mathbb{R}^{n \times d}$ and some permutation matrix $P \in…
spiridon_the_sun_rotator · 2,852 · 12 · 17
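The usual one-line argument, sketched here with values $V$ included and softmax taken row-wise (the notation follows "Attention Is All You Need"; $P$ is a permutation matrix, so $P^\top P = I$):

$$
\operatorname{Attn}(PQ, PK, PV)
= \operatorname{softmax}\!\Big(\tfrac{PQ(PK)^\top}{\sqrt{d}}\Big)PV
= P\operatorname{softmax}\!\Big(\tfrac{QK^\top}{\sqrt{d}}\Big)P^\top P V
= P\,\operatorname{Attn}(Q, K, V),
$$

so permuting the input rows only permutes the output rows, which is why positional information has to be injected explicitly.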
3 votes · 0 answers
Is there any point in adding the position embedding to the class token in Transformers?
The popular implementations of ViTs by Ross Wightman and Phil Wang add the position embedding to the class token as well as to the patches.
Is there any point in doing so?
The purpose of introducing positional embeddings to the Transformer is…
spiridon_the_sun_rotator · 2,852 · 12 · 17
3 votes · 0 answers
How does positional encoding work in the transformer model?
In the transformer model, to incorporate positional information about the text, the researchers added a positional encoding to the inputs. How does positional encoding work? How does the positional encoding system learn the positions when varying…
Eka · 1,106 · 8 · 24
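For reference, the fixed (non-learned) scheme from "Attention Is All You Need", where $pos$ is the token position and $i$ indexes pairs of embedding dimensions; nothing in this table is learned, it is simply added to the token embeddings:

$$
PE_{(pos,\,2i)} = \sin\!\Big(\frac{pos}{10000^{2i/d_{\text{model}}}}\Big),
\qquad
PE_{(pos,\,2i+1)} = \cos\!\Big(\frac{pos}{10000^{2i/d_{\text{model}}}}\Big).
$$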
2 votes · 1 answer
Why is the sinusoidal model classified as absolute positional encoding in some literature?
I am currently reading in depth about positional encodings, and as we know there are two types: absolute and relative.
My question:
Why is the sinusoidal model classified as absolute positional encoding in some literature,…
Ali Haider Ahmad · 23 · 3
2 votes · 1 answer
Which positional encoding does BERT use?
It is a little confusing that some explanations say BERT uses sinusoidal functions for its position encoding while others say BERT just uses absolute positions.
I checked that Vaswani et al. (2017) used a sinusoidal function for…
yoon · 121 · 1 · 3
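A quick way to check (a sketch using the Hugging Face transformers library; the model name and attribute path below reflect its BERT implementation, but treat them as an assumption): the positions are stored as an ordinary trainable embedding table rather than being computed from sinusoids.

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

# Learned absolute positions: a plain nn.Embedding (512 positions x 768 dims),
# trained along with the rest of the model rather than generated by a formula.
print(model.embeddings.position_embeddings)
print(model.embeddings.position_embeddings.weight.requires_grad)  # True
```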
2 votes · 1 answer
Is Positional Encoding always needed for using Transformer models correctly?
I am trying to make a model that uses a Transformer to capture the relationships between several data vectors, but the order of the data is not relevant in this case, so I am not using positional encoding.
Since the performance of models using…
Angelo · 211 · 2 · 17
2 votes · 0 answers
Positional Encoding in Transformer on multi-variate time series data hurts performance
I set up a transformer model that adds positional encodings in the encoder. The data is multivariate time-series data.
To experiment with just the positional encoding portion of the code, I set up a toy model: I generated a time series…
Matt · 121 · 1
2 votes · 0 answers
Why have both sine and cosine been used in positional encoding in the transformer model?
The Transformer model proposed in "Attention Is All You Need" uses sinusoid functions for the positional encoding.
Why have both sine and cosine been used? And why do we need to separate the odd and even dimensions to use different sinusoid…
Shiyu · 21 · 1
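One commonly cited reason, and the property stated in the original paper (the step below is just the angle-addition identities, with $\omega_i = 10000^{-2i/d_{\text{model}}}$): pairing a sine and a cosine at each frequency makes the encoding of position $pos+k$ a linear function of the encoding of position $pos$, with a matrix that depends only on the offset $k$:

$$
\begin{pmatrix} \sin(\omega_i (pos+k)) \\ \cos(\omega_i (pos+k)) \end{pmatrix}
=
\begin{pmatrix} \cos(\omega_i k) & \sin(\omega_i k) \\ -\sin(\omega_i k) & \cos(\omega_i k) \end{pmatrix}
\begin{pmatrix} \sin(\omega_i\, pos) \\ \cos(\omega_i\, pos) \end{pmatrix}.
$$

A single sinusoid per frequency does not admit such a position-independent linear shift, which is the usual motivation for interleaving sine and cosine across the even and odd dimensions.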
2 votes · 0 answers
How do the sine and cosine functions encode position in the transformer?
After going through both the "Illustrated Transformer" and "Annotated Transformer" blog posts, I still don't understand how the sinusoidal encodings represent the position of elements in the input sequence.
Is it the fact that since each row…
shoshi · 121 · 3
1 vote · 2 answers
Why do we need cosine positional encoding in a multi-head attention based transformer?
My understanding is that all tokens are passed to a transformer at once, and positional encodings help it understand their order in the sequence. And the cosine type of positional encoding helps capture short-term and long-term dependencies between…
user9343456 · 181 · 3
1 vote · 1 answer
If masked attention tells which token precedes which token, why are positional embeddings required?
I'm learning about transformers and their components, and I understand that:
Masked attention ensures that each token can only attend to previous tokens (or itself) in causal language models, effectively encoding the order of tokens.
Positional…
Saurabh Patel · 21 · 2
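A small observation that sharpens this question (sketched in the standard notation, with the softmax taken over the visible positions $j \le t$): within the visible prefix, causal attention is still order-blind, because the output at position $t$,

$$
\mathrm{out}_t = \sum_{j \le t} \alpha_{tj}\, v_j,
\qquad
\alpha_{tj} = \frac{\exp\!\big(q_t \cdot k_j/\sqrt{d}\big)}{\sum_{j' \le t} \exp\!\big(q_t \cdot k_{j'}/\sqrt{d}\big)},
$$

is a symmetric function of the set $\{(k_j, v_j) : j \le t\}$. The mask only decides which tokens are visible, not where they sit, which is why positional embeddings are still required.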
1 vote · 1 answer
What is the intuition behind position-encoding?
It is clear that word positions are essential for the meaning of a sentence, and so are essential when feeding a sentence (i.e. a sequence of words) as a matrix of word embedding vectors into a transformer. I have also roughly understood how positions…
Hans-Peter Stricker · 931 · 1 · 8 · 23
1 vote · 0 answers
How can the Transformer model tell the positional encoding apart from the original data?
I am having trouble understanding positional encoding. Say after word2vec or some other encoding algorithm we get the tensor $[0.7, 0.4, 0.2]$ for the second position. Now the final input into the model would add a positional encoding, making it $[0.7 +…
BlueSnake · 67 · 1 · 5
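A small worked version of that arithmetic (assuming the sinusoidal scheme and a toy $d_{\text{model}} = 3$ chosen only to match the example vector; real models use larger, even dimensions):

```python
import math

x = [0.7, 0.4, 0.2]   # hypothetical word vector for the token at pos = 1 (the second position)
d_model, pos = 3, 1

# Sinusoidal entries: sin on even dimensions, cos on odd ones, each at its own frequency.
pe = [math.sin(pos / 10000 ** (2 * (i // 2) / d_model)) if i % 2 == 0
      else math.cos(pos / 10000 ** (2 * (i // 2) / d_model))
      for i in range(d_model)]

summed = [xi + pi for xi, pi in zip(x, pe)]
print(summed)   # the element-wise sum is the vector the model actually receives as input
```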