Questions tagged [positional-encoding]
25 questions
8 votes · 2 answers
What is the difference between the positional encoding techniques of the Transformer and GPT?
I know the original Transformer and GPT (1-3) use two slightly different positional encoding techniques. More specifically, in GPT the positional encoding is said to be learned. What does that mean? OpenAI's papers don't go into much detail.
How…
Leevo · 305 · 2 · 9
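A minimal sketch of what "learned" usually means here (PyTorch-style; the module and parameter names are illustrative, not OpenAI's actual code): the position table is an ordinary trainable embedding, updated by backprop along with the token embeddings, in contrast to the fixed sinusoidal table of the original Transformer.

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    """GPT-style positions: a trainable lookup table with one vector per
    position index, learned jointly with the rest of the model."""
    def __init__(self, max_len: int = 1024, d_model: int = 768):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, d_model)  # trainable parameters

    def forward(self, token_emb: torch.Tensor) -> torch.Tensor:
        # token_emb: [batch, seq_len, d_model]
        seq_len = token_emb.size(1)
        positions = torch.arange(seq_len, device=token_emb.device)  # 0 .. seq_len-1
        return token_emb + self.pos_emb(positions)  # broadcasts over the batch
```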
3 votes · 1 answer
Positional Encoding of Time-Series features
I'm trying to use a Transformer encoder I coded with weather feature vectors: basically 11 weather features with shape [batch_size, n_features].
I have one data point per day, so this is a time series, but there are no…
Ouilliam · 31 · 1 · 2
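One common arrangement (a sketch under my own assumptions about the setup: the daily vectors are stacked into a window of consecutive days, projected to the model width, and a standard sinusoidal table is added along the time axis; all sizes below are made up for illustration):

```python
import math
import torch
import torch.nn as nn

batch_size, seq_len, n_features, d_model = 32, 30, 11, 64   # hypothetical sizes

x = torch.randn(batch_size, seq_len, n_features)   # [batch, days, weather features]
proj = nn.Linear(n_features, d_model)               # lift the 11 features to d_model
h = proj(x)                                          # [batch, days, d_model]

# Fixed sinusoidal table over the day index, broadcast across the batch.
pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)             # [days, 1]
freq = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                 * (-math.log(10000.0) / d_model))                         # [d_model/2]
pe = torch.zeros(seq_len, d_model)
pe[:, 0::2] = torch.sin(pos * freq)
pe[:, 1::2] = torch.cos(pos * freq)
h = h + pe                                           # encoder input with time positions
```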
3 votes · 2 answers
Is there a notion of location in Transformer architecture in subsequent self-attention layers?
The Transformer architecture (without position embeddings) is by construction equivariant to permutations of the tokens. Given queries $Q \in \mathbb{R}^{n \times d}$ and keys $K \in \mathbb{R}^{n \times d}$ and some permutation matrix $P \in…
spiridon_the_sun_rotator · 2,852 · 12 · 17
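The usual one-line argument, sketched here with values $V$ included and softmax taken row-wise (the notation follows "Attention Is All You Need"; $P$ is a permutation matrix, so $P^\top P = I$):

$$
\operatorname{Attn}(PQ, PK, PV)
= \operatorname{softmax}\!\Big(\tfrac{PQ(PK)^\top}{\sqrt{d}}\Big)PV
= P\operatorname{softmax}\!\Big(\tfrac{QK^\top}{\sqrt{d}}\Big)P^\top P V
= P\,\operatorname{Attn}(Q, K, V),
$$

so permuting the input rows only permutes the output rows, which is why positional information has to be injected explicitly.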
3 votes · 0 answers
Is there any point in adding the position embedding to the class token in Transformers?
The popular implementations of ViTs by Ross Wightman and Phil Wang add the position embedding to the class token as well as to the patches.
Is there any point in doing so?
The purpose of introducing positional embeddings to the Transformer is…
spiridon_the_sun_rotator · 2,852 · 12 · 17
3 votes · 0 answers
How does positional encoding work in the transformer model?
In the transformer model, to incorporate positional information about the text, the researchers added a positional encoding to the inputs. How does positional encoding work? How does the positional encoding system learn the positions when varying…
Eka · 1,106 · 8 · 24
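For reference, the fixed (non-learned) scheme from "Attention Is All You Need", where $pos$ is the token position and $i$ indexes pairs of embedding dimensions; nothing in this table is learned, it is simply added to the token embeddings:

$$
PE_{(pos,\,2i)} = \sin\!\Big(\frac{pos}{10000^{2i/d_{\text{model}}}}\Big),
\qquad
PE_{(pos,\,2i+1)} = \cos\!\Big(\frac{pos}{10000^{2i/d_{\text{model}}}}\Big).
$$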
2 votes · 1 answer
Why is the sinusoidal model classified as absolute positional encoding in some literature?
I am currently reading in depth about positional encodings, and as we know there are two types: absolute and relative.
My question:
Why is the sinusoidal model classified as absolute positional encoding in some literature,…
Ali Haider Ahmad · 23 · 3
2 votes · 1 answer
Which positional encoding does BERT use?
It is a little confusing that some explanations say BERT uses sinusoidal functions for its position encoding while others say BERT just uses absolute positions.
I checked that Vaswani et al. (2017) used a sinusoidal function for…
yoon · 121 · 1 · 3
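A quick way to check (a sketch using the Hugging Face transformers library; the model name and attribute path below reflect its BERT implementation, but treat them as an assumption): the positions are stored as an ordinary trainable embedding table rather than being computed from sinusoids.

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

# Learned absolute positions: a plain nn.Embedding (512 positions x 768 dims),
# trained along with the rest of the model rather than generated by a formula.
print(model.embeddings.position_embeddings)
print(model.embeddings.position_embeddings.weight.requires_grad)  # True
```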
2 votes · 1 answer
Is Positional Encoding always needed for using Transformer models correctly?
I am trying to make a model that uses a Transformer to capture the relationships between several data vectors, but the order of the data is not relevant in this case, so I am not using positional encoding.
Since the performance of models using…
Angelo · 211 · 2 · 17
2 votes · 0 answers
Positional Encoding in Transformer on multi-variate time series data hurts performance
I set up a transformer model that adds positional encodings in the encoder. The data is multivariate time-series data.
To experiment with just the positional encoding portion of the code, I set up a toy model: I generated a time series…
Matt · 121 · 1
2 votes · 0 answers
Why have both sine and cosine been used in positional encoding in the transformer model?
The Transformer model proposed in "Attention Is All You Need" uses sinusoid functions for the positional encoding.
Why have both sine and cosine been used? And why do we need to separate the odd and even dimensions to use different sinusoid…
Shiyu · 21 · 1
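One commonly cited reason, and the property stated in the original paper (the step below is just the angle-addition identities, with $\omega_i = 10000^{-2i/d_{\text{model}}}$): pairing a sine and a cosine at each frequency makes the encoding of position $pos+k$ a linear function of the encoding of position $pos$, with a matrix that depends only on the offset $k$:

$$
\begin{pmatrix} \sin(\omega_i (pos+k)) \\ \cos(\omega_i (pos+k)) \end{pmatrix}
=
\begin{pmatrix} \cos(\omega_i k) & \sin(\omega_i k) \\ -\sin(\omega_i k) & \cos(\omega_i k) \end{pmatrix}
\begin{pmatrix} \sin(\omega_i\, pos) \\ \cos(\omega_i\, pos) \end{pmatrix}.
$$

A single sinusoid per frequency does not admit such a position-independent linear shift, which is the usual motivation for interleaving sine and cosine across the even and odd dimensions.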
2 votes · 0 answers
How do the sine and cosine functions encode position in the transformer?
After going through both the "Illustrated Transformer" and "Annotated Transformer" blog posts, I still don't understand how the sinusoidal encodings represent the position of elements in the input sequence.
Is it the fact that since each row…
shoshi · 121 · 3
1 vote · 2 answers
Why do we need cosine positional encoding in a multi-head attention based transformer?
My understanding is that all tokens are passed to a transformer at once, and positional encodings help it understand their order in the sequence. And the cosine type of positional encoding helps capture short-term and long-term dependencies between…
user9343456 · 181 · 3
1 vote · 1 answer
If masked attention tells which token precedes which token, why are positional embeddings required?
I'm learning about transformers and their components, and I understand that:
Masked attention ensures that each token can only attend to previous tokens (or itself) in causal language models, effectively encoding the order of tokens.
Positional…
Saurabh Patel · 21 · 2
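A small observation that sharpens this question (sketched in the standard notation, with the softmax taken over the visible positions $j \le t$): within the visible prefix, causal attention is still order-blind, because the output at position $t$,

$$
\mathrm{out}_t = \sum_{j \le t} \alpha_{tj}\, v_j,
\qquad
\alpha_{tj} = \frac{\exp\!\big(q_t \cdot k_j/\sqrt{d}\big)}{\sum_{j' \le t} \exp\!\big(q_t \cdot k_{j'}/\sqrt{d}\big)},
$$

is a symmetric function of the set $\{(k_j, v_j) : j \le t\}$. The mask only decides which tokens are visible, not where they sit, which is why positional embeddings are still required.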
1 vote · 1 answer
What is the intuition behind position-encoding?
It is clear that word positions are essential for the meaning of a sentence, and so are essential when feeding a sentence (i.e. a sequence of words) as a matrix of word embedding vectors into a transformer. I have also roughly understood how positions…
Hans-Peter Stricker · 931 · 1 · 8 · 23
1 vote · 0 answers
How can the Transformer model tell the positional encoding apart from the original data?
I am having trouble understanding positional encoding. Say after word2vec or some other encoding algorithm we get the tensor $[0.7, 0.4, 0.2]$ for the second position. Now the final input into the model would add a positional encoding, making it $[0.7 +…
BlueSnake · 67 · 1 · 5
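A small worked version of that arithmetic (assuming the sinusoidal scheme and a toy $d_{\text{model}} = 3$ chosen only to match the example vector; real models use larger, even dimensions):

```python
import math

x = [0.7, 0.4, 0.2]   # hypothetical word vector for the token at pos = 1 (the second position)
d_model, pos = 3, 1

# Sinusoidal entries: sin on even dimensions, cos on odd ones, each at its own frequency.
pe = [math.sin(pos / 10000 ** (2 * (i // 2) / d_model)) if i % 2 == 0
      else math.cos(pos / 10000 ** (2 * (i // 2) / d_model))
      for i in range(d_model)]

summed = [xi + pi for xi, pi in zip(x, pe)]
print(summed)   # the element-wise sum is the vector the model actually receives as input
```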