
After going through both the "Illustrated Transformer" and "Annotated Transformer" blog posts, I still don't understand how the sinusoidal encodings represent the position of elements in the input sequence.
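
For reference, the encoding defined in "Attention Is All You Need" (the one both posts walk through) assigns to each position $pos$ a vector whose components alternate between sines and cosines of geometrically spaced frequencies:

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)$$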

Is the idea that each row of the encoding matrix (one row per token position in the input sequence) gets a unique waveform, and that for any fixed offset $k$ the encoding of position $pos + k$ can be expressed as a linear function of the encoding of position $pos$, so the transformer can learn relative positions between rows via linear transformations? A small numeric check of this property is sketched below.
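
To make the question concrete, here is a minimal NumPy sketch (the function names are mine) verifying numerically that a single matrix $M_k$, built from $2 \times 2$ rotations and independent of $pos$, maps $PE(pos)$ to $PE(pos+k)$:

```python
import numpy as np

def positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal encodings, one row per position (d_model assumed even)."""
    pos = np.arange(max_len)[:, None]            # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]        # (1, d_model/2)
    angles = pos / (10000 ** (i / d_model))      # (max_len, d_model/2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

d_model, k = 8, 3
pe = positional_encoding(50, d_model)

# Build M_k explicitly: block-diagonal, one 2x2 rotation per frequency.
# Using sin(a + b) = sin(a)cos(b) + cos(a)sin(b) and
#       cos(a + b) = cos(a)cos(b) - sin(a)sin(b),
# rotating the (sin, cos) pair by k * omega shifts the position by k.
M = np.zeros((d_model, d_model))
for i in range(0, d_model, 2):
    omega = 1.0 / 10000 ** (i / d_model)
    c, s = np.cos(k * omega), np.sin(k * omega)
    M[i:i + 2, i:i + 2] = [[c, s], [-s, c]]

# The same M_k works at every position.
for pos in range(len(pe) - k):
    assert np.allclose(pe[pos + k], M @ pe[pos])
print(f"PE(pos + {k}) == M_k @ PE(pos) for every pos")
```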
