
For a transformer decoder, how exactly are K, Q, and V computed at each decoding step?

Assume my input prompt is "today is a" (the start of "today is a good day").

At t = 0 (generation step 0): K, Q, and V are the projections of the sequence "today is a". Then say the next token generated is "good".

At t = 1 (generation step 1): which one is true?

  1. K, Q, and V are the projections of the sequence ("today is a good")
  2. K and Q are the projections of the sequence ("today is a"), and V is the projection of the sequence ("good")?
1 Answer


(This type of) autoregressive LLM always works by predicting the next token from all of the previous tokens. First you run the model with input "today is a" and the prediction is "good". Then you run the model with input "today is a good" and the prediction is "day", and so on. Each token is predicted by running the entire model from start to finish on its full input so far, so option 1 is true: at t = 1, K, Q, and V are all projections of the sequence "today is a good". (In practice, KV caching lets the model reuse the K and V it already computed for "today is a" rather than recomputing them, but they still correspond to the full sequence.)
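
To make this concrete, here is a minimal sketch of one self-attention step in PyTorch. It assumes a single head with random toy projection weights (`W_q`, `W_k`, `W_v` are hypothetical, not from any real model) and ignores positional encodings, layer norm, and the rest of the block; the point is only that at every generation step Q, K, and V are all projections of the entire sequence so far:

```python
import torch

torch.manual_seed(0)

d_model = 8  # toy embedding size

# Toy projection matrices for a single attention head (hypothetical values)
W_q = torch.randn(d_model, d_model)
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)

def self_attention(x):
    # x: (seq_len, d_model) -- embeddings of ALL tokens seen so far
    Q = x @ W_q  # queries for every position
    K = x @ W_k  # keys for every position
    V = x @ W_v  # values for every position
    scores = (Q @ K.T) / d_model ** 0.5
    # Causal mask: each position attends only to itself and earlier positions
    mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ V

# t = 0: the model sees the prompt "today is a" (3 token embeddings)
x0 = torch.randn(3, d_model)
out0 = self_attention(x0)  # K, Q, V all come from "today is a"

# t = 1: the generated token "good" is appended; K, Q, and V are now
# projections of the FULL sequence "today is a good" (option 1)
x1 = torch.cat([x0, torch.randn(1, d_model)], dim=0)
out1 = self_attention(x1)
print(out0.shape, out1.shape)  # torch.Size([3, 8]) torch.Size([4, 8])
```

Note that the rows of K and V computed for "today is a" at t = 1 are identical to the ones computed at t = 0, which is exactly why real implementations cache them instead of recomputing.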