0

When an autoregressive model (LLM) predicts the next word, does the attention mechanism use queries from every token starting at the first word, or only from the most recent word? For example, to predict the word after the sentence "I love", does the attention mechanism take query values for "I" and "love"? And after predicting, say, "pizza", does the next step take query values for "I", "love", and "pizza", or only for "pizza"?

adithya
  • 11
  • 2

1 Answer

0

During inference, autoregressive models predict text one token at a time, sequentially. At each prediction step, the attention mechanism takes the query from only the current token, scores it against the keys of all previous tokens, and computes a softmax-weighted sum of the values associated with those previous tokens as the output of the attention layer. Each step therefore costs $O(T)$ per layer, and generating a full sequence of $T$ tokens costs $O(T^2)$ per layer in total.
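A minimal NumPy sketch of a single decoding step, assuming a cached-key/value setup (function and variable names here are illustrative, not from any particular library):

    import numpy as np

    def decode_step_attention(q_t, k_t, v_t, K_cache, V_cache):
        """One decoding step: only the current token's query is used.

        q_t, k_t, v_t : (d,) query/key/value vectors of the current token
        K_cache, V_cache : (T-1, d) keys/values of all previous tokens
        """
        d = q_t.shape[-1]
        # Append the current token's key/value to the cache
        K = np.vstack([K_cache, k_t])          # (T, d)
        V = np.vstack([V_cache, v_t])          # (T, d)
        # Scaled dot-product scores: one query vs. T keys -> O(T) per step
        scores = K @ q_t / np.sqrt(d)          # (T,)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()               # softmax over all tokens so far
        # Weighted sum of values is the attention output for the new token
        out = weights @ V                      # (d,)
        return out, K, V

In your example: after "I love pizza", the keys and values for "I" and "love" already sit in the cache; only the query (and key/value) for "pizza" is computed at the new step, but the softmax still ranges over all three tokens.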

The attention mechanism, together with multiple attention heads, is designed to use the entire sequence history and capture different aspects of the dependencies between tokens, which is what lets the model understand and generate contextually relevant responses at both training and inference time.
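A rough sketch of how multiple heads split the work, again with illustrative names and building on the single-step function above:

    import numpy as np

    def multi_head_step(q_t, K, V, num_heads):
        """Each head attends independently over its slice of the model dimension.

        q_t : (d_model,) query of the current token
        K, V : (T, d_model) cached keys/values (current token already appended)
        """
        d_model = q_t.shape[-1]
        d_head = d_model // num_heads
        outs = []
        for h in range(num_heads):
            sl = slice(h * d_head, (h + 1) * d_head)
            scores = K[:, sl] @ q_t[sl] / np.sqrt(d_head)  # (T,)
            w = np.exp(scores - scores.max())
            w /= w.sum()                                    # per-head softmax
            outs.append(w @ V[:, sl])                       # (d_head,)
        # Concatenate head outputs; a learned output projection follows in practice
        return np.concatenate(outs)                         # (d_model,)

Each head sees the same sequence history but with its own projection slice, so different heads can specialize in different kinds of dependencies.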

cinch
  • 11,000
  • 3
  • 8
  • 17