
I wonder why GPTs use a decoder-only architecture instead of the full encoder-decoder architecture. In the full encoder-decoder transformer, the input sequence is converted into contextual embeddings once, and the output is then generated autoregressively; in the decoder-only case, the input is part of the sequence at every step of predicting the next word. So using the full architecture seems faster and more efficient. Also, with a decoder-only architecture we cannot get bidirectional context over the input, which may be useful for contextual understanding.
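
To make the comparison concrete, here is a minimal, model-free sketch of the two generation loops I have in mind. `encode`, `cross_attend_step`, and `self_attend_step` are hypothetical placeholders standing in for the real transformer components, not actual model code; the point is only the control flow and where the input gets (re)used.

```python
# Hypothetical stand-ins for transformer components, used only to
# illustrate the two generation loops being compared in the question.

def encode(src):
    # Encoder-decoder: the input is turned into contextual states ONCE.
    return list(src)  # placeholder "encoder memory"

def cross_attend_step(memory, generated):
    # Decoder step: cross-attends to the fixed `memory` plus tokens generated so far.
    # Stand-in next-token rule: echo the source, then stop.
    return memory[len(generated)] if len(generated) < len(memory) else "<eos>"

def self_attend_step(sequence, kv_cache):
    # Decoder-only step: self-attends to everything before the current position.
    # In practice the prompt is NOT re-processed every step: its keys/values sit
    # in a KV cache, and only the newest token's keys/values are appended.
    kv_cache.append(sequence[-1])
    sep = sequence.index("<sep>")
    prompt, n_generated = sequence[:sep], len(sequence) - sep - 1
    return prompt[n_generated] if n_generated < len(prompt) else "<eos>"

def generate_encoder_decoder(src, max_len=10):
    memory = encode(src)          # input processed once
    out = []
    for _ in range(max_len):
        tok = cross_attend_step(memory, out)
        if tok == "<eos>":
            break
        out.append(tok)
    return out

def generate_decoder_only(prompt, max_len=10):
    seq, kv_cache, out = list(prompt) + ["<sep>"], [], []
    for _ in range(max_len):
        tok = self_attend_step(seq, kv_cache)
        if tok == "<eos>":
            break
        seq.append(tok)
        out.append(tok)
    return out

if __name__ == "__main__":
    print(generate_encoder_decoder(["the", "cat", "sat"]))
    print(generate_decoder_only(["the", "cat", "sat"]))
```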
