Questions tagged [encoder-decoder]

36 questions
19
votes
4 answers

What exactly is a hidden state in an LSTM and RNN?

I'm working on a project where we use an encoder-decoder architecture. We decided to use an LSTM for both the encoder and decoder due to its hidden states. In my specific case, the hidden state of the encoder is passed to the decoder, and this…
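
A minimal, illustrative sketch of the setup the question describes (not taken from the question; assumes PyTorch), showing the encoder's final hidden and cell states being handed to the decoder:

    import torch
    import torch.nn as nn

    enc = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
    dec = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)

    src = torch.randn(8, 10, 32)         # (batch, src_len, features)
    tgt = torch.randn(8, 7, 32)          # (batch, tgt_len, features)

    _, (h_n, c_n) = enc(src)             # h_n, c_n: (1, batch, 64) -- the encoder's "hidden state"
    dec_out, _ = dec(tgt, (h_n, c_n))    # the decoder starts from the encoder's state
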
6
votes
2 answers

What are the differences between seq2seq and encoder-decoder architectures?

I've read many tutorials online that use both words interchangeably. When I search and find that they are the same, why not just use one word since they have the same definition?
user78615
6
votes
1 answer

In which situations is it helpful to use the encoder, the decoder, or both in a transformer model?

I have some questions about using (encoder / decoder / encoder-decoder) transformer models, including (language) transformers and Vision Transformers. The overall form of a transformer consists of an encoder and a decoder. Depending on the model, you…
Yang
4
votes
1 answer

Why do we need both encoder and decoder in sequence to sequence prediction?

Why do we need both encoder and decoder in sequence to sequence prediction? We could just have a single RNN that, given input $x$, outputs some value $y(t)$ and hidden state $h(t)$. Next, given $h(t)$ and $y(t)$, the next output $y(t+1)$ and hidden…
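
A rough sketch (hypothetical, PyTorch) of the single-RNN alternative the question describes, where each output $y(t)$ is fed back as the next input:

    import torch
    import torch.nn as nn

    cell = nn.RNNCell(input_size=16, hidden_size=16)
    out_proj = nn.Linear(16, 16)

    x = torch.randn(1, 16)        # initial input x
    h = torch.zeros(1, 16)        # initial hidden state
    outputs = []
    for t in range(5):            # unroll a few steps
        h = cell(x, h)            # h(t)
        y = out_proj(h)           # y(t)
        outputs.append(y)
        x = y                     # feed y(t) back in as the next input
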
3
votes
0 answers

What is the input (and its shape) to the K/V/Q of the self-attention in EACH decoder block of a language-translation Transformer during inference?

The Transformer model from the original Attention paper has a decoder that works differently during inference than during training. I'm trying to understand the shapes used in the decoder (both the self-attention and enc-dec-attention blocks), but it's very…
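
For orientation, a hedged shape walk-through (assuming d_model = 512, a source length of 10, a KV cache, and t tokens generated so far; exact conventions vary by implementation):

    # Illustrative shapes at inference step t
    t = 4
    q_self   = (1, 512)    # self-attention query: projection of the newest target token only
    kv_self  = (t, 512)    # cached keys/values for the t tokens generated so far
    q_cross  = (1, 512)    # cross-attention query, from the decoder self-attention output
    kv_cross = (10, 512)   # keys/values projected from the encoder output, fixed for the whole decode
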
2
votes
1 answer

Difference between encoder-decoder and decoder-only transformers w.r.t. loss

What is the difference between an encoder-decoder transformer and a decoder-only transformer with regard to the loss calculation? Specifically, how does the loss signal differ? And how does this relate to token efficiency? As far as I understand, an…
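
A hedged sketch of the usual distinction (assuming teacher forcing and cross-entropy; conventions vary): an encoder-decoder model computes the loss only over decoder/target positions, while a decoder-only model sees source and target concatenated and can mask the source positions out of the loss:

    import torch
    import torch.nn.functional as F

    # Toy shapes: logits (seq_len, vocab), labels (seq_len,)
    logits = torch.randn(12, 1000)
    labels = torch.randint(0, 1000, (12,))

    # Decoder-only with a concatenated [source ; target] sequence:
    # ignore the 5 source positions by setting their labels to -100.
    labels_masked = labels.clone()
    labels_masked[:5] = -100
    loss = F.cross_entropy(logits, labels_masked, ignore_index=-100)
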
2
votes
1 answer

Why can decoder-only transformers be so good at machine translation?

In my understanding, encoder-decoder transformers for translation are trained with sentence or text pairs. How can it be explained in simple (high-level) terms that decoder-only transformers (e.g. GPT) are so good at machine translation, even though…
2
votes
0 answers

Combining GANs and NLP for AI-Based Programming: Generating Input-Output Templates for Computer Functions

I would like to combine GANs and NLP to create a system that can take an input and generate an appropriate output. For example, given the input 9 to the power of 2, the system would output pow(9,2). I am not entirely sure how to research this, but I…
1
vote
1 answer

Should encoder only or decoder only transformers be used for name origin identification?

For example, identifying which mythology a god is from by providing its name.
user1678860
1
vote
1 answer

Why does the DETR model for object detection have 17.8M parameters in transformer blocks?

The DETR paper specifies that there are 17.8M parameters in transformer blocks, given a hidden dimension of 256, 6 encoder blocks, and 6 decoder blocks. But if I use the approximate formula 28 x N x d^2 as derived in this post for an encoder-decoder…
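
Just to make the gap the question asks about explicit, the arithmetic for the approximate formula it cites (the formula itself is the question's assumption):

    N, d = 6, 256
    approx = 28 * N * d ** 2      # formula from the linked post
    print(approx)                 # 11010048, i.e. about 11.0M vs. the 17.8M reported for DETR
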
1
vote
1 answer

What makes a transformer encoder different from its decoder?

Encoders are commonly used for understanding sequences (like classification, NER, etc.), while decoders are used for generative tasks in an autoregressive manner (e.g., GPT-style text generation). Decoders are autoregressive because they generate…
1
vote
1 answer

Is an autoencoder model encoder-only or encoder-decoder?

I'm writing up the different model architectures used in NLP, namely encoder-only, decoder-only, and encoder-decoder models, and have come across what seems to be a naming inconsistency. For decoder-only models it seems that they can be referred to as…
1
vote
0 answers

What’s more efficient in multihead attention: multiply QKV by $W_i$ then split or linearly project QKV $h$ times into dimensions $d_k$?

I’m looking to bridge two implementations of multihead attention. Approach 1: Multiply and Split. Each of the queries, keys, and values is multiplied by a separate square weight matrix of size (embedding size, embedding size). The transformed…
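
A small illustrative sketch (shapes only, assuming PyTorch) of why the two formulations carry the same parameters: one (d_model, d_model) matrix split into h heads is equivalent to h separate (d_model, d_k) projections stacked side by side:

    import torch

    d_model, h = 512, 8
    d_k = d_model // h                      # 64
    x = torch.randn(3, d_model)             # 3 tokens

    # Approach 1: one big projection, then split into heads
    W_q = torch.randn(d_model, d_model)
    q_split = (x @ W_q).reshape(3, h, d_k)

    # Approach 2: h separate (d_model, d_k) projections
    W_q_heads = W_q.reshape(d_model, h, d_k)
    q_heads = torch.stack([x @ W_q_heads[:, i, :] for i in range(h)], dim=1)

    print(torch.allclose(q_split, q_heads))  # True
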
1
vote
0 answers

How to Interpret Cross Attention

I am a bit confused about what cross attention mechanisms are doing. I understand that the currently decoded output is usually the query and the conditioning/input (from an encoder) is the key and value. The query is multiplied by the key to make an…
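
A minimal cross-attention sketch (illustrative only; single head, projections omitted) matching the description in the question: the decoder state supplies the query, the encoder output supplies the keys and values:

    import torch
    import torch.nn.functional as F

    d = 64
    dec_state = torch.randn(1, d)        # query: current decoder position
    enc_out = torch.randn(10, d)         # keys/values: encoder outputs

    scores = dec_state @ enc_out.T / d ** 0.5     # (1, 10) attention scores
    weights = F.softmax(scores, dim=-1)           # how much to attend to each source token
    context = weights @ enc_out                   # (1, d) weighted sum of encoder outputs
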
1
vote
1 answer

Why do Transformer decoders use masked self attention when producing new tokens?

I've been reading that transformer decoders use masked self attention so that the decoder can't cheat by looking ahead. For example, when predicting the 6th token in the sequence we shouldn't have access to the 7th token. However, why can't the…
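
For reference, a small sketch of the causal mask applied during training (assumed PyTorch), which is what prevents, say, position 6 from attending to position 7:

    import torch

    T = 8
    scores = torch.randn(T, T)                         # raw attention scores
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float('-inf'))   # block attention to future positions
    weights = torch.softmax(scores, dim=-1)
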