Questions tagged [seq2seq]

For questions related to sequence-to-sequence (seq2seq) machine learning models/architectures, used e.g. in machine translation.

34 questions
19
votes
4 answers

What exactly is a hidden state in an LSTM and RNN?

I'm working on a project where we use an encoder-decoder architecture. We decided to use an LSTM for both the encoder and the decoder due to its hidden states. In my specific case, the hidden state of the encoder is passed to the decoder, and this…
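For context, a minimal PyTorch sketch of the setup this question describes, with the encoder LSTM's final hidden and cell states initializing the decoder (layer sizes and tensor names are illustrative, not taken from the question):

import torch
import torch.nn as nn

encoder = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
decoder = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)

src = torch.randn(8, 10, 32)   # (batch, source length, features)
tgt = torch.randn(8, 7, 32)    # (batch, target length, features)

# h_n is the final hidden state, c_n the final cell state, each (1, batch, hidden_size)
_, (h_n, c_n) = encoder(src)

# The decoder starts from the encoder's final (hidden, cell) pair
dec_out, _ = decoder(tgt, (h_n, c_n))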
6
votes
2 answers

What are the differences between seq2seq and encoder-decoder architectures?

I've read many tutorials online that use both terms interchangeably. When I search, I find that they are the same, so why not just use one term, since they have the same definition?
user78615
6
votes
1 answer

What's the difference between content-based attention and dot-product attention?

I'm following this blog post which enumerates the various types of attention. It mentions content-based attention where the alignment scoring function for the $j$th encoder hidden state with respect to the $i$th context vector is the cosine…
Alexander Soare • 1,379 • 3 • 12 • 28
4
votes
1 answer

Can Reinforcement Learning be used to generate sequences?

Can we use reinforcement learning for sequence-to-sequence tasks? If yes, how could this be done, and would it be a good choice?
3
votes
0 answers

Any models for text to JSON?

There are many sequence-to-sequence (seq2seq) models and end-to-end models, like text-to-SQL. I was wondering: are there any text-to-JSON deep learning models? For example: Text "Switch on the computer". JSON: {"actions":["switch on"],…
3
votes
1 answer

Is seq2seq the best model when input/output sequences have fixed length?

I understand that seq2seq models are perfectly suitable when the input and/or the output have variable lengths. However, if we know the input/output sequence lengths of the neural network exactly, is this still the best approach?
Petrus • 31 • 1
3
votes
0 answers

What is the difference between zero-padding and character-padding in Recurrent Neural Networks?

For RNNs to work efficiently, we vectorize the operations, which results in an input matrix of shape (m, max_seq_len), where m is the number of examples, e.g. sentences, and max_seq_len is the maximum length that a sentence can have. Some examples…
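As an illustration of the (m, max_seq_len) matrix mentioned above (the token ids and the choice of 0 as the padding id are assumptions made for the sketch):

import numpy as np

sentences = [[4, 17, 9], [8, 2], [5, 11, 23, 7]]         # token ids, variable lengths
max_seq_len = max(len(s) for s in sentences)

batch = np.zeros((len(sentences), max_seq_len), dtype=np.int64)  # 0 is the assumed PAD id
for i, s in enumerate(sentences):
    batch[i, :len(s)] = s                                 # final shape (m, max_seq_len) = (3, 4)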
2
votes
0 answers

Are there any successful applications of transformers of small size (<10k weights)?

In NLP and sequence-modeling problems, Transformer architectures based on the self-attention mechanism (proposed in Attention Is All You Need) have achieved impressive results and are now the first choice in this sort of…
2
votes
1 answer

How is Google Translate able to convert texts of different lengths?

In my experience with TensorFlow and many other frameworks, neural networks have to have a fixed shape for any output, so how is Google Translate able to convert texts of different lengths?
2
votes
0 answers

What is the time complexity of the forward pass and back-propagation of the sequence-to-sequence model with and without attention?

I keep looking through the literature, but can't seem to find any information regarding the time complexity of the forward pass and back-propagation of the sequence-to-sequence RNN encoder-decoder model, with and without attention. The paper…
1
vote
0 answers

How to Interpret Cross Attention

I am a bit confused about what cross-attention mechanisms are doing. I understand that the currently decoded output is usually the query and the conditioning/input (from an encoder) is the key and value. The query is multiplied by the key to make an…
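A rough NumPy sketch of the mechanism the question above describes (dimensions and names are illustrative): the decoder state supplies the query, and the encoder outputs supply the keys and values.

import numpy as np

d = 64
enc_out = np.random.randn(10, d)    # encoder outputs -> keys and values
dec_state = np.random.randn(1, d)   # current decoder state -> query

scores = dec_state @ enc_out.T / np.sqrt(d)                            # (1, 10) alignment scores
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax over source positions
context = weights @ enc_out                                            # weighted sum of the values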
1
vote
0 answers

Modifying Cross Entropy Loss to work with multiple correct target sequences?

Let's say I'm training a transformer model to perform a seq2seq task, but there are multiple correct answers. For example, the following outputs would all be considered correct: source: A B C -> target: C B D; source: A B C -> target: C D E B…
1
vote
0 answers

The model's accuracy suddenly becomes unreasonably good at the beginning of the training process. I need an explanation

I am practicing machine translation using a seq2seq model (more specifically, with GRU/LSTM units). The following is my first model: This model first achieved an accuracy score of about 0.03 and gradually improved after that. It seems normal. But when I…
1
vote
1 answer

Why is it called a Seq2Seq model if the output is just a number?

Why is it called a Seq2Seq model if the output is just a number? For example, if you are trying to predict a movie's recommendation, and you are inputting a sequence of users and their ratings, shouldn't it be a Seq2Number model since you're only…
user65577
1
vote
1 answer

Is the decoder in a transformer Seq2Seq model non-parallelizable?

From my understanding, seq2seq models work by first computing a representation of the input sequence, and feeding this to the decoder. The decoder then predicts each token in the output sequence in an autoregressive manner. In this sense, it's…
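A hedged sketch of the inference-time loop this question refers to; decoder_step is a hypothetical function that runs the decoder on everything generated so far. At training time, by contrast, all target positions can be processed in parallel behind a causal mask.

def greedy_decode(decoder_step, encoder_output, bos_id, eos_id, max_len=50):
    tokens = [bos_id]
    for _ in range(max_len):
        # Each step depends on all previously generated tokens,
        # so the steps cannot run in parallel at inference time.
        next_id = decoder_step(encoder_output, tokens)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens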