Questions tagged [seq2seq]

For questions related to sequence-to-sequence (seq2seq) machine learning models/architectures, used e.g. in machine translation.

34 questions
19
votes
4 answers

What exactly is a hidden state in an LSTM and RNN?

I'm working on a project where we use an encoder-decoder architecture. We decided to use an LSTM for both the encoder and the decoder due to its hidden states. In my specific case, the hidden state of the encoder is passed to the decoder, and this…
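For context, a minimal PyTorch sketch of the setup this question describes, with the encoder LSTM's final hidden and cell states initializing the decoder (layer sizes and tensor names are illustrative, not taken from the question):

import torch
import torch.nn as nn

encoder = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
decoder = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)

src = torch.randn(8, 10, 32)   # (batch, source length, features)
tgt = torch.randn(8, 7, 32)    # (batch, target length, features)

# h_n is the final hidden state, c_n the final cell state, each (1, batch, hidden_size)
_, (h_n, c_n) = encoder(src)

# The decoder starts from the encoder's final (hidden, cell) pair
dec_out, _ = decoder(tgt, (h_n, c_n))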
6
votes
2 answers

What are the differences between seq2seq and encoder-decoder architectures?

I've read many tutorials online that use both terms interchangeably. When I search, I find that they are the same, so why not just use one term, since they have the same definition?
user78615
6
votes
1 answer

What's the difference between content-based attention and dot-product attention?

I'm following this blog post which enumerates the various types of attention. It mentions content-based attention where the alignment scoring function for the $j$th encoder hidden state with respect to the $i$th context vector is the cosine…
Alexander Soare • 1,379 • 3 • 12 • 28
4
votes
1 answer

Can Reinforcement Learning be used to generate sequences?

Can we use reinforcement learning for sequence-to-sequence tasks? If yes, how could this be done, and would it be a good choice?
3
votes
0 answers

Any models for text to JSON?

There are many sequence-to-sequence (seq2seq) models and end-to-end models, like text-to-SQL. I was wondering: are there any text-to-JSON deep learning models? For example: Text "Switch on the computer". JSON: {"actions":["switch on"],…
3
votes
1 answer

Is seq2seq the best model when input/output sequences have fixed length?

I understand that seq2seq models are perfectly suitable when the input and/or the output have variable lengths. However, if we know the input/output sequence lengths of the neural network exactly, is this still the best approach?
Petrus • 31 • 1
3
votes
0 answers

What is the difference between zero-padding and character-padding in Recurrent Neural Networks?

For RNNs to work efficiently, we vectorize the operations, which results in an input matrix of shape (m, max_seq_len), where m is the number of examples, e.g. sentences, and max_seq_len is the maximum length that a sentence can have. Some examples…
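As an illustration of the (m, max_seq_len) matrix mentioned above (the token ids and the choice of 0 as the padding id are assumptions made for the sketch):

import numpy as np

sentences = [[4, 17, 9], [8, 2], [5, 11, 23, 7]]         # token ids, variable lengths
max_seq_len = max(len(s) for s in sentences)

batch = np.zeros((len(sentences), max_seq_len), dtype=np.int64)  # 0 is the assumed PAD id
for i, s in enumerate(sentences):
    batch[i, :len(s)] = s                                 # final shape (m, max_seq_len) = (3, 4)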
2
votes
0 answers

Are there any successful applications of transformers of small size (<10k weights)?

In NLP and sequence-modeling problems, Transformer architectures based on the self-attention mechanism (proposed in Attention Is All You Need) have achieved impressive results and are now the first choice in this sort of…
2
votes
1 answer

How is Google Translate able to convert texts of different lengths?

In my experience with TensorFlow and many other frameworks, neural networks have to have a fixed shape for any output, so how is Google Translate able to convert texts of different lengths?
2
votes
0 answers

What is the time complexity of the forward pass and back-propagation of the sequence-to-sequence model with and without attention?

I keep looking through the literature, but can't seem to find any information regarding the time complexity of the forward pass and back-propagation of the sequence-to-sequence RNN encoder-decoder model, with and without attention. The paper…
1
vote
0 answers

How to Interpret Cross Attention

I am a bit confused about what cross-attention mechanisms are doing. I understand that the currently decoded output is usually the query and the conditioning/input (from an encoder) is the key and value. The query is multiplied by the key to make an…
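A rough NumPy sketch of the mechanism the question above describes (dimensions and names are illustrative): the decoder state supplies the query, and the encoder outputs supply the keys and values.

import numpy as np

d = 64
enc_out = np.random.randn(10, d)    # encoder outputs -> keys and values
dec_state = np.random.randn(1, d)   # current decoder state -> query

scores = dec_state @ enc_out.T / np.sqrt(d)                            # (1, 10) alignment scores
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax over source positions
context = weights @ enc_out                                            # weighted sum of the values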
1
vote
0 answers

Modifying Cross Entropy Loss to work with multiple correct target sequences?

Let's say I'm training a transformer model to perform a seq2seq task, but there are multiple correct answers. For example, the following outputs would all be considered correct: source: A B C -> target: C B D; source: A B C -> target: C D E B…
1
vote
0 answers

The model's accuracy suddenly becomes unreasonably good at the beginning of the training process. I need an explanation

I am practicing machine translation using a seq2seq model (more specifically, with GRU/LSTM units). The following is my first model: This model first achieved an accuracy score of about 0.03 and gradually improved after that. It seems normal. But when I…
1
vote
1 answer

Why is it called a Seq2Seq model if the output is just a number?

Why is it called a Seq2Seq model if the output is just a number? For example, if you are trying to predict a movie's recommendation, and you are inputting a sequence of users and their ratings, shouldn't it be a Seq2Number model since you're only…
user65577
1
vote
1 answer

Is the decoder in a transformer Seq2Seq model non-parallelizable?

From my understanding, seq2seq models work by first computing a representation of the input sequence, and feeding this to the decoder. The decoder then predicts each token in the output sequence in an autoregressive manner. In this sense, it's…
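A hedged sketch of the inference-time loop this question refers to; decoder_step is a hypothetical function that runs the decoder on everything generated so far. At training time, by contrast, all target positions can be processed in parallel behind a causal mask.

def greedy_decode(decoder_step, encoder_output, bos_id, eos_id, max_len=50):
    tokens = [bos_id]
    for _ in range(max_len):
        # Each step depends on all previously generated tokens,
        # so the steps cannot run in parallel at inference time.
        next_id = decoder_step(encoder_output, tokens)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens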