Questions tagged [decoder]

10 questions
3
votes
1 answer

Aren't context lengths for transformers an artificial restriction?

Let's focus on the case of decoder-only transformers, where I am using Algorithm 10 from "Formal Algorithms for Transformers" by Mary Phuong and Marcus Hutter as a reference: https://i.sstatic.net/ZWC9o.png Previously I thought that the maximum…
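One concrete source of that limit, sketched below under the assumption of learned positional embeddings of maximum length l_max (as in the paper's setup; the sizes below are illustrative, not from the paper): the positional-embedding table is a fixed-size parameter, so sequences longer than l_max simply cannot be embedded.

    import torch
    import torch.nn as nn

    # Fixed-size learned positional table => hard maximum context length.
    # l_max, d_model and vocab are illustrative values.
    l_max, d_model, vocab = 2048, 512, 32000

    tok_emb = nn.Embedding(vocab, d_model)
    pos_emb = nn.Embedding(l_max, d_model)

    def embed(token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len); seq_len must not exceed l_max, otherwise
        # torch.arange(seq_len) indexes past the end of the positional table.
        seq_len = token_ids.size(1)
        positions = torch.arange(seq_len, device=token_ids.device)
        return tok_emb(token_ids) + pos_emb(positions)

    x = embed(torch.randint(0, vocab, (1, 16)))
    print(x.shape)   # torch.Size([1, 16, 512])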
2
votes
1 answer

What does "use log probability to automatically increase the temperature until certain thresholds are hit" mean with OpenAI ASR with temperature=0

I read on https://platform.openai.com/docs/api-reference/audio/createTranscription#audio-createtranscription-temperature (mirror): temperature. number. Optional. Defaults to 0. The sampling temperature, between 0 and 1. Higher values like 0.8 will…
Franck Dernoncourt
  • 3,473
  • 2
  • 21
  • 39
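For context, the open-source Whisper codebase implements this as temperature fallback: decode at temperature 0 and, if quality heuristics fail, retry at progressively higher temperatures. A hedged sketch of that loop follows; transcribe_once, the Result fields, and the threshold values are illustrative assumptions, not OpenAI's actual API or code.

    from dataclasses import dataclass

    @dataclass
    class Result:
        text: str
        avg_logprob: float        # mean log-probability of the sampled tokens
        compression_ratio: float  # very high values suggest repetitive, degenerate text

    def transcribe_once(audio, temperature: float) -> Result:
        raise NotImplementedError("run the ASR model here")

    def transcribe_with_fallback(audio,
                                 temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
                                 logprob_threshold=-1.0,
                                 compression_ratio_threshold=2.4) -> Result:
        result = None
        for t in temperatures:
            result = transcribe_once(audio, temperature=t)
            # Accept the transcript unless the heuristics flag it as low quality.
            if (result.avg_logprob >= logprob_threshold
                    and result.compression_ratio <= compression_ratio_threshold):
                return result
        return result  # last attempt, even if it never passed the thresholds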
2
votes
0 answers

Causal attention with left padding

I am trying to train a decoder-only transformer model. The dataset is left-padded to a fixed length so sequences of tokens can be batched. However, when I try to pass input through a multi-head attention layer, with both a key padding mask and…
xnsc
  • 21
  • 1
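A minimal sketch of the setup being described, using torch.nn.MultiheadAttention with both a key padding mask (left padding) and a causal mask; the sizes are illustrative.

    import torch
    import torch.nn as nn

    batch, seq_len, d_model, n_heads = 2, 5, 8, 2
    mha = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
    x = torch.randn(batch, seq_len, d_model)

    # key_padding_mask: True marks PAD keys; left padding puts them at the front.
    key_padding_mask = torch.tensor([
        [True,  True,  False, False, False],   # 2 pads, then 3 real tokens
        [False, False, False, False, False],   # no padding
    ])

    # Causal mask: True above the diagonal forbids attending to future positions.
    causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

    out, _ = mha(x, x, x,
                 key_padding_mask=key_padding_mask,
                 attn_mask=causal_mask,
                 need_weights=False)
    print(out.shape)   # torch.Size([2, 5, 8])

Note that with left padding, a pad query at the start of a sequence can only attend to positions at or before itself, all of which are masked as pads, so its attention row is fully masked and can come out as NaN; those rows correspond to pad positions and are typically ignored or zeroed out before the loss.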
1
vote
1 answer

Is an autoencoder model encoder-only or encoder-decoder?

I'm writing up about different model architectures used in NLP, namely encoder-only, decoder-only, and encoder-decoder models, and have come across what seems to be a naming inconsistency. For decoder-only models it seems that they can be referred to as…
1
vote
1 answer

Masking in the Decoder of a Transformer

I understand that the masked multi-head attention block ensures that generation of token at time step t doesn't rely on subsequent tokens of the input. But the residual connection which adds the input to the output of masked multi-head attention…
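A toy single-head attention without the Q/K/V projections, just to make the data flow explicit: the residual is a per-position addition, so position t only gets its own input x_t added back, and the causal mask alone decides which earlier positions feed into the attention output.

    import torch

    seq_len, d_model = 4, 8
    x = torch.randn(1, seq_len, d_model)

    # Causal mask: -inf above the diagonal, so position t attends only to <= t.
    mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)

    scores = x @ x.transpose(-1, -2) / d_model ** 0.5 + mask   # (1, T, T)
    attn_out = torch.softmax(scores, dim=-1) @ x               # row t mixes x[:, :t+1] only

    y = x + attn_out   # elementwise residual: y[:, t] still depends on x[:, :t+1] only
    print(y.shape)     # torch.Size([1, 4, 8])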
1
vote
1 answer

Transformer decoder. Causal masking during inference?

I understand how causal masking in the self-attention layer of the decoder works and why we use it during training. What I want to ask is: should we use causal masking during inference? Consider a machine translation task where you need to…
pi-tau
  • 995
  • 6
  • 12
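One common answer, sketched below: if the whole prefix is re-encoded at every decoding step (no KV cache), keeping the causal mask makes the intermediate representations match what the model saw during training. The interface model(tokens, attn_mask=...) returning per-position logits is an assumption for illustration, not a specific library call.

    import torch

    def greedy_decode(model, bos_id: int, eos_id: int, max_len: int = 50):
        tokens = torch.tensor([[bos_id]])
        for _ in range(max_len):
            t = tokens.size(1)
            # Same causal mask as in training, applied to the whole prefix.
            causal_mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
            logits = model(tokens, attn_mask=causal_mask)   # (1, t, vocab)
            next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
            tokens = torch.cat([tokens, next_id], dim=1)
            if next_id.item() == eos_id:
                break
        return tokens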
0
votes
1 answer

Autoregressive model (LLM) inference prediction

When predicting the next word in an autoregressive model (LLM), does the attention mechanism use queries from every word starting with the first, or only from the previous word? For example, when predicting the word after the sentence "I love", does the attention mechanism take query values for both "I" and "love", and…
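A sketch of what typically happens with a KV cache (single head, no projections; sizes are illustrative): only the newest token contributes a query, but that query attends to the cached keys and values of every earlier token, so "I" still influences the prediction made after "love".

    import torch

    d = 8
    k_cache = torch.empty(0, d)   # keys of "I", "love", ... accumulate here
    v_cache = torch.empty(0, d)

    def attend_new_token(q_new, k_new, v_new):
        global k_cache, v_cache
        k_cache = torch.cat([k_cache, k_new[None]])   # (t, d)
        v_cache = torch.cat([v_cache, v_new[None]])
        # One query (the latest token) against all cached keys -> (1, t) weights.
        weights = torch.softmax(q_new[None] @ k_cache.T / d ** 0.5, dim=-1)
        return weights @ v_cache                       # (1, d) context vector

    for _ in ["I", "love"]:   # illustrative tokens
        out = attend_new_token(torch.randn(d), torch.randn(d), torch.randn(d))
    print(out.shape)   # torch.Size([1, 8])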
0
votes
0 answers

Grayscale to RGB888 vs RGB332 to RGB888 in the same colorization training across two universes

Suppose there are two parallel universes that train deep learning models for color resolution. The first universe uses grayscale images as input with dimensions (640, 480, 1); the second universe uses RGB332 images as input with the same dimensions…
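As a point of reference for what the RGB332 input actually encodes, here is an illustrative bit-unpacking of an RGB332 byte image into approximate RGB888 (the function name is hypothetical):

    import numpy as np

    def rgb332_to_rgb888(img: np.ndarray) -> np.ndarray:
        # img: (H, W, 1) uint8, each byte packed as RRRGGGBB.
        b8 = img[..., 0].astype(np.uint16)
        r = (b8 >> 5) & 0b111   # 3 bits of red
        g = (b8 >> 2) & 0b111   # 3 bits of green
        b = b8 & 0b11           # 2 bits of blue
        # Rescale each field to the full 0-255 range.
        return np.stack([r * 255 // 7, g * 255 // 7, b * 255 // 3],
                        axis=-1).astype(np.uint8)

    rgb888 = rgb332_to_rgb888(np.random.randint(0, 256, (640, 480, 1), dtype=np.uint8))
    print(rgb888.shape)   # (640, 480, 3)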
0
votes
0 answers

Low resolution color channel into high resolution color channel

There is super-resolution enhancement, but it is about spatial resolution: for example, a 128×128×3 image can be enhanced into a 2048×2048×3 HD image, where the color information is still 24-bit. But is there a model that can decode low-resolution…
-1
votes
1 answer

Does a decoder in a transformer model generate output embeddings like the following?

Encoder: Input: [A, B, C, D] (word embeddings) Output: [C1, C2, C3, C4] (contextual representations) The encoder processes the input sequence [A, B, C, D] and generates contextual representations [C1, C2, C3, C4]. The specific calculations involved…
Steven
  • 99
  • 1
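A minimal sketch with torch.nn.Transformer (illustrative sizes) of the data flow described in the question: the encoder maps the source embeddings [A, B, C, D] to contextual representations [C1, C2, C3, C4] (the "memory"), and the decoder emits one output representation per target position, each attending to that memory and to earlier target positions only.

    import torch
    import torch.nn as nn

    d_model = 16
    model = nn.Transformer(d_model=d_model, nhead=2,
                           num_encoder_layers=1, num_decoder_layers=1,
                           batch_first=True)

    src = torch.randn(1, 4, d_model)   # embeddings of [A, B, C, D]
    tgt = torch.randn(1, 3, d_model)   # embeddings of the shifted target so far

    tgt_mask = model.generate_square_subsequent_mask(3)   # causal mask for the decoder
    out = model(src, tgt, tgt_mask=tgt_mask)
    print(out.shape)   # torch.Size([1, 3, 16]) -- one vector per target position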