Questions tagged [decoder]
10 questions
3
votes
1 answer
Aren't context lengths for transformers an artificial restriction?
Let's focus on the case of decoder-only transformers, where I am using Algorithm 10 from "Formal Algorithms for Transformers" by Mary Phuong and Marcus Hutter as a reference.
[Image: Algorithm 10 from the paper (https://i.sstatic.net/ZWC9o.png)]
Previously I thought that the maximum…
Robert Wegner
- 133
- 5
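A minimal sketch of the point the question raises, assuming a PyTorch-style decoder block (not the paper's exact algorithm): the attention computation itself is length-agnostic, and the fixed maximum context length typically comes from the learned positional-embedding table.

```python
# Sketch: the attention math works for any sequence length, but the learned
# positional-embedding table `pos_emb` imposes the hard cap `max_len`.
import torch
import torch.nn as nn

class TinyDecoderBlockDemo(nn.Module):
    def __init__(self, vocab_size=100, d_model=32, max_len=16):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)   # the length cap lives here
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, ids):                              # ids: (batch, seq_len)
        seq_len = ids.size(1)
        pos = torch.arange(seq_len, device=ids.device)   # fails if seq_len > max_len
        x = self.tok_emb(ids) + self.pos_emb(pos)
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        out, _ = self.attn(x, x, x, attn_mask=causal)    # length-agnostic operation
        return out

model = TinyDecoderBlockDemo(max_len=16)
print(model(torch.randint(0, 100, (1, 8))).shape)        # works: seq_len <= max_len
# model(torch.randint(0, 100, (1, 32)))                  # would raise: index out of range
```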
2
votes
1 answer
What does "use log probability to automatically increase the temperature until certain thresholds are hit" mean with OpenAI ASR with temperature=0
I read on https://platform.openai.com/docs/api-reference/audio/createTranscription#audio-createtranscription-temperature (mirror):
temperature. number. Optional. Defaults to 0. The sampling temperature, between 0 and 1. Higher values like 0.8 will…
Franck Dernoncourt
- 3,473
- 2
- 21
- 39
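A hedged sketch of how that documented fallback is commonly understood, based on the open-source openai/whisper implementation rather than OpenAI's server code: decoding starts at temperature 0 and is retried at progressively higher temperatures when quality heuristics (average log probability, compression ratio) fail. The audio filename and `decode_fn` below are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

with open("speech.mp3", "rb") as f:           # hypothetical audio file
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
        temperature=0,                         # start deterministic; server may fall back
    )
print(transcript.text)

# Conceptually (mirroring whisper's transcribe.py), the fallback loop looks like:
def transcribe_with_fallback(decode_fn, temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
                             logprob_threshold=-1.0, compression_ratio_threshold=2.4):
    for t in temperatures:
        result = decode_fn(temperature=t)      # decode_fn is a hypothetical decoder call
        if (result.avg_logprob > logprob_threshold
                and result.compression_ratio < compression_ratio_threshold):
            return result                      # good enough; stop escalating temperature
    return result                              # last attempt if every threshold failed
```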
2
votes
0 answers
Causal attention with left padding
I am trying to train a decoder-only transformer model. The dataset is left-padded to a fixed length so sequences of tokens can be batched. However, when I try to pass input through a multi-head attention layer, with both a key padding mask and…
xnsc
- 21
- 1
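A minimal sketch of the setup the question describes, assuming PyTorch's nn.MultiheadAttention. With left padding plus a causal mask, the earliest (padded) query positions can end up with every key masked out, which is a common source of NaNs in this combination.

```python
import torch
import torch.nn as nn

batch, seq_len, d_model = 2, 5, 16
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
x = torch.randn(batch, seq_len, d_model)

# True = masked. Sequence 0 has two left-pad tokens, sequence 1 has none.
key_padding_mask = torch.tensor([[True, True, False, False, False],
                                 [False, False, False, False, False]])
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

out, _ = attn(x, x, x, key_padding_mask=key_padding_mask, attn_mask=causal_mask)
print(out[0, :2])  # pad-position rows: attention over an empty key set, often NaN

# One common workaround: keep the computation as-is, then zero out (or exclude from
# the loss) the outputs at padded positions rather than expecting them to be defined.
out = out.masked_fill(key_padding_mask.unsqueeze(-1), 0.0)
```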
1
vote
1 answer
Is an autoencoder model encoder-only or encoder-decoder?
I'm writing up the different model architectures used in NLP, namely encoder-only, decoder-only, and encoder-decoder models, and have come across what seems to be a naming inconsistency. For decoder-only models it seems that they can be referred to as…
KurtMica
- 111
- 3
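For context, a minimal autoencoder sketch: it contains both an encoder and a decoder module, but (unlike a transformer encoder-decoder) there is no cross-attention and no autoregressive generation, which is part of why the terminology varies across write-ups.

```python
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, in_dim=784, latent_dim=32):
        super().__init__()
        # Encoder compresses the input into a latent vector; decoder reconstructs it.
        self.encoder = nn.Sequential(nn.Linear(in_dim, latent_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(latent_dim, in_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))  # reconstruct the input from the latent
```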
1
vote
1 answer
Masking in Decoder of Transformer
I understand that the masked multi-head attention block ensures that the generation of the token at time step t doesn't rely on subsequent tokens of the input. But the residual connection, which adds the input to the output of masked multi-head attention,…
SAGALPREET SINGH
- 147
- 1
- 10
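A minimal sketch of the point at issue, assuming a standard PyTorch attention layer: the residual connection adds each position's own input vector back to that same position's attention output, so it cannot leak information from later time steps even though the full input tensor is reused.

```python
import torch
import torch.nn as nn

seq_len, d_model = 4, 16
x = torch.randn(1, seq_len, d_model)
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
attn_out, _ = attn(x, x, x, attn_mask=causal)

y = x + attn_out          # residual: y[:, t] depends only on x[:, :t+1]
print(y.shape)            # (1, 4, 16); position t mixes x[t] with attention over <= t
```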
1
vote
1 answer
Transformer decoder. Causal masking during inference?
I understand how causal masking in the self-attention layer of the decoder works and why we use it during training. What I want to ask is: should we use causal masking during inference?
Consider a machine translation task where you need to…
pi-tau
- 995
- 6
- 12
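A sketch of greedy autoregressive decoding, assuming a hypothetical `model(ids)` that applies its own causal masking internally and returns logits of shape (batch, seq_len, vocab). Only the last position's logits are read at each step, so for next-token prediction the mask does not change the result; it is normally kept at inference anyway so per-position activations match training.

```python
import torch

def greedy_decode(model, input_ids, max_new_tokens=10, eos_id=2):
    # `model` and `eos_id` are placeholders for this sketch.
    ids = input_ids
    for _ in range(max_new_tokens):
        logits = model(ids)                      # (1, len(ids), vocab), causal inside
        next_id = logits[:, -1].argmax(dim=-1)   # only the final position is used
        ids = torch.cat([ids, next_id[:, None]], dim=1)
        if next_id.item() == eos_id:             # stop at end-of-sequence
            break
    return ids
```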
0
votes
1 answer
Autoregressive models (LLM) inference prediction
When predicting the next word in autoregressive models (LLMs), does the attention mechanism use queries from every word so far, or only from the previous word? For example, when predicting the word after the sentence "I love", does the attention mechanism take the query values for "I" and "love" and…
adithya
- 11
- 2
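A small numeric check of what the question asks, assuming single-head scaled dot-product attention: the output for the last position is identical whether queries are computed for every token ("I" and "love") or only for the most recent one, because each output row depends only on its own query and the keys/values seen so far.

```python
import torch

torch.manual_seed(0)
seq_len, d = 2, 8                         # e.g. the prompt "I love"
q = torch.randn(seq_len, d)
k = torch.randn(seq_len, d)
v = torch.randn(seq_len, d)

full = torch.softmax(q @ k.T / d**0.5, dim=-1) @ v            # queries for all tokens
last_only = torch.softmax(q[-1:] @ k.T / d**0.5, dim=-1) @ v  # query for "love" only

print(torch.allclose(full[-1:], last_only))   # True: same prediction for the next word
```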
0
votes
0 answers
Grayscale to RGB888 vs RGB332 to RGB888 in the same colorization training between two universes
Suppose there are two parallel universes that train deep learning models for color resolution.
The first universe uses a grayscale image as input with dimensions (640, 480, 1); the second universe uses an RGB332 image as input with the same dimensions…
Muhammad Ikhwan Perwira
- 800
- 3
- 10
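A sketch of the two input encodings being compared, assuming 8 bits per pixel in both universes: a single luminance byte versus an RGB332 byte packing 3+3+2 bits of color. Both inputs have shape (640, 480, 1); they simply discard different information from the RGB888 source.

```python
import numpy as np

rgb888 = np.random.randint(0, 256, (640, 480, 3), dtype=np.uint8)  # stand-in source image

# Universe 1: grayscale, one luminance byte per pixel (ITU-R BT.601 weights).
gray = (0.299 * rgb888[..., 0] + 0.587 * rgb888[..., 1] + 0.114 * rgb888[..., 2])
gray = gray.astype(np.uint8)[..., None]                             # (640, 480, 1)

# Universe 2: RGB332, 3 bits red, 3 bits green, 2 bits blue packed into one byte.
r, g, b = rgb888[..., 0] >> 5, rgb888[..., 1] >> 5, rgb888[..., 2] >> 6
rgb332 = ((r << 5) | (g << 2) | b).astype(np.uint8)[..., None]      # (640, 480, 1)

print(gray.shape, rgb332.shape)   # both (640, 480, 1): same shape, different content
```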
0
votes
0 answers
Low resolution color channel into high resolution color channel
There is super-resolution enhancement, but it concerns image dimension resolution: for example, a 128×128×3 image can be enhanced into a 2048×2048×3 HD image, where the color information is still 24-bit.
But is there a model that can decode low resolution…
Muhammad Ikhwan Perwira
- 800
- 3
- 10
-1
votes
1 answer
Does a decoder in a transformer model generate output embeddings like the following?
Encoder:
Input: [A, B, C, D] (word embeddings)
Output: [C1, C2, C3, C4] (contextual representations)
The encoder processes the input sequence [A, B, C, D] and generates contextual representations [C1, C2, C3, C4]. The specific calculations involved…
Steven
- 99
- 1
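A shape-level sketch of the flow the question describes, using PyTorch's nn.Transformer as a stand-in: the encoder turns the source embeddings [A, B, C, D] into contextual representations [C1, C2, C3, C4] (the "memory"), and the decoder produces one contextual output vector per target position by attending to that memory.

```python
import torch
import torch.nn as nn

d_model = 16
model = nn.Transformer(d_model=d_model, nhead=4,
                       num_encoder_layers=1, num_decoder_layers=1,
                       batch_first=True)

src = torch.randn(1, 4, d_model)   # embeddings of [A, B, C, D]
tgt = torch.randn(1, 3, d_model)   # embeddings of the target tokens generated so far

memory = model.encoder(src)        # (1, 4, d_model): the contextual [C1, C2, C3, C4]
out = model.decoder(tgt, memory)   # (1, 3, d_model): one output embedding per target token
print(memory.shape, out.shape)
```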