Questions tagged [gpt]

For questions related to GPT (which stands for Generative Pre-Training), which is a combination of transformers (proposed in "Attention is All You Need") and unsupervised pre-training for solving language tasks, such as machine translation. GPT was proposed in "Improving Language Understanding by Generative Pre-Training" (2018) by OpenAI. There's also GPT-2, which was proposed in "Language Models are Unsupervised Multitask Learners" (2019) by OpenAI.

91 questions
38
votes
1 answer

What is the "temperature" in the GPT models?

What does the temperature parameter mean when talking about the GPT models? I know that a higher temperature value means more randomness, but I want to know how randomness is introduced. Does temperature mean we add noise to the weights/activations…
Tom Dörr
  • 503
  • 1
  • 4
  • 7
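
A minimal sketch of how temperature is usually described, assuming it divides the logits before the softmax rather than adding noise to weights or activations. The function name and the example logits are hypothetical, chosen only for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores into probabilities after scaling by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Dividing by a small temperature sharpens the distribution;
    # dividing by a large one flattens it.
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]
low = softmax_with_temperature(logits, temperature=0.5)
high = softmax_with_temperature(logits, temperature=2.0)
```

Under this assumption the randomness comes from sampling the next token from the resulting distribution: with a low temperature the top token dominates, and with a high temperature the alternatives are drawn more often.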
34
votes
1 answer

How does the (decoder-only) transformer architecture work?

How does the (decoder-only) transformer architecture, which is used in impressive models such as GPT-4, work?
30
votes
4 answers

Why is ChatGPT bad at math?

As opposed to How does ChatGPT know math?, I've been seeing some things floating around the Twitterverse about how ChatGPT can actually be very bad at math. For instance, I asked it "If it takes 5 machines 5 minutes to make 5 devices, how long would…
Mithical
  • 2,965
  • 5
  • 28
  • 39
27
votes
1 answer

What exactly are the "parameters" in GPT-3's 175 billion parameters and how are they chosen/generated?

When I studied neural networks, parameters were learning rate, batch size etc. But even GPT3's ArXiv paper does not mention anything about what exactly the parameters are, but gives a small hint that they might just be sentences. Even tutorial…
Nav
  • 491
  • 1
  • 5
  • 10
23
votes
2 answers

Why does GPT-2 Exclude the Transformer Encoder?

After looking into transformers, BERT, and GPT-2, from what I understand, GPT-2 essentially uses only the decoder part of the original transformer architecture and uses masked self-attention that can only look at prior tokens. Why does GPT-2 not…
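
The masked self-attention mentioned in this question can be illustrated with a small sketch. It assumes the attention scores are already computed and only shows the masking step; the function name and the toy scores are hypothetical, not taken from the GPT-2 code.

```python
import math


def causal_attention_weights(scores):
    """Apply a causal mask to a square score matrix, then softmax each row.

    scores[i][j] is the raw score of position i attending to position j.
    Positions j > i (future tokens) are masked out with -inf.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        row = [scores[i][j] if j <= i else float("-inf") for j in range(n)]
        m = max(row)  # finite, because position i can always see itself
        exps = [math.exp(v - m) for v in row]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Toy example: three positions with equal raw scores.
scores = [[0.0] * 3 for _ in range(3)]
weights = causal_attention_weights(scores)
```

Each row sums to 1, and every entry above the diagonal is exactly zero, so a position never receives information from tokens that come after it.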
8
votes
2 answers

Is GPT-4 based on GPT-3 or was it trained from scratch?

To me it looks like GPT-4 is based on GPT-3. On the other hand, there were rumors that the training of GPT-3 was done with errors, but retraining was impossible due to the cost.
Anixx
  • 361
  • 1
  • 11
8
votes
2 answers

What is the difference between the positional encoding techniques of the Transformer and GPT?

I know the original Transformer and the GPT (1-3) use two slightly different positional encoding techniques. More specifically, in GPT they say positional encoding is learned. What does that mean? OpenAI's papers don't go into detail very much. How…
Leevo
  • 305
  • 2
  • 9
7
votes
5 answers

How is GPT-4 able to solve math?

How can GPT-4 solve complex calculus and other math problems? I believe these problems require analytical reasoning and the ability to compute numbers. Does it still use an LLM to complete this process or does it add on to this? Here is the link to the…
desert_ranger
  • 672
  • 1
  • 6
  • 21
7
votes
3 answers

Is the Mask Needed for Masked Self-Attention During Inference with GPT-2?

My understanding is that masked self-attention is necessary during training of GPT-2, as otherwise it would be able to directly see the correct next output at each iteration. My question is whether the attention mask is necessary, or even possible,…
7
votes
1 answer

How do we know if GPT-2 is a better language model?

You may have heard of GPT2, a new language model. It has recently attracted attention from the general public as the foundation that published the paper, OpenAI, ironically refused to share the whole model fearing dangerous implications. Along the…
Lucas Morin
  • 262
  • 2
  • 13
6
votes
2 answers

How does a GPT-based language model like ChatGPT determine the n-th letter of a word?

I understand that GPT models process input text by converting words into tokens and then embedding vectors and do not process them letter by letter. Given this approach, I am curious to know how a model like ChatGPT can identify the first (or n-th)…
6
votes
2 answers

How is the next token predicted in transformers?

In the transformer (or GPT/decoder only), at the end of the decoder blocks but before the final linear layer you have X vectors (for the X tokens at the input of the decoder). We then want to compute the probabilities for the next token of the…
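
A rough sketch of the step this question describes, assuming the common setup where only the vector at the last position is projected through the final linear layer and a softmax to get the next-token distribution. The function name, the tiny hidden states, and the output weights are all hypothetical.

```python
import math


def next_token_distribution(hidden_states, output_weights):
    """Return next-token probabilities from the last position's vector.

    hidden_states: one vector per input token (X vectors of size d).
    output_weights: one row of size d per vocabulary entry.
    """
    last = hidden_states[-1]  # only the final position predicts the next token
    logits = [sum(h * w for h, w in zip(last, row)) for row in output_weights]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two input tokens with d = 2, and a vocabulary of three tokens.
hidden_states = [[1.0, 0.0], [0.0, 1.0]]
output_weights = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
probs = next_token_distribution(hidden_states, output_weights)
predicted = max(range(len(probs)), key=probs.__getitem__)  # greedy choice
```

During training the same projection is applied at every position in parallel, with each position predicting its own next token; at inference time only the last one is needed.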
5
votes
2 answers

Where can I find pre-trained language models in English and German?

Where can I find (more) pre-trained language models? I am especially interested in neural network-based models for English and German. I am aware only of Language Model on One Billion Word Benchmark and TF-LM: TensorFlow-based Language Modeling…
5
votes
1 answer

What can GPT-4 do linguistics-wise?

I have no access to GPT-4, but I wonder whether it can do the following (where ChatGPT failed). Make syntactic and morphological analysis of sentences in a language like Russian, marking cases, parts of speech and sentence, conjugations of verbs,…
Anixx
  • 361
  • 1
  • 11
4
votes
2 answers

What sort of computer would be necessary to run queries on an LLM?

I've heard that to train a model like GPT 4.0 you need a very powerful computer and ~$10M of computing power, but once you've produced the trained ~570GB model, what sort of computing power is necessary to execute specific queries with it?
ak0000
  • 205
  • 3
  • 9