Questions tagged [gpt-2]

For questions related to the GPT-2 ("Generative Pre-trained Transformer 2") language model, which is described in the paper "Language Models are Unsupervised Multitask Learners" (2019) by OpenAI. GPT-2 is the successor of GPT and the predecessor of GPT-3.

See Wikipedia for more information.

8 questions
10
votes
1 answer

How do I use GPT-2 to summarise text?

Section 3.6 of the OpenAI GPT-2 paper mentions summarising text, but the method is described only in very high-level terms: To induce summarization behavior we add the text TL;DR: after the article and generate 100 tokens…
Tom Hale
  • 384
  • 3
  • 13
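The trick the question above asks about can be sketched with the Hugging Face transformers library (an assumption on my part; the question only cites the paper). The paper appends "TL;DR:" to the article, generates 100 tokens, and uses top-k sampling with k = 2:

```python
# A minimal sketch, assuming Hugging Face transformers (not the paper's own code).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

article = "..."  # the article text to summarise
prompt = article + "\nTL;DR:"  # induce summarization behaviour, as in section 3.6

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,   # the paper generates 100 tokens
    do_sample=True,
    top_k=2,              # the paper uses top-k random sampling with k = 2
    pad_token_id=tokenizer.eos_token_id,
)
summary = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:])
print(summary)
```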
2
votes
2 answers

Why do we need masking and context window during inference in LLMs?

To make the discussion concrete, consider the GPT-2 model, which is auto-regressive. I fully understand why we need masking during training; however, I need clarification on why we need masking during inference. During inference, the model…
F Gh
  • 21
  • 1
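A small illustration of the point at issue, assuming PyTorch: the causal mask matters whenever several positions are processed in one forward pass (training, or encoding a whole prompt); during one-token-at-a-time decoding with a KV cache, the newest position has no future tokens to hide, so its mask row is all ones.

```python
import torch

# Causal (lower-triangular) mask for a 5-token sequence: position i may attend
# only to positions <= i.
seq_len = 5
causal_mask = torch.tril(torch.ones(seq_len, seq_len)).bool()
print(causal_mask)

# When decoding incrementally, the query is only the newest position, so its
# mask row is all True over the cached keys; masking changes nothing for that
# step, but the prompt-encoding pass still needs the full mask.
```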
2
votes
0 answers

How to generate longer text with GPT-2?

I am currently using Huggingface transformers to generate text using GPT-2. The problem is that it only generates 1024 tokens, which seems to be a hard limit enforced by the script's code on the model's maximum generation length…
allo
  • 312
  • 1
  • 9
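One common workaround, sketched below under the assumption that the setup is the standard Hugging Face generation API: repeatedly re-prompt the model with the tail of what it has produced so far, since GPT-2's positional embeddings only cover 1024 tokens. Coherence tends to degrade as earlier text falls out of the window.

```python
# A hedged sketch of sliding-window continuation past the 1024-token context.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

generated = tokenizer.encode("Once upon a time", return_tensors="pt")
target_length = 3000   # total tokens we want, beyond the 1024 limit
window = 512           # how much recent context to keep when re-prompting

while generated.shape[1] < target_length:
    context = generated[:, -window:]        # keep only the most recent tokens
    out = model.generate(
        context,
        max_new_tokens=256,
        do_sample=True,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )
    new_tokens = out[:, context.shape[1]:]   # strip the re-fed context
    generated = torch.cat([generated, new_tokens], dim=1)

print(tokenizer.decode(generated[0]))
```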
1
vote
1 answer

Strange Periodic Train Accuracy During Toy LLM Pretraining

Dear community, I am trying to reproduce the results of Allen-Zhu's Physics of Language Models paper, Part 3.1 (https://arxiv.org/abs/2309.14316). This paper is mainly about training a toy GPT-2 model on synthetic data of individual biographies and…
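For readers attempting something similar, a toy GPT-2 can be instantiated directly from a small GPT2Config; the sizes below are illustrative assumptions, not values taken from the paper, whose exact architecture and hyperparameters differ.

```python
# A rough sketch of a "toy" GPT-2, assuming Hugging Face transformers.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=50257,
    n_positions=512,
    n_embd=256,   # small hidden size for a toy model (assumed value)
    n_layer=8,    # assumed value
    n_head=8,     # assumed value
)
model = GPT2LMHeadModel(config)
print(sum(p.numel() for p in model.parameters()) / 1e6, "M parameters")
```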
1
vote
0 answers

Question about "unsupervised learning objective" in the GPT-2 paper

I find it difficult to understand the following from the GPT-2 paper: Language modeling is also able to, in principle, learn the tasks of McCann et al. (2018) without the need for explicit supervision of which symbols are the outputs to be…
Tom Bennett
  • 111
  • 4
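The quoted sentence is easier to parse next to the two formulas it leans on; a rough restatement in my own notation, following the paper's setup:

```latex
% Autoregressive factorization that GPT-2's language-modeling objective estimates:
p(x) = \prod_{i=1}^{n} p(s_i \mid s_1, \ldots, s_{i-1})
% McCann et al. (2018): a general multitask system should model
%   p(\text{output} \mid \text{input}, \text{task}).
% Because task, input and output can all be expressed as sequences of symbols,
% a language model trained only on the factorization above can, in principle,
% learn this conditional without explicit supervision of which symbols are the
% outputs to be predicted.
```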
0
votes
1 answer

How to generate a sentence containing a specific set of tokens using GPT2 or BERT?

I have different sets of words as inputs, e.g., {governor, John Gary Evans, office, 1894} or {cheetah, 80km/h, mammal}. I would like to construct a grammatically correct sentence that contains the full set, or a subset, of these tokens. So the outputs…
Vladimir
  • 51
  • 2
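One possible route, sketched under the assumption of a recent Hugging Face transformers version: constrained beam search with force_words_ids, which forces the chosen words to appear in the generated text. Fine-tuning on keyword-to-sentence data (e.g. CommonGen-style) is another common approach.

```python
# A hedged sketch: force selected keywords into GPT-2's output via constrained
# beam search (force_words_ids requires num_beams > 1).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

keywords = ["cheetah", "mammal"]  # example subset of the question's tokens
force_words_ids = [
    tokenizer(" " + w, add_special_tokens=False).input_ids for w in keywords
]

inputs = tokenizer("A fact:", return_tensors="pt")
out = model.generate(
    **inputs,
    force_words_ids=force_words_ids,
    num_beams=8,
    max_new_tokens=30,
    no_repeat_ngram_size=2,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```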
0
votes
1 answer

GPT-2: How to use GPT2LMHeadModel to start a sentence, not complete it

I am using GPT2LMHeadModel to change the way GPT-2 chooses the next word in a sentence. At this point, I have to give the initial part of the sentence, and GPT-2 starts to predict the best next word. I want GPT-2 to read an entire sentence and then…
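A minimal sketch of the usual pattern, assuming PyTorch: feed the entire sentence through GPT2LMHeadModel and inspect the distribution it assigns after the last token, rather than asking generate() to complete a prefix. Note that GPT-2 is strictly left-to-right, so it cannot literally write the beginning of a sentence conditioned on its ending unless the data is reordered so the known part comes first.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

sentence = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits   # shape: (1, seq_len, vocab_size)

next_token_logits = logits[0, -1]     # distribution after reading the whole sentence
top = torch.topk(next_token_logits, k=5)
print([tokenizer.decode([i]) for i in top.indices])
```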
0
votes
1 answer

GPT-2: (Hardware) requirements for fine-tuning the 774M model

I wonder if anyone has actually succeeded in fine-tuning GPT-2's 774M model without using cloud TPUs. My GeForce RTX 2070 SUPER couldn't handle it in previous attempts. I'm running TensorFlow 1.14.0 with CUDA 9.1 on Ubuntu 18.04. For…
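A back-of-the-envelope estimate of why an 8 GB card struggles with the 774M checkpoint, under rough assumptions (fp32 weights, Adam optimizer states, no gradient checkpointing, activations ignored):

```python
# Hypothetical memory arithmetic for naive fine-tuning of the 774M model.
params = 774e6
bytes_per_param = 4                 # fp32
weights = params * bytes_per_param  # ~3.1 GB
gradients = weights                 # ~3.1 GB
adam_states = 2 * weights           # first and second moments, ~6.2 GB
total_gb = (weights + gradients + adam_states) / 1e9
print(f"~{total_gb:.1f} GB before activations")  # ~12.4 GB, well above 8 GB
```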