Highest Voted 'instruct-gpt' Questions - Artificial Intelligence Stack Exchange

5

votes

2 answers

InstructGPT: What is the sigma in the loss function, and why is $\log(\cdot)$ used?

What is the sigma in the loss function, and why is $\log(\cdot)$ used? $$ \operatorname{loss}(\theta) = -\frac{1}{\binom{K}{2}}E_{(x,y_w,y_l)\sim D}[\log(\sigma(r_{\theta}(x, y_w) - r_{\theta}(x, y_l)))] $$ The equation was taken…

asked Jan 17 '23 at 11:49

Nathan G

161
3
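
As a rough numerical illustration of the loss quoted in the question above: the sketch below assumes $\sigma$ is the logistic sigmoid and uses made-up reward scores, so it is not the paper's training code. The helper names `sigmoid` and `pairwise_loss` are hypothetical. For a single prompt, the plain average over pairs plays the role of the $\frac{1}{\binom{K}{2}}$ factor.

```python
import math

def sigmoid(z):
    # Logistic sigmoid: maps any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def pairwise_loss(reward_pairs):
    # Mean of -log(sigmoid(r_w - r_l)) over (preferred, rejected) reward pairs.
    losses = [-math.log(sigmoid(r_w - r_l)) for r_w, r_l in reward_pairs]
    return sum(losses) / len(losses)

# Hypothetical reward-model scores: (score of preferred y_w, score of rejected y_l).
pairs = [(2.0, 0.5), (0.1, 0.3), (1.0, 1.0)]
loss = pairwise_loss(pairs)
print(loss)
```

The loss shrinks when the preferred completion scores well above the rejected one and grows when the ordering is reversed; with equal scores each pair contributes $\log 2$. Taking the log turns the sigmoid's output, read as the probability that $y_w$ beats $y_l$, into a negative log-likelihood.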

4

votes

1 answer

What's the difference between GPT3.5 and InstructGPT?

I read about the different model series in GPT-3.5 at https://platform.openai.com/docs/models/gpt-3-5 and, at the beginning of that page, it says to look at https://platform.openai.com/docs/model-index-for-researchers to understand the difference…

comparison open-ai gpt gpt-3 instruct-gpt

asked Apr 06 '23 at 08:56

Arya

41
2

1

vote

1 answer

Repainting a picture in the style of some painter (or of another picture)

It sounds like a straightforward task for DALL-E (and GPT?) to be given a painting and asked to repaint it "in the style of Leonardo da Vinci", just as one can present texts and ask for them to be rewritten in the style of some author. Or even better: to present…

chatgpt gpt-3 instruct-gpt

asked Mar 20 '23 at 16:47

Hans-Peter Stricker

931
1
8
23

1

vote

0 answers

What does "shuffle the comparisons into one dataset" mean?

I couldn't understand the wording here. What does "shuffle the comparisons into one dataset" mean? How does the method they use avoid $\binom{K}{2}$ forward passes for $K$ completions? Do they perform $\binom{K}{2}$ updates in an epoch for $K$ completions or…

reward-functions instruct-gpt

asked Feb 13 '23 at 07:44

ali batur karakullukcu

11
2
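
To make the counting in the question above concrete, here is a small sketch. It assumes each completion is scored once and the scores are reused across pairs; the function name `comparison_pairs` is hypothetical and nothing here is taken from the paper's code.

```python
from itertools import combinations
from math import comb

def comparison_pairs(k):
    # All unordered index pairs (i, j) with i < j among k completions.
    return list(combinations(range(k), 2))

K = 4
pairs = comparison_pairs(K)

# K completions give comb(K, 2) comparisons, but only K scores are needed
# if every completion is scored once and each score is reused in its pairs.
print(len(pairs), comb(K, 2), K)
```

Under that assumption, the number of reward-model evaluations grows linearly in $K$ while the number of comparisons grows quadratically.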

0

votes

1 answer

InstructGPT: In the objective RL function, does $y$ mean full response, or only a single token?

In the paper, they write: Now, is $y$ the full response or only the next token? On the one hand, the reward model expects a full response; on the other hand, they write 'per-token KL penalty'. So do we sample the next token, or do we sample the…

reinforcement-learning proximal-policy-optimization instruct-gpt

asked Mar 13 '24 at 11:28

Nathan G

161
3

0

votes

1 answer

How is InstructGPT a fine-tuned version of GPT-3 and at the same time has fewer parameters than the original GPT-3?

I am reading the paper "Training language models to follow instructions with human feedback". It says: Our labelers provide demonstrations of the desired behavior on the input prompt distribution (see Section 3.2 for details on this distribution).…

weights gpt-3 instruct-gpt

asked Feb 25 '23 at 11:12

DanielTheRocketMan

113
4
