Questions tagged [instruct-gpt]
6 questions
5 votes, 2 answers
InstructGPT: What is the sigma in the loss function, and why is $\log(\cdot)$ used?
$$ \operatorname{loss}(\theta) = -\frac{1}{\binom{K}{2}}E_{(x,y_w,y_l)\sim D}[\log(\sigma(r_{\theta}(x, y_w) - r_{\theta}(x, y_l)))] $$
The equation was taken…
Nathan G
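In the InstructGPT reward-model loss, $\sigma$ is the logistic sigmoid $\sigma(z) = 1/(1+e^{-z})$, so $\sigma(r_\theta(x, y_w) - r_\theta(x, y_l))$ can be read as the modeled probability that the labeler prefers $y_w$ over $y_l$ (a Bradley-Terry-style pairwise model). Taking $\log$ turns a product of comparison likelihoods into a sum, and the gradient of $-\log\sigma(d)$ with respect to the reward difference $d$ is $-\sigma(-d)$, which stays large when the model ranks a pair the wrong way. The minimal Python sketch below computes the per-comparison term for hypothetical scalar rewards; it is an illustration, not the paper's code.

```python
import math

def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        # log(1 / (1 + e^-z)) = -log(1 + e^-z); e^-z <= 1, so no overflow.
        return -math.log1p(math.exp(-z))
    # For negative z, rewrite as z - log(1 + e^z) to avoid overflowing e^-z.
    return z - math.log1p(math.exp(z))

def pairwise_loss(r_w: float, r_l: float) -> float:
    """-log(sigma(r_w - r_l)): small when the preferred reward r_w exceeds r_l."""
    return -log_sigmoid(r_w - r_l)

print(pairwise_loss(0.0, 0.0))  # log(2), about 0.693
print(pairwise_loss(3.0, 1.0) < pairwise_loss(1.0, 3.0))  # True
```

Equal rewards give $\log 2$; the loss shrinks toward 0 as $r_w - r_l$ grows and grows roughly linearly when $r_w - r_l$ is very negative. The $1/\binom{K}{2}$ factor averages these terms over the comparisons drawn from one prompt.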
4 votes, 1 answer
What's the difference between GPT3.5 and InstructGPT?
I read about the different model series in GPT3.5 here - https://platform.openai.com/docs/models/gpt-3-5
At the beginning of the page, it suggests looking at https://platform.openai.com/docs/model-index-for-researchers to understand the difference…
Arya
1 vote, 1 answer
Repainting a picture in the style of some painter (or of another picture)
It sounds like a straightforward task for DALL-E (and GPT?) to be given a painting and asked to repaint it "in the style of Leonardo da Vinci", just as one can present a text and ask for it to be rewritten in the style of some author. Or even better: to present…
Hans-Peter Stricker
1 vote, 0 answers
What does "shuffle the comparisons into one dataset" mean?
I couldn't understand the wording here.
What does "shuffle the comparisons into one dataset" mean?
Why does their method not require $\binom{K}{2}$ forward passes for $K$ completions? Do they perform $\binom{K}{2}$ updates in an epoch for $K$ completions, or…
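A plausible reading of the paper's wording is that it contrasts two setups: shuffling every individual comparison into one large dataset (which the paper reports caused overfitting after a single pass, because comparisons from the same labeling task are highly correlated) versus treating all $\binom{K}{2}$ comparisons from one prompt as a single batch element. In the second setup the reward model runs once per completion ($K$ forward passes), and every pairwise difference is formed afterwards from those $K$ scalar rewards. The sketch below assumes a hypothetical `toy_reward_model` standing in for $r_\theta$ and a ranking given as completion indices from most to least preferred.

```python
import itertools
import math

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

forward_passes = 0

def toy_reward_model(completion):
    """Hypothetical stand-in for r_theta(x, y); counts how often it is called."""
    global forward_passes
    forward_passes += 1
    return float(len(completion))  # placeholder score, not a real reward model

def prompt_loss(completions, ranking):
    """Loss for one prompt, averaged over all C(K, 2) comparisons from one ranking.

    ranking lists completion indices from most to least preferred.
    """
    # One forward pass per completion: K calls, not C(K, 2).
    rewards = [toy_reward_model(y) for y in completions]
    # combinations() preserves ranking order, so the first index in each pair is y_w.
    pairs = itertools.combinations(ranking, 2)
    total = sum(-log_sigmoid(rewards[w] - rewards[l]) for w, l in pairs)
    return total / math.comb(len(completions), 2)

completions = ["a", "bbb", "cc", "dddd"]  # K = 4 hypothetical completions
loss = prompt_loss(completions, ranking=[3, 1, 2, 0])
print(forward_passes, math.comb(4, 2))  # 4 6
```

With $K = 4$, the loss covers 6 comparisons but calls the reward model only 4 times; the saving grows quadratically with $K$.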
0 votes, 1 answer
InstructGPT: In the objective RL function, does $y$ mean full response, or only a single token?
In the paper, they write:
Now, is $y$ the full response or only the next token? On the one hand, the reward model expects a full response; on the other hand, they write 'per-token KL penalty'. So do we sample the next token, or do we sample the…
Nathan G
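One way to reconcile the two: $y$ is the full sampled response, generated token by token, and the reward model scores $y$ only once it is complete. The KL term is "per-token" because $\log \pi(y \mid x) = \sum_t \log \pi(y_t \mid x, y_{<t})$, so $\beta \log\big(\pi^{\mathrm{RL}}(y \mid x) / \pi^{\mathrm{SFT}}(y \mid x)\big)$ splits into one penalty per token. A common PPO-style implementation (an assumption here, not something the excerpt states) adds the reward-model score at the last token and subtracts the log-ratio at every token. The Python sketch below does that for hypothetical log-probabilities and omits the pretraining-mix term.

```python
def per_token_rewards(rm_score, logp_rl, logp_sft, beta):
    """Distribute r(x, y) - beta * log(pi_RL(y|x) / pi_SFT(y|x)) over the tokens of y.

    logp_rl[t], logp_sft[t]: log-probability of token y_t given (x, y_<t)
    under the RL policy and the SFT policy; rm_score scores the *full* response.
    """
    # Per-token KL penalty: -beta * (log pi_RL - log pi_SFT) for each generated token.
    rewards = [-beta * (lr - ls) for lr, ls in zip(logp_rl, logp_sft)]
    # The reward model only sees complete responses, so its score lands on the final token.
    rewards[-1] += rm_score
    return rewards

# Hypothetical 3-token response.
logp_rl = [-0.5, -1.0, -0.2]
logp_sft = [-0.7, -1.2, -0.1]
rewards = per_token_rewards(1.5, logp_rl, logp_sft, beta=0.1)
print(sum(rewards))  # about 1.47
```

The per-token rewards sum to $r_\theta(x, y) - \beta \log\big(\pi^{\mathrm{RL}}(y \mid x) / \pi^{\mathrm{SFT}}(y \mid x)\big)$ for the whole response, which is the quantity inside the expectation of the objective.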
0 votes, 1 answer
How can InstructGPT be a fine-tuned version of GPT-3 and at the same time have fewer parameters than the original GPT-3?
I am reading the paper "Training language models to follow instructions with human feedback"
It says:
Our labelers provide demonstrations of the desired behavior on the input prompt distribution (see Section 3.2 for details on this distribution).…
DanielTheRocketMan