I know that large language models like GPT-3 are trained simply to continue pieces of text that have been scraped from the web. But how was ChatGPT trained, which, while also having a good understanding of language, is not directly a language model but a chatbot? Do we know anything about that? I presume that a lot of conversations were needed to train it. Did they simply scrape those conversations from the web, and if so, where did they find them?
The key ingredient is Reinforcement Learning from Human Feedback (RLHF): humans rate or rank the model's answers, and that feedback is used to guide further training. The training conversations were not scraped from the web; according to OpenAI, human AI trainers wrote demonstration dialogues (playing both the user and the assistant) for an initial supervised fine-tuning step, and then ranked sampled model outputs to train a reward model.
The official blog post explains this fairly well.
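To make the reward-modelling step concrete, here is a minimal sketch in PyTorch. It is not OpenAI's actual code: random feature vectors stand in for a real language model's representations of (prompt, response) pairs, and all dimensions and hyperparameters are illustrative assumptions. It trains a small reward model on human preference pairs with the pairwise (Bradley-Terry) loss commonly used in RLHF.

```python
import torch
import torch.nn as nn

# Toy stand-in: each (prompt, response) pair is a fixed-size feature vector.
# In real RLHF these features would come from a pretrained language model.
FEAT_DIM = 64
N_PAIRS = 512

torch.manual_seed(0)
chosen = torch.randn(N_PAIRS, FEAT_DIM)    # features of human-preferred responses
rejected = torch.randn(N_PAIRS, FEAT_DIM)  # features of dispreferred responses

# Reward model: maps a (prompt, response) feature vector to a scalar score.
reward_model = nn.Sequential(nn.Linear(FEAT_DIM, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

for step in range(200):
    r_chosen = reward_model(chosen).squeeze(-1)
    r_rejected = reward_model(rejected).squeeze(-1)
    # Pairwise Bradley-Terry loss: push the preferred response's reward
    # above the rejected response's reward.
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The trained reward model can then score new responses; in full RLHF that
# scalar reward drives a policy-gradient update (e.g. PPO) of the chatbot.
```

In the full pipeline, the reward model's scores replace human raters during the reinforcement-learning phase, which is what makes it feasible to optimize the chatbot over far more conversations than humans could label directly.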
