
I am confused about the difference between fine-tuning and RLHF for LLMs. When should I use which? I know that RLHF requires creating a reward model, which first rates responses so they can be aligned with human preferences, and that this reward model is afterwards used to fine-tune the LLM.

But if that's the case, when is plain fine-tuning still relevant? When should I use which?


2 Answers


RLHF is just one form of fine-tuning for generative LLMs, used to align an LLM with human preferences.

However, you could instead curate a set of very high-quality data and fine-tune on it (take a pretrained model and adjust it slightly), without any RLHF involved (see, for example, what phi-1 has done).
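To make the distinction concrete, here is a minimal sketch of plain supervised fine-tuning with Hugging Face `transformers`: just next-token cross-entropy on curated text, no reward model anywhere. The model name `gpt2` and the `texts` list are placeholders, and a real run would use batching, padding, and many epochs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical curated, high-quality training examples.
texts = [
    "Instruction: Summarize RLHF.\nResponse: RLHF fine-tunes an LLM using a learned reward model.",
    "Instruction: Define fine-tuning.\nResponse: Adjusting a pretrained model on a smaller task-specific dataset.",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()

for text in texts:
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # Labels = input_ids: the library shifts them internally for next-token prediction.
    outputs = model(**batch, labels=batch["input_ids"])
    loss = outputs.loss  # plain cross-entropy, no preference signal involved
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The point is that the only "alignment" here comes from the quality of the data itself.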

Alberto

Fine-tuning on a supervised dataset is extremely expensive, both in terms of data collection and annotation and in terms of GPU costs. RLHF gets around this by training a reward model alongside the policy (the LLM itself). The reward model scores generated outputs, and that signal is used to update the policy so that it produces better-aligned generations at decoding time. A simplified sketch of this loop follows below.
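Here is a simplified, REINFORCE-style sketch of that loop (real RLHF typically uses PPO with a KL penalty against the original model, which is omitted here). The model names are placeholders: in practice the reward model would be a classifier head trained on human preference pairs, not a fresh `gpt2` head as used below.

```python
import torch
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

policy_name = "gpt2"   # placeholder policy LLM
reward_name = "gpt2"   # placeholder; in practice a model trained on preference data

tok = AutoTokenizer.from_pretrained(policy_name)
tok.pad_token = tok.eos_token

policy = AutoModelForCausalLM.from_pretrained(policy_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(reward_name, num_labels=1)
reward_model.config.pad_token_id = tok.pad_token_id
reward_model.eval()

optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-6)

prompt = "Explain RLHF in one sentence."
prompt_ids = tok(prompt, return_tensors="pt").input_ids

# 1. Sample a response from the current policy.
with torch.no_grad():
    full_ids = policy.generate(prompt_ids, do_sample=True, max_new_tokens=40,
                               pad_token_id=tok.pad_token_id)

# 2. Score prompt + response with the frozen reward model (a scalar preference score).
with torch.no_grad():
    reward = reward_model(full_ids).logits.squeeze()

# 3. REINFORCE-style update: raise the log-probability of the sampled tokens,
#    weighted by the reward (computed over the whole sequence for simplicity).
out = policy(full_ids, labels=full_ids)
log_likelihood = -out.loss                   # mean log-prob per token
loss = -(reward.detach() * log_likelihood)   # push toward higher-reward responses
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Note that no new human-labelled target text is needed for this step; the human effort goes into the preference comparisons used to train the reward model.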

Sources:

  1. https://www.assemblyai.com/blog/how-rlhf-preference-model-tuning-works-and-how-things-may-go-wrong/
  2. https://arxiv.org/pdf/2204.05862.pdf