
I am confused about the difference between fine-tuning and RLHF for LLMs. When should I use which? I know that RLHF requires creating a reward model, which first rates responses so they can be aligned with human preferences, and that this reward model is afterwards used to fine-tune the LLM.

But if that's the case, when is plain fine-tuning still relevant? When should I use which?


2 Answers


RLHF is just one form of fine-tuning for generative LLMs, used to align an LLM with human preferences.

However, you could instead curate a set of very high-quality data and fine-tune on it (take a pretrained model and adjust it slightly), without any RLHF involved (see, for example, what phi-1 has done).
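To make the distinction concrete, here is a minimal sketch of plain supervised fine-tuning with Hugging Face `transformers`: just next-token cross-entropy on curated text, no reward model anywhere. The model name `gpt2` and the `texts` list are placeholders, and a real run would use batching, padding, and many epochs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical curated, high-quality training examples.
texts = [
    "Instruction: Summarize RLHF.\nResponse: RLHF fine-tunes an LLM using a learned reward model.",
    "Instruction: Define fine-tuning.\nResponse: Adjusting a pretrained model on a smaller task-specific dataset.",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()

for text in texts:
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # Labels = input_ids: the library shifts them internally for next-token prediction.
    outputs = model(**batch, labels=batch["input_ids"])
    loss = outputs.loss  # plain cross-entropy, no preference signal involved
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The point is that the only "alignment" here comes from the quality of the data itself.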

Alberto

Fine-tuning on a supervised dataset is extremely expensive, both in terms of data collection and annotation and in terms of GPU costs. RLHF gets around this by training a reward model alongside the policy (the LLM itself). The reward model scores generated outputs, and that signal is used to update the policy so that it produces better-aligned generations at decoding time. A simplified sketch of this loop follows below.
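Here is a simplified, REINFORCE-style sketch of that loop (real RLHF typically uses PPO with a KL penalty against the original model, which is omitted here). The model names are placeholders: in practice the reward model would be a classifier head trained on human preference pairs, not a fresh `gpt2` head as used below.

```python
import torch
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

policy_name = "gpt2"   # placeholder policy LLM
reward_name = "gpt2"   # placeholder; in practice a model trained on preference data

tok = AutoTokenizer.from_pretrained(policy_name)
tok.pad_token = tok.eos_token

policy = AutoModelForCausalLM.from_pretrained(policy_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(reward_name, num_labels=1)
reward_model.config.pad_token_id = tok.pad_token_id
reward_model.eval()

optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-6)

prompt = "Explain RLHF in one sentence."
prompt_ids = tok(prompt, return_tensors="pt").input_ids

# 1. Sample a response from the current policy.
with torch.no_grad():
    full_ids = policy.generate(prompt_ids, do_sample=True, max_new_tokens=40,
                               pad_token_id=tok.pad_token_id)

# 2. Score prompt + response with the frozen reward model (a scalar preference score).
with torch.no_grad():
    reward = reward_model(full_ids).logits.squeeze()

# 3. REINFORCE-style update: raise the log-probability of the sampled tokens,
#    weighted by the reward (computed over the whole sequence for simplicity).
out = policy(full_ids, labels=full_ids)
log_likelihood = -out.loss                   # mean log-prob per token
loss = -(reward.detach() * log_likelihood)   # push toward higher-reward responses
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Note that no new human-labelled target text is needed for this step; the human effort goes into the preference comparisons used to train the reward model.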

Sources:

  1. https://www.assemblyai.com/blog/how-rlhf-preference-model-tuning-works-and-how-things-may-go-wrong/
  2. https://arxiv.org/pdf/2204.05862.pdf