
What exactly is the purpose of fine-tuning LLMs, given that they can be prompted without fine-tuning, even though they perform significantly worse than when fine-tuned?

nbro

1 Answer


Fine-tuning embeds task-specific knowledge directly into the LLM's weights, whereas prompting relies only on the weights learned during pretraining. A typical foundation LLM is pretrained with causal or masked language modeling to learn contextualized embeddings and token prediction, so it isn't specialized for any downstream NLP task. Cases such as high-stakes specialized tasks (medical classification, Q&A, chat), domain-specific tasks (legal contract generation, research paper summarization), or improved efficiency in specific workflows are therefore better handled by fine-tuning. If pretraining a foundation model is like teaching a student general knowledge by letting them read thousands of books and learn patterns in language, then fine-tuning is like training the same student specifically to pass an exam using past questions and answers, with the option of continued training on other specific tasks in the future.
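
As a rough illustration of how fine-tuning updates pretrained weights for a downstream task, here is a minimal supervised fine-tuning sketch using the Hugging Face Transformers library. The backbone model, dataset, and hyperparameters are illustrative assumptions, not something prescribed above:

```python
# Minimal supervised fine-tuning sketch with Hugging Face Transformers.
# The backbone (distilbert-base-uncased), the IMDB dataset, and the
# hyperparameters are assumptions made for illustration only.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Example downstream task: binary sentiment classification.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="ft-out",
    per_device_train_batch_size=16,
    num_train_epochs=1,          # short run, for illustration only
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].select(range(500)),
)

trainer.train()   # updates the pretrained weights on task-specific data
```

Prompting, by contrast, leaves the pretrained weights untouched and tries to describe the task entirely in the input text.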

A common example is ChatGPT's multi-stage fine-tuning process, which is designed to align the capabilities of its foundation GPT model with human preferences, safety standards, and Q&A-style conversational performance, and which continues to evolve over time:

ChatGPT is built on OpenAI's proprietary series of generative pre-trained transformer (GPT) models and is fine-tuned for conversational applications using a combination of supervised learning and reinforcement learning from human feedback... Both approaches employed human trainers to improve model performance. In the case of supervised learning, the trainers played both sides: the user and the AI assistant. In the reinforcement learning stage, human trainers first ranked responses that the model had created in a previous conversation. These rankings were used to create "reward models" that were used to fine-tune the model further by using several iterations of proximal policy optimization... OpenAI collects data from ChatGPT users to train and fine-tune the service further. Users can upvote or downvote responses they receive from ChatGPT and fill in a text field with additional feedback.
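
As a rough sketch of the reward-modeling step described in the quote, the human rankings are typically turned into preferred/rejected response pairs, and a reward model is trained with a pairwise ranking loss; the resulting scalar reward then drives the PPO fine-tuning updates. The function and variable names below are illustrative assumptions, not OpenAI's actual implementation:

```python
# Sketch of the pairwise reward-model loss used in RLHF-style fine-tuning.
# In a real pipeline the rewards would come from a language model scoring
# full prompt+response pairs; here they are just toy scalars.
import torch
import torch.nn.functional as F

def reward_ranking_loss(reward_chosen: torch.Tensor,
                        reward_rejected: torch.Tensor) -> torch.Tensor:
    """Push the reward of the human-preferred response above the
    reward of the rejected response (Bradley-Terry style loss)."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: scalar rewards for a batch of 4 ranked response pairs.
chosen = torch.tensor([1.2, 0.3, 0.8, 2.0], requires_grad=True)
rejected = torch.tensor([0.5, 0.9, -0.1, 1.5], requires_grad=True)

loss = reward_ranking_loss(chosen, rejected)
loss.backward()   # gradients would update the reward model's parameters
print(loss.item())
```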

cinch