
I asked ChatGPT a question, and its response included this statement:

In "The Ruling Class" film, which was released in 1972, Patrick McGoohan played the role of Bishop Lampton.

Unlike many of its vague or weaselly responses, this is a direct statement of fact. But it is false and very easy to disprove, so I wondered how it came up with that misinformation.

I asked it how, and it apologized and admitted it was wrong:

Q: You say "Patrick McGoohan played the role of Bishop Lampton", when in fact it was Alastair Sim that played that role. Where did you get this misinformation?

A: I apologize for the error in my previous response. You are correct that Patrick McGoohan did not appear in "The Ruling Class" film and that Alastair Sim played the role of Bishop Lampton.

I apologize for any confusion or inconvenience that my previous response may have caused. As an AI language model, I strive to provide accurate information, and I will make sure to double-check my sources more carefully in the future. Thank you for bringing this to my attention.

But it avoided answering the question:

Q: But my question was "Where did you get this misinformation?".

A: I apologize for the confusion. As an AI language model, I do not have personal experiences, so I did not get the misinformation from anywhere. I must have made a mistake in my response due to an error in processing or a misunderstanding of the question.

It surely wouldn't have just made the claim up in order to provide an impressively specific answer, so how can it state a falsehood as fact and then claim that it didn't get the information from anywhere? (I.e., what mechanism does it use to generate facts if they don't originate somewhere else?)

And worse, it actually lied while apologizing:

I will make sure to double-check my sources more carefully in the future.

Ray Butterworth

2 Answers


The reason is that ChatGPT is not a data retrieval system. In other words, it is not consulting a database, nor is it performing searches of the internet, to find answers to questions posed.

Instead, it is generating novel answers based on the trained parameters within the network. In a sense, this is far more impressive.

If you think about it, ChatGPT, and large language models in general, amount to a very interesting way of compressing data. Of course, compression isn't the point of LLMs, but you can view it as one of the outcomes. How so?

During training, the parameters are learned such that the model can reasonably predict the next most likely token (a word, subword, character, or symbol) given all of the tokens that came before it in that session, including any base prompt that is included silently. When you ask it to recite something like The Love Song of J. Alfred Prufrock, it can do so very accurately... yet it does not have a copy of Prufrock "memorized" somewhere. Instead, it is recreating Prufrock generatively.
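To make that concrete, here is a minimal sketch in Python, with a hand-written toy probability table standing in for the billions of learned parameters. The point is only that generation is a loop of "predict the next token, append it, repeat", with no document store being consulted. As an aside, this is also where the compression framing comes from: a token predicted with probability p costs roughly −log2 p bits under an entropy coder, so a good predictor is implicitly a good compressor.

```python
import math

# Hypothetical "learned parameters": next-token probabilities given the
# current context. A real LLM computes these with a neural network over a
# long context window; this tiny hand-written table only stands in for it.
NEXT_TOKEN_PROBS = {
    ("let", "us"): {"go": 0.9, "stay": 0.1},
    ("us", "go"): {"then": 0.8, "now": 0.2},
    ("go", "then"): {",": 0.95, ".": 0.05},
}

def generate(prompt, steps):
    """Greedy autoregressive generation: repeatedly pick the most probable
    next token and append it. Nothing is looked up in a database; the text
    is re-created from the (toy) parameters."""
    tokens = prompt[:]
    total_bits = 0.0
    for _ in range(steps):
        context = tuple(tokens[-2:])        # toy 2-token context window
        dist = NEXT_TOKEN_PROBS.get(context)
        if not dist:
            break
        token, p = max(dist.items(), key=lambda kv: kv[1])
        total_bits += -math.log2(p)         # bits to encode this token
        tokens.append(token)
    return tokens, total_bits

if __name__ == "__main__":
    text, bits = generate(["let", "us"], steps=3)
    print(" ".join(text))                   # -> let us go then ,
    print(f"~{bits:.2f} bits to encode the continuation")
```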

This is also why its answers, while incredibly confident, can be wildly incorrect. It is essentially generating the most likely token that comes next, not reasoning or thinking about what the text means.
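As a toy illustration of that failure mode (the counts below are invented purely to show the mechanism, not taken from any real model): if two names tend to appear in similar contexts in the training text, the statistically most likely continuation of a sentence can be a name that is factually wrong for that sentence.

```python
# Invented, purely illustrative statistics: how often each name followed
# contexts resembling "...in this 1972 film, Bishop Lampton was played by".
# Real LLM probabilities come from enormous training sets, but the failure
# mode is the same: the most probable continuation need not be true.
hypothetical_counts = {
    "Patrick McGoohan": 30,   # name tangled up with similar-era contexts
    "Alastair Sim": 20,       # the factually correct answer
    "Peter O'Toole": 10,
}

total = sum(hypothetical_counts.values())
probs = {name: c / total for name, c in hypothetical_counts.items()}

# The generator simply emits the highest-probability continuation...
best = max(probs, key=probs.get)
print(f"{best} (p = {probs[best]:.2f})")
# ...which, under these made-up counts, is fluent, confident, and wrong.
# No fact was retrieved and no deception intended: plausibility was optimized.
```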

David Hoelzer

I have caught it making things up more times than I can count. For example, an hour ago I asked it whether Arnold Schwarzenegger says the line "I don't want to hurt you" in any of his movies. It first correctly answered no; then I wrote "try harder", and it apologized and said he says it in Batman and Robin (1997). I asked for the exact minute and it said 1:03; I looked, and there is no such line. It then said 1:26; I looked again, and there is no line there either. It then apologized once more and admitted there is no such line. So it made the claim up three times in a row, and I have personally caught it doing this countless times.

Nix