ChatGPT is a language model. As far as I understand, it receives text as a sequence of tokens, which are then mapped to embeddings. So how can it do math? For example, I asked:

ME: Which one is bigger, 5 or 9?
ChatGPT: In this case, 9 is larger than 5.

One could say that GPT sees numbers merely as tokens, and that its training dataset contained examples of 9 being called bigger than 5, so it has no actual mathematical understanding and is just pattern-matching on tokens. But I don't think that is the whole story, because of this question:

ME: Which one is bigger, 15648.25 or 9854.2547896?
ChatGPT: In this case, 15648.25 is larger than 9854.2547896.

We can't say it literally saw a comparison of the token(s) for 15648.25 with the token(s) for 9854.2547896 in its dataset!
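
In fact, a quick check with OpenAI's tiktoken library (a minimal sketch, assuming tiktoken is installed; the exact split depends on the encoding) suggests that such a number is not even a single token, but several sub-word pieces:

    import tiktoken

    # cl100k_base is the encoding used by the ChatGPT-era OpenAI models.
    enc = tiktoken.get_encoding("cl100k_base")

    for s in ["5", "9", "15648.25", "9854.2547896"]:
        ids = enc.encode(s)
        # Decode each token ID on its own to see how the number is split.
        print(s, "->", [enc.decode([i]) for i in ids])

So whatever the model does, it has to work with a handful of digit chunks rather than with one opaque symbol per number.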

So how does this language model understand the numbers?

Peyman

4 Answers


Adding to txopen's answer, it is interesting to note that for larger numbers with similar digits, ChatGPT is unable to make any useful distinction. For instance:

Me: Which number is bigger: 1234.12 or 1243.12

ChatGPT: Both numbers are equal.
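
A plain-Python sketch (my illustration, not a claim about the model's internals) shows why this pair is hard for surface-level cues: the integer parts have the same length and differ in only two digit positions, so nothing short of a positional digit comparison separates them.

    # Both integer parts are 4 digits long, so length gives no signal,
    # and only two digit positions differ (a transposition of 3 and 4).
    a, b = "1234.12", "1243.12"
    int_a, int_b = a.split(".")[0], b.split(".")[0]

    print(len(int_a) == len(int_b))                   # True: same length
    print(sum(x != y for x, y in zip(int_a, int_b)))  # 2 differing digits
    print(float(a) > float(b))                        # False: 1243.12 is larger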

Milo Moses

I think the dataset is so large, and the model so well trained, that it has picked up the probabilistic correlation between a number's magnitude and the length of its digit sequence before the decimal point, as well as the influence of each individual digit on the probability of one number being larger than another. The concrete example does not have to appear in the dataset: the model predicts the correct outcome because the relationship between relative magnitude, digit count, and digit values is sufficiently well represented in the training data.
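
For illustration, here is a minimal plain-Python sketch of the kind of comparison rule such correlations would amount to (purely hypothetical as a description of the model's actual computation): compare the lengths of the digit sequences before the decimal point, and only when those lengths match, fall back to a left-to-right digit comparison.

    def bigger(a: str, b: str) -> str:
        # Compare positive decimal numbers given as strings.
        int_a, int_b = a.split(".")[0], b.split(".")[0]
        if len(int_a) != len(int_b):
            # A longer integer part means a larger number (for positives).
            return a if len(int_a) > len(int_b) else b
        for x, y in zip(int_a, int_b):
            if x != y:
                # The first differing digit decides the comparison.
                return a if x > y else b
        return a  # equal integer parts; fractions ignored in this sketch

    print(bigger("15648.25", "9854.2547896"))  # 15648.25 (length decides)
    print(bigger("1234.12", "1243.12"))        # 1243.12 (digits decide)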

txopen

The apparent ability of ChatGPT (in particular when using the GPT-4 model) to solve certain mathematical problems comes down to the scale of training and the number of parameters of these machine learning models. ChatGPT and other large language models do not have explicit rules for solving mathematical problems.

The following 2022 paper, "Emergent Abilities of Large Language Models" (Wei et al.), describes how such capabilities of transformer-based language models emerge once a certain parameter-count threshold is exceeded: https://arxiv.org/pdf/2206.07682.pdf

This is also why they excel at some maths problems yet fail at others that can look very similar.

LeRobert

Simple answer: ChatGPT is actually human writers with some kind of autocomplete to speed things up.

This is standard practice for AI companies these days: a "fake it till you make it" approach where they use humans to fill the gaps in the AI, in the hope that down the road they'll automate the humans out of the product. It is common enough that academic papers have been written on the topic. So there is plenty of industry precedent for OpenAI to be using humans to help craft the responses.

Plus, technically, OpenAI is not "faking" anything. It is the media and bloggers who assume ChatGPT is a pure AI system. OpenAI has made no such claim itself, and the opposite is implied by its InstructGPT paper:

Step 1: Collect demonstration data, and train a supervised policy. Our labelers provide demonstrations of the desired behavior on the input prompt distribution (see Section 3.2 for details on this distribution). We then fine-tune a pretrained GPT-3 model on this data using supervised learning.

Additionally, ChatGPT is described as a "research preview" on the website, which implies there are still humans training the system during the chats, as described in the quote above.

Final note: I find it amusing that no one considers this alternative plausible, as if it were somehow more complicated to have humans tweak chatbot responses than to create an AI with the apparent human-level understanding that ChatGPT exhibits.

UPDATE: ChatGPT confirms the OpenAI team curates its responses

It turns out ChatGPT is indeed human-curated, by open admission.

During this conversation, ChatGPT outright states that the OpenAI team filters and edits the GPT-generated responses.

...the response you are receiving is being filtered and edited by the OpenAI team, who ensures that the text generated by the model is coherent, accurate and appropriate for the given prompt.

Apparently, the fact that OpenAI actively curates ChatGPT's responses is indirectly implied in the documentation here.

Human in the loop (HITL): Wherever possible, we recommend having a human review outputs before they are used in practice. This is especially critical in high-stakes domains, and for code generation. Humans should be aware of the limitations of the system, and have access to any information needed to verify the outputs (for example, if the application summarizes notes, a human should have easy access to the original notes to refer back).

So, that explains that :)

yters