30

In contrast to the question "How does ChatGPT know math?", I've been seeing posts on Twitter claiming that ChatGPT can actually be very bad at math. For instance, I asked it "If it takes 5 machines 5 minutes to make 5 devices, how long would it take 100 machines to make 100 devices?"

The conversation went like this:

Mithical: If it takes 5 machines 5 minutes to make 5 devices, how long would it take 100 machines to make 100 devices?
ChatGPT: If it takes 5 machines 5 minutes to make 5 devices, then it would take 100 machines 100 minutes to make 100 devices.
Mithical: That's not right. Check carefully and try again.
ChatGPT: I apologize if my response was not correct. Can you please provide more information or clarify your question? I may have misunderstood the scenario you described.
Mithical: It takes a machine 5 minutes to make a device. How long would it take 100 machines to make 100 devices?
ChatGPT: If it takes a single machine 5 minutes to make a single device, then it would take 100 machines 5 minutes to make 100 devices. Therefore, it would take a total of 500 minutes for 100 machines to make 100 devices.

Now, unless I'm misunderstanding something here, the answer is 5 minutes. ChatGPT first makes the intuitive mistake of answering 100 minutes, which a human might make as well. It then goes on to say (correctly, as far as I understand) that it's 5 minutes, but concludes in the same response that the total is 500 minutes.
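As a sanity check on the 5-minute answer, here is a minimal sketch of the arithmetic. It assumes every machine works independently at the same constant rate, which is the usual reading of the puzzle; the function name `minutes_needed` is made up for illustration.

```python
from fractions import Fraction

def minutes_needed(machines, minutes, devices, target_machines, target_devices):
    """Time for target_machines to make target_devices.

    Assumption: all machines work independently at one constant rate.
    """
    # Devices produced per machine per minute in the reference scenario.
    rate = Fraction(devices, machines * minutes)
    # Total output per minute of the target fleet is target_machines * rate.
    return Fraction(target_devices) / (target_machines * rate)

print(minutes_needed(5, 5, 5, 100, 100))  # 5
```

Each machine makes one device every 5 minutes, so 100 machines working in parallel finish 100 devices in the same 5 minutes.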

Why is this AI so bad at math? Computers are generally supposed to be good at math. Why does this model make such simple logical mistakes?

nbro
Mithical

4 Answers

27

ChatGPT is able to create well-formed sentences that contain phrases fitting the input. It has rules extracted from its data, but those are not rules of understanding; they are rules of 'likely phrases'.

In the most simple variant, every smartphone already has that mechanism. You type on your screen and the phone gives you three words that you would statistically use most often after the previous one. On the phone that is not of a very high quality. Choose the most likely words for a message and the sentence that results is something like (just started and chose a stream of 'middle options'):

"I am going on a trip with my friends to get to the full house for the weekend so we will need a time for a quick dinner with us at all and we can go over and there for the weekend if we want you guys out there"

You can see how many words have a weak statistical relation to the words just before them: 'time for', 'for a', 'time' -> 'quick', 'quick dinner', 'with us', 'with' -> 'at all'.
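The phone-style mechanism can be sketched as a toy bigram counter. This is only an illustration of "suggest the words that most often followed the previous one": the corpus and the names `train_bigrams` and `suggest` are invented here, and real keyboards (let alone ChatGPT) use far richer models.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    followers = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers

def suggest(followers, word, k=3):
    """Return up to k of the most frequent next words for `word`."""
    return [w for w, _ in followers[word.lower()].most_common(k)]

# Made-up training text, chosen only to keep the example small.
corpus = "time for a quick dinner and time for a quick walk and time for a trip"
model = train_bigrams(corpus)

print(suggest(model, "time"))  # ['for']
print(suggest(model, "a"))     # ['quick', 'trip']
```

Nothing in this model knows what a dinner or a trip is; it only knows which word tended to come next.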

ChatGPT is fundamentally the same, but much better than what the phone can do. Its statistics are much 'deeper'. It is the phone's autocompletion on steroids. But 'well-formed' and 'fitting' do not equate to 'reliable' or 'trustworthy'. More data and training will not help: the output will become even more 'well-formed' and 'fitting', but not more reliable.

'Well-formed' and 'fitting' do strongly suggest reliability and intelligence to humans, but that is mostly because we humans judge quickly and superficially. It's not so much that ChatGPT is intelligent, but rather that we are less so than we think.

What is noteworthy is that many humans make the same 5/5/5 -> 100/100/100 error. The difference is that you can explain to humans where their quick-and-dirty answer goes wrong. That is because they do have the understanding that ChatGPT lacks.

gctwnl
11

(Check out my heavily related answer to a similar question here)

Why is ChatGPT bad at math, while it is very good at other stuff?

The problem comes down to the age-old problem of learning vs understanding. On a high level, your question is very philosophical. The problem is that the model learns everything present in the data. However, throw it a slightly different mathematical problem it has not seen before, and it fails to answer spectacularly easy questions.

Some of the best examples come from papers that try to build AIs for mathematics. They feed the model an endless amount of data for simple addition, but give it a slightly larger number that it has never seen before, and it cannot solve the problem.

These AIs learn their data well, but they cannot extrapolate strict 'rules' from the data. Addition is a simple 'rule', but it cannot comprehend this rule. Neural networks cannot learn rules simply because they are not designed to do so. They are multiplying and adding up inputs, weights and biases, and they are not designed to learn 'if, else' logic.

An additional train of thought which I very much like is the following: think of how precise a very small multi-layer perceptron would have to be to implement simple 2-digit addition (given 4 inputs). That should be doable, right? Now think of how you would have to learn the weights of a model with 1,000,000 parameters to do the exact same thing, using a large dataset and a NN optimizer. It would just start learning how to approximate the answers, but it would never converge to a state where it has learned the rule of addition.
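To make the "how precise would the weights have to be" thought concrete, here is a minimal sketch. It assumes the 4 inputs are the tens and ones digits of the two numbers (my reading, not something stated above), and it uses a single linear unit rather than a full multi-layer network; the names and the perturbed weights are illustrative only.

```python
def linear_unit(inputs, weights, bias=0.0):
    """One neuron without activation: weighted sum of the inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# Weights that encode the addition rule exactly for inputs
# [tens of a, ones of a, tens of b, ones of b].
EXACT_WEIGHTS = [10.0, 1.0, 10.0, 1.0]

def add_two_digit(a, b, weights=EXACT_WEIGHTS):
    digits = [a // 10, a % 10, b // 10, b % 10]
    return linear_unit(digits, weights)

print(add_two_digit(47, 38))  # 85.0

# Hypothetical weights an optimizer might land on: close, but not the rule.
noisy_weights = [10.01, 0.99, 9.98, 1.02]
print(add_two_digit(47, 38, noisy_weights))  # near 85, but not exactly 85
```

The exact rule corresponds to one specific point in weight space; anything merely close to it gives answers that are merely close.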

Robin van Hoorn
4

ChatGPT is good at math and can understand the logic. It can derive new conclusions on its own and generate value which was not there before. To make use of ChatGPT you have to provide it with the right context and questions. This applies to math but also many other areas.

I started with the same task: 5 machines/5 minutes/5 devices. At first, it was able to provide the correct answer. Then I changed '5' to '3' (3 machines/3 minutes/3 devices) and it failed. But then I provided it with context and asked the question differently, and it was able to understand the problem and even create a general rule for solving it.

Here is how (link to the chat)...

Initial question, answered correctly. Most likely, in the meantime, this was added to its training data. The original question was posted like a month ago.

B: If it takes 5 machines 5 minutes to make 5 devices, how long would it take 100 machines to make 100 devices?

ChatGPT: If it takes 5 machines 5 minutes to make 5 devices, it would take 1 machine 5 minutes to make 1 device.

If it takes 1 machine 5 minutes to make 1 device, it would take 100 machines 5 minutes to make 100 devices.

Then I changed the question in a trivial way (3 instead of 5). At first it failed to get it:

B: If it takes 3 machines 3 minutes to make 3 devices, how long would it take 100 machines to make 100 devices?

ChatGPT: It is not possible to determine how long it would take 100 machines to make 100 devices based on the information provided. The relationship between the number of machines and the time it takes to produce a certain number of devices is not necessarily linear. In order to determine how long it would take 100 machines to make 100 devices, we would need to know the efficiency of the machines and how they work together.

But then:

B: If it takes 5 machines 5 minutes to make 5 devices, how long would it take 100 machines to make 100 devices?

ChatGPT: If it takes 5 machines 5 minutes to make 5 devices, it would take 1 machine 5 minutes to make 1 device.

If it takes 1 machine 5 minutes to make 1 device, it would take 100 machines 5 minutes to make 100 devices.

B: using the same logic as in the previous answer please analyze similar problem and provide the answer.

If it takes 3 machines 3 minutes to make 3 devices, how long would it take 100 machines to make 100 devices?

ChatGPT: If it takes 3 machines 3 minutes to make 3 devices, it would take 1 machine 3 minutes to make 1 device.

If it takes 1 machine 3 minutes to make 1 device, it would take 100 machines 3 minutes to make 100 devices.

Then I went further and was able to get more from ChatGPT: it was able to provide new value in the form of a general rule for this class of problems. Many humans would cut this step short ;D ...

B: using the same logic, please formulate general rule to solve this type of a problem

ChatGPT: Here is a general rule that can be used to solve this type of problem:

Let X be the number of machines, Y be the number of minutes it takes for X machines to make X devices, and Z be the number of devices that need to be made.

To determine the number of minutes it would take Y machines to make Z devices, divide Z by X and multiply the result by Y.

For example, if X = 5, Y = 5, and Z = 100, the number of minutes it would take 5 machines to make 100 devices can be calculated as follows: (100/5) * 5 = 20 minutes.

This rule can be expressed in the following formula:

(Z/X) * Y = number of minutes it would take Y machines to make Z devices

I hope this helps! Let me know if you have any questions or if you would like further clarification on this rule.

Its final formula is correct, "(Z/X) * Y = number of minutes it would take Y machines to make Z devices".

But no one is perfect, it has 2 small mistakes there.

1st, "To determine the number of minutes it would take Y machines to make Z devices, divide Z by X and multiply the result by Y." It should have used another letter, like W, to avoid confusion.

2nd, the example "(100/5) * 5 = 20 minutes." is just wrong; the result is 100 minutes.
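Both points are easy to check by evaluating the rule directly. This is a small sketch: `chatgpt_rule` is the formula exactly as ChatGPT stated it, while `rule_with_w` is a hypothetical variant that uses a separate letter W for the number of machines available, which is one possible reading of the renaming suggested above.

```python
def chatgpt_rule(x, y, z):
    """(Z / X) * Y, where X machines take Y minutes to make X devices
    and Z devices are wanted. This is the time for those same X machines."""
    return (z / x) * y

def rule_with_w(w, y, z):
    """Hypothetical variant: W machines available, each needing Y minutes
    per device, Z devices wanted."""
    return (z / w) * y

print(chatgpt_rule(5, 5, 100))   # 100.0, not 20
print(rule_with_w(100, 5, 100))  # 5.0
```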

Nevertheless, those small mistakes don't make any difference. It can now solve an advanced variant of the question (challenge any human with this :D )

B: If it takes 3 machines 3 minutes to make 3 devices, how long would it take 7 machines to make 110 devices?

ChatGPT: If it takes 3 machines 3 minutes to make 3 devices, it would take 1 machine 3 minutes to make 1 device.

If it takes 1 machine 3 minutes to make 1 device, it would take 7 machines 3 minutes to make 7 devices.

To determine the number of minutes it would take 7 machines to make 110 devices, we can use the formula (Z/X) * Y = number of minutes it would take Y machines to make Z devices, where X is the number of machines, Y is the number of minutes it takes for X machines to make X devices, and Z is the number of devices that need to be made. Plugging in the values, we get:

(110/7) * 3 = 45 minutes

Therefore, it would take 7 machines 45 minutes to make 110 devices using the production system described.

Again, not perfect: it rounded 110/7 down to 15. It should have kept it at about 15.71, which gives roughly 47.1 minutes. But OK, it still makes sense. You can give it further instructions, e.g. to not round calculations for this type of problem, and it will comply going forward.
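For reference, a quick sketch of the numbers for the 7-machine case. The second figure rests on an extra assumption that is not in the chat: a machine can only deliver whole devices, so the last, partially filled round still takes a full 3 minutes.

```python
import math

machines, minutes_per_device, devices = 7, 3, 110

# Treating production as continuous: (110 / 7) * 3.
continuous = devices / machines * minutes_per_device

# Assuming whole devices only: 16 rounds of 3 minutes each.
rounds = math.ceil(devices / machines)
whole_devices = rounds * minutes_per_device

print(round(continuous, 2))  # 47.14
print(whole_devices)         # 48
```

Either way, ChatGPT's 45 minutes comes from truncating 110/7 to 15 before multiplying.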

My take on this: we should all learn how to interact with this thing; it's going to be very important in the near future!

Boris
3

ChatGPT's GPT-4 model does not fall for this trap anymore due to more extensive training.

I tested with two prompts:

Prompt: If it takes 5 machines 5 minutes to make 5 devices, how long would it take 50 machines to make 50 devices?

GPT-4: This is an example of a classic problem that's meant to test understanding of rates. The key to solving it is recognizing that each machine can produce one device in five minutes.

Given that, if you have 50 machines working simultaneously, they can produce 50 devices in 5 minutes. This is because each machine is working independently to produce its own device, so the overall rate of production increases linearly with the number of machines. Thus, the answer is 5 minutes.

Prompt: And if it takes 3 machines 3 minutes to make 3 devices, how long would it take 70 machines to make 70 devices?

GPT-4: The principle is the same in this case. Each machine can produce one device in three minutes. So, if you have 70 machines working simultaneously, they can produce 70 devices in 3 minutes. This is because each machine is working independently to produce its own device, so the overall rate of production increases linearly with the number of machines. Thus, the answer is 3 minutes.

As I explained in my answer to the question "How does ChatGPT know math?", the apparent ability of LLMs (in particular the most capable model, GPT-4) to solve certain mathematical problems is due to the amount of training and the number of parameters. These ML models do not have explicit rules for solving mathematical problems.

The following 2022 paper describes how such capabilities of transformer-based language models emerge once a certain threshold of parameter count and training compute (FLOPs) is exceeded: https://arxiv.org/pdf/2206.07682.pdf

LeRobert