
(The math problem here just serves as an example; my question is about this type of problem in general.)

Given two Schur polynomials $s_\mu$ and $s_\nu$, we know that their product can be decomposed into a linear combination of other Schur polynomials:

$$s_\mu s_\nu = \sum_\lambda c_{\mu,\nu}^\lambda s_\lambda$$

and we call $c_{\mu,\nu}^\lambda$ the Littlewood-Richardson (LR) coefficient (always a non-negative integer).
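For concreteness, here is a minimal sketch of how one such decomposition can be computed. It assumes a SageMath session (Sage's `SymmetricFunctions` implements the Schur basis); it is not plain Python:

```python
# SageMath sketch (run inside a Sage session, not plain Python):
# expand the product s_{(2,1)} * s_{(1,1)} back into the Schur basis.
s = SymmetricFunctions(QQ).schur()
product = s[2, 1] * s[1, 1]
print(product)
# The result is s[2,1,1,1] + s[2,2,1] + s[3,1,1] + s[3,2];
# each integer coefficient is an LR coefficient c_{mu,nu}^lambda
# (all equal to 1 in this small example).
```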

Hence, a natural supervised learning problem is to predict whether or not the LR coefficient takes a certain value, given the tuple $(\mu, \nu, \lambda)$. This is not difficult.
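A minimal sketch of how that classification could be set up, here specialized to "is the coefficient zero or not?" (the fixed-size encoding, the two placeholder samples, and the scikit-learn MLP are all illustrative assumptions, not a real experiment):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

MAX_PARTS = 6  # assumed cap on partition length, giving a fixed-size input

def encode(mu, nu, lam):
    """Concatenate the three partitions, each zero-padded to MAX_PARTS."""
    vec = []
    for p in (mu, nu, lam):
        vec.extend(list(p) + [0] * (MAX_PARTS - len(p)))
    return vec

# In practice X, y would come from a precomputed table of LR coefficients
# (e.g., generated with SageMath); the two rows below are placeholders.
X = np.array([
    encode((2, 1), (1, 1), (3, 2)),  # c = 1 (nonzero)
    encode((2, 1), (1, 1), (5,)),    # c = 0 (mu is not contained in lambda)
])
y = np.array([1, 0])  # 1 if the coefficient is nonzero, 0 otherwise

clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
clf.fit(X, y)
print(clf.predict_proba(X))  # the "98% confident" numbers from the question
```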

My question is: can we use ML/RL to do anything other than prediction in this situation, or extract anything from the prediction results? In other words, does a statement like "oh, I am 98% confident that this LR coefficient is 0" imply anything mathematically interesting?


1 Answer


There are quite a few papers that try to 'teach' neural networks to 'solve' math problems. Most of the time, sadly, it comes down to training on a large dataset, after which the network can 'solve' basic problems of the kind it was trained on, but is unable to generalize to larger ones. That is, if you train a neural network to do addition, it will be inherently constrained by the dataset. It might be able to handle addition with 3 or even 4 digits, depending on how big your dataset is, but give it an addition question with two 10-digit numbers and it will almost always fail.

The latest example I can remember is the large language model GPT-3, which was not made to solve equations per se, but does 'a decent job' on the kind of problems that were in its dataset. Facebook AI also made an 'advanced math solver' with a specific architecture that I have not looked into, which might disprove my point, but you can look into that.

In the end, this comes down to 'what is learning' and 'what do you want to accomplish'. Most agree that these networks are not able to generalize beyond their datasets. Some might say that not being able to generalize does not mean it is not learning; it might just be learning more slowly. I believe these models are inherently limited to what is presented in the dataset. Given a good dataset, a model might be able to generalize to cases 'near and in between', but I have yet to see this sort of model generalize to cases 'far outside' the dataset.
