For simplicity, let’s focus on knowledge reasoning tasks with Yes/No answers. According to learning theory, even moderately complex knowledge reasoning tasks are PAC-unlearnable. This implies that no learning-based reasoning engine trained on a finite sample can be guaranteed an accuracy strictly greater than 50%. By comparison, a trivial algorithm that answers each question by flipping a fair coin already achieves 50% expected accuracy.
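To spell out the statement I am relying on (this is my own paraphrase of the standard distribution-free PAC definition, with $\epsilon$ the error tolerance, $\delta$ the confidence parameter, $A$ a learning algorithm, and $\operatorname{err}_{D}$ the error under distribution $D$):

$$
\mathcal{C}\ \text{is PAC-learnable} \;\iff\; \exists A,\ \exists\, m\!\left(\tfrac{1}{\epsilon},\tfrac{1}{\delta}\right)\ \text{finite, s.t. } \forall c\in\mathcal{C},\ \forall D:\ \Pr_{S\sim D^{m}}\!\left[\operatorname{err}_{D}(A(S))\le\epsilon\right]\ \ge\ 1-\delta .
$$

Unlearnability is the negation: for some fixed $\epsilon_{0}$ and $\delta_{0}$, every algorithm with any finite sample size fails this guarantee on some pair of concept and distribution, and no-free-lunch-style arguments can push the worst-case error toward $1/2$. The coin flip, meanwhile, reaches 50% expected accuracy on any distribution of Yes/No questions, since

$$
\Pr[\text{correct}] \;=\; \Pr[\text{answer is Yes}]\cdot\tfrac{1}{2} \;+\; \Pr[\text{answer is No}]\cdot\tfrac{1}{2} \;=\; \tfrac{1}{2} .
$$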
Therefore, I am puzzled as to why so many LLM reasoning models nowadays chase benchmark scores in pursuit of high-quality knowledge reasoning. According to the theory above, such models do not seem to offer any fundamental advantage over the trivial coin-flipping algorithm.
Some people might argue that PAC-learnability requires a guarantee that holds uniformly over every possible distribution, whereas in real-world scenarios the knowledge reasoning problems we face come from one specific distribution, so the PAC-unlearnability limitation may not apply to LLM reasoning models.
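To make that counterargument concrete, the only change is where the distribution is quantified; roughly (my paraphrase again, with $D_{0}$ a single fixed, though possibly unknown, distribution):

$$
\forall c\in\mathcal{C},\ \forall D:\ \Pr_{S\sim D^{m}}\!\left[\operatorname{err}_{D}(A(S))\le\epsilon\right]\ge 1-\delta \qquad \text{(distribution-free)}
$$

$$
\forall c\in\mathcal{C}:\ \Pr_{S\sim D_{0}^{m}}\!\left[\operatorname{err}_{D_{0}}(A(S))\le\epsilon\right]\ge 1-\delta \qquad \text{(fixed distribution } D_{0}\text{)}
$$

As I understand it, classes that are unlearnable in the first sense can become learnable in the second, which is exactly why this objection has some force.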
However, we do not actually know this specific distribution for real-world knowledge reasoning. In particular, since human needs are constantly evolving, it is not even clear that such a fixed distribution exists. Moreover, according to existing research, the task of learning a probability distribution from samples is itself unlearnable in general.
Is there a misunderstanding in my reasoning? What is your perspective on this issue?