5

Fundamentally, a game-playing AI must solve the problem of choosing the best action from a set of possible actions.

Most existing game AIs, such as AlphaGo, do this by using an evaluation function, which maps game states to real numbers. This number can typically be interpreted as a monotonic function of an estimated winning probability. The best action is the one whose resulting state yields the highest evaluation.
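To make that concrete, here is a minimal sketch of evaluation-based action selection; `evaluate` and `apply_move` are hypothetical placeholders, not any particular engine's API:

```python
def choose_action(state, legal_moves, evaluate, apply_move):
    """Pick the move whose resulting state receives the highest evaluation."""
    # evaluate(next_state) is assumed to return a real number that is
    # monotonic in the estimated winning probability of that state.
    return max(legal_moves, key=lambda move: evaluate(apply_move(state, move)))
```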

Clearly, this approach can work well. But it violates one of Vladimir Vapnik's imperatives (in his book "Estimation of Dependences Based on Empirical Data"): "When solving a problem of interest, do not solve a more general problem as an intermediate step." In fact, he specifically states as an illustration of this imperative,

Do not estimate predictive values if your goal is to act well. (A good strategy of action does not necessarily rely on good predictive ability.)

Indeed, human chess and go experts appear to heed his advice, as they are able to act well without using evaluation functions.

My question is this: has there been any recent research aiming to solve games by learning to compare decisions directly, without an intermediate evaluation function?

To use AlphaGo as an example, this might mean training a neural network to take two (similar) board states as input and output a choice of which one is better (a classification problem), as opposed to a neural network that takes one board state as input and outputs a winning probability (a regression problem).
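As a rough sketch of the contrast I have in mind (the flat board encoding, layer sizes, and PyTorch framing are illustrative assumptions on my part, not AlphaGo's actual architecture):

```python
import torch
import torch.nn as nn

BOARD_FEATURES = 19 * 19  # hypothetical flat encoding of a Go board


class ValueNet(nn.Module):
    """Regression-style net: one board state in, winning probability out."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(BOARD_FEATURES, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, state):
        return self.body(state)  # shape (batch, 1), values in [0, 1]


class ComparisonNet(nn.Module):
    """Classification-style net: two board states in, preference out."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(2 * BOARD_FEATURES, 256), nn.ReLU(),
            nn.Linear(256, 2),  # logits: "first is better" vs "second is better"
        )

    def forward(self, state_a, state_b):
        return self.body(torch.cat([state_a, state_b], dim=-1))
```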

nbro
dshin

1 Answer

3

Human chess and go experts clearly use evaluation functions. They do come up with moves that look sensible without evaluating the board position, but to validate these candidate moves they evaluate board positions that occur at the end of the variations they calculate. Pretty similar to AlphaGo.

Inputting two board states and outputting a preference is a (much) more complex task than mapping one board state to a real number, and it gives you less information. So it's a lose-lose choice. (I did try something very similar and it didn't work at all. The reason is that you didn't just double the input size; rather, you made the input space quadratically bigger.)

If you compare two board states that differ just by one move, then your input space doesn't quite explode as much, but you have to do a ton of comparisons to make a decision. The logical choice would be to output a preference distribution over all possible moves - but that's exactly what AlphaGo does in its policy network. There is also an earlier paper which trained a network to predict expert moves, which comes down to the same thing. And yes, both these networks play quite strongly without any search or board evaluation. But nowhere near AlphaGo-level.
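As an illustrative sketch of what such a policy network looks like (the flat board encoding, layer sizes, and move count are assumptions for illustration, not the architecture actually used by AlphaGo or the expert-move-prediction paper):

```python
import torch
import torch.nn as nn

BOARD_FEATURES = 19 * 19   # hypothetical flat board encoding
NUM_MOVES = 19 * 19 + 1    # one output per board point plus a pass move


class PolicyNet(nn.Module):
    """Outputs a preference distribution over all moves in one forward pass,
    instead of comparing candidate positions pairwise."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(BOARD_FEATURES, 256), nn.ReLU(),
            nn.Linear(256, NUM_MOVES),
        )

    def forward(self, state, legal_mask):
        logits = self.body(state)
        # Mask out illegal moves before normalising into a distribution.
        logits = logits.masked_fill(~legal_mask, float("-inf"))
        return torch.softmax(logits, dim=-1)
```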

BlindKungFuMaster