Could a version of AlphaZero be trained that learned not only how to win, but how to win in a "beautiful" way?
Jürgen Schmidhuber wrote a paper in 2008 that essentially models "beauty" as how compressible the data is, relative to the observer. Too compressible, and the data is boring; too incompressible, and it appears random, which is also boring. Somewhere in between, there is interesting structure that we partially but not fully understand, and that is what appeals to our sense of curiosity.
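As a minimal sketch of that inverted-U idea, here is a toy "interestingness" score in Python. The use of zlib as a stand-in for the observer's own compression model, and the simple 4r(1-r) shaping, are my assumptions for illustration only; Schmidhuber's actual formulation is about the observer's compression progress, not a fixed compressor.

```python
import zlib

def interestingness(data: bytes) -> float:
    """Toy 'beauty' score: low for data that compresses almost completely
    (boring) and for data that barely compresses at all (random-looking),
    highest somewhere in between."""
    if not data:
        return 0.0
    # ~0 means highly compressible, ~1 means essentially incompressible.
    ratio = min(len(zlib.compress(data)) / len(data), 1.0)
    # Inverted-U: peaks at ratio = 0.5, falls to 0 at both extremes.
    return 4.0 * ratio * (1.0 - ratio)
```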
Could a model be trained in the same way as AlphaZero, but with an additional component that measures the beauty/interestingness of its play? As it learns to play better, it develops a deeper understanding of the game; in parallel, it learns to compress the board state and move history at each move, and that compression signal is then somehow used to reward more beautiful play.
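One very crude way to wire this in, purely as a sketch: blend the usual self-play win/loss signal with a beauty bonus before it is used as the value target. The function name, the `beauty_weight`, and the idea of scoring the serialized game record with the toy `interestingness` function above are all assumptions on my part, not how AlphaZero actually works; a more faithful version would presumably use the learner's own evolving compression of positions rather than a fixed score.

```python
def beauty_adjusted_outcome(game_outcome: float,   # +1 win, 0 draw, -1 loss from self-play
                            game_record: bytes,    # serialized board states / move history
                            beauty_weight: float = 0.1) -> float:
    """Blend the win/loss training target with a beauty bonus so that, among
    lines of play with the same result, the more 'interesting' ones receive a
    slightly higher value target."""
    return game_outcome + beauty_weight * interestingness(game_record)
```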
I think this is a logical direction to explore now. Maybe an approach like this could help automated theorem provers (ATPs) and AI for maths research, since beauty is a key quality of good/interesting mathematics.
We know AI can thrash the best humans at Go, so that in itself is no longer interesting. We want to see more crazy stuff like AlphaGo's move 37. Maybe we can compare it to watching a great football match: we love the 1970s Brazil team because of their outrageous style and flair, not because they won every single match.
 
    