I looked for examples online and found one that used AlphaZero. Are there any other algorithms that I can apply to this game? Chain Reaction game: we have a grid, and each player can place a piece on any cell that is empty or already occupied by that same player. When a cell reaches its critical mass, it explodes into the cardinal directions available to it (e.g. for the top-left corner the critical mass is 2, and it has 2 directions available, east and south). The explosion replaces the pieces in those cells with the exploding player's own pieces and increments the number of pieces originally there by 1. The game ends when only one player is left on the board.
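For concreteness, the rules described above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the board representation (a grid of `(owner, count)` tuples) and the names `new_board`, `critical_mass`, `opponents_alive` and `place` are made up for illustration, and the cascade is cut short once a move wipes out every opponent, since the game is over at that point.

```python
from collections import deque

# Offsets of the four cardinal neighbours.
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def new_board(rows, cols):
    """Empty grid; each cell is (owner, count), with owner None when empty."""
    return [[(None, 0)] * cols for _ in range(rows)]


def critical_mass(r, c, rows, cols):
    """Number of in-bounds neighbours: 2 in a corner, 3 on an edge, 4 inside."""
    return sum(0 <= r + dr < rows and 0 <= c + dc < cols for dr, dc in NEIGHBOURS)


def opponents_alive(board, player):
    """True if any cell is owned by someone other than `player`."""
    return any(o is not None and o != player for row in board for o, _ in row)


def place(board, r, c, player):
    """Place one piece for `player` at (r, c) and resolve explosions in place."""
    rows, cols = len(board), len(board[0])
    owner, count = board[r][c]
    if owner is not None and owner != player:
        raise ValueError("cell belongs to another player")
    had_opponents = opponents_alive(board, player)
    board[r][c] = (player, count + 1)
    pending = deque([(r, c)])
    while pending:
        y, x = pending.popleft()
        _, n = board[y][x]
        mass = critical_mass(y, x, rows, cols)
        if n < mass:
            continue  # stable cell, nothing to do
        left = n - mass
        board[y][x] = (player, left) if left > 0 else (None, 0)
        if left >= mass:
            pending.append((y, x))  # still unstable, revisit later
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                # Capture the neighbour and add one piece to what was there.
                board[ny][nx] = (player, board[ny][nx][1] + 1)
                pending.append((ny, nx))
        # Assumption: stop once the move has eliminated every opponent;
        # otherwise a winning cascade may never settle.
        if had_opponents and not opponents_alive(board, player):
            break
    return board
```

Calling `place` twice on a corner of an empty board makes that corner explode into its two neighbours, which matches the corner example in the description.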
1 Answer
The wondrous results in games such as Go are due to the two-player zero-sum setting: in addition to being simpler, it allows autocurriculum learning and makes credit assignment easier.
There are some compelling MARL algorithms for fully cooperative scenarios, but they are enabled by the simplification that all agents share exactly the same goals.
Your case is a mixed-reward setting, where an agent can benefit from actions such as short-term alliances and betrayals. This also causes a lot of non-stationarity during training, because it is difficult to measure good and bad outcomes objectively: a bad player can do things we may consider "dumb" and yet ruin the play of a "smart" player.
Your practical options are:
A: Turn it into a 1v1 game and train with more conventional RL approaches.
B: Hard-code the other opponents.
Otherwise, you are endeavoring to solve a really challenging problem.
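Option B can be sketched as a wrapper that hides the scripted opponents inside the environment, so the learner sees an ordinary single-agent problem. This is a rough sketch under assumptions, not a definitive implementation: `FixedOpponentEnv` and the callback names (`legal_moves`, `apply_move`, `winner`) are hypothetical, the learner always sits in seat 0, and the reward is simply +1 for a win and -1 for a loss. The small take-away game at the bottom is only a stand-in to show the interface; the Chain Reaction rules would be plugged in through the same callbacks.

```python
class FixedOpponentEnv:
    """Single-agent view of an N-player turn-based game.

    The learner is seat 0; seats 1..N-1 are played by fixed, hand-written
    policies, so from the learner's side they are part of the environment.
    """

    def __init__(self, initial_state, legal_moves, apply_move, winner, opponents):
        self.initial_state = initial_state
        self.legal_moves = legal_moves    # (state, seat) -> list of moves
        self.apply_move = apply_move      # (state, move, seat) -> new state
        self.winner = winner              # state -> winning seat, or None
        self.opponents = opponents        # list of (state, moves) -> move
        self.state = initial_state

    def reset(self):
        self.state = self.initial_state
        return self.state

    def step(self, move):
        """Play the learner's move, then every scripted opponent in turn.

        Returns (state, reward, done).
        """
        self.state = self.apply_move(self.state, move, 0)
        won_by = self.winner(self.state)
        for seat, policy in enumerate(self.opponents, start=1):
            if won_by is not None:
                break
            moves = self.legal_moves(self.state, seat)
            if not moves:
                continue  # this opponent cannot move (e.g. eliminated)
            self.state = self.apply_move(self.state, policy(self.state, moves), seat)
            won_by = self.winner(self.state)
        if won_by is None and not self.legal_moves(self.state, 0):
            return self.state, -1.0, True  # learner has no move left
        reward = 0.0 if won_by is None else (1.0 if won_by == 0 else -1.0)
        return self.state, reward, won_by is not None


# Stand-in game: take 1 or 2 stones; whoever takes the last stone wins.
# State is (stones_left, seat_that_moved_last).
def toy_legal_moves(state, seat):
    return [m for m in (1, 2) if m <= state[0]]


def toy_apply_move(state, move, seat):
    return (state[0] - move, seat)


def toy_winner(state):
    return state[1] if state[0] == 0 else None


def always_take_one(state, moves):
    """A deliberately simple hard-coded opponent."""
    return moves[0]


env = FixedOpponentEnv((4, None), toy_legal_moves, toy_apply_move,
                       toy_winner, [always_take_one, always_take_one])
```

With the opponents fixed like this, any conventional single-agent RL method can be trained against `env.step`. The learned policy will only be as good as the scripted opponents force it to be.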