I am currently trying to build a DQN agent that plays the game UNO. The observation it gets looks like this:
{
    'agent_cards': BoundedArraySpec(shape=(108,2), dtype=np.int32, minimum=[0,0], maximum=[5,15], name='agent_cards'),
    'top_discard_card': (
        BoundedArraySpec(shape=(2,), dtype=np.int32, minimum=0, maximum=5, name='top_discard_card_color'),
        BoundedArraySpec(shape=(2,), dtype=np.int32, minimum=0, maximum=15, name='top_discard_card_type')
    ),
    'enemy_card_amounts': BoundedArraySpec(shape=(3,), dtype=np.int32, minimum=0, maximum=108, name='enemy_card_amounts'),
    'turn_order_reversed': BoundedArraySpec(shape=(1,), dtype=np.int32, minimum=0, maximum=1, name='turn_order_reversed')
}
And my action spec looks like this:
BoundedArraySpec(shape=(), dtype=np.int32, minimum=0, maximum=107, name='card_input')
The idea is to give the agent a list of up to 108 cards (the maximum hand size) and specify each card's color (0-5 for none, red, green, blue, yellow, and wild) as well as its type (0 for none, 1-10 for the number cards 0-9, and 11-15 for draw 2, reverse, skip, wild, and draw 4).
The agent also receives a tuple of the data for the card on top of the discard pile, as well as the number of cards the other 3 players have, and whether the turn order has been reversed.
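To make the hand encoding concrete, here is a minimal sketch of how a hand could be turned into the (108, 2) array. The names COLORS, TYPES, and encode_hand are hypothetical and only illustrate the integer codes listed above; unused slots stay at (0, 0).

```python
import numpy as np

MAX_HAND = 108  # maximum hand size, taken from the observation spec

# Integer codes as described in the post; these dictionary names are hypothetical.
COLORS = {"none": 0, "red": 1, "green": 2, "blue": 3, "yellow": 4, "wild": 5}
TYPES = {str(n): n + 1 for n in range(10)}  # number cards 0-9 -> codes 1-10
TYPES.update({"draw2": 11, "reverse": 12, "skip": 13, "wild": 14, "draw4": 15})


def encode_hand(cards):
    """Encode a list of (color, type) name pairs as a (108, 2) int32 array."""
    hand = np.zeros((MAX_HAND, 2), dtype=np.int32)  # (0, 0) marks an empty slot
    for slot, (color, card_type) in enumerate(cards):
        hand[slot] = (COLORS[color], TYPES[card_type])
    return hand


hand = encode_hand([("red", "7"), ("wild", "draw4"), ("blue", "skip")])
print(hand[:4])  # three filled slots followed by an empty one
```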
The current action space has the agent select a value from 0 to 107 corresponding to a card in its hand. This is the card it will attempt to play onto the discard pile.
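As a rough illustration of how an action index maps back to a card, the sketch below looks up the chosen slot in the hand array. The function name card_for_action is hypothetical, and treating a (0, 0) slot as "no card to play" is my assumption about how empty slots would be handled.

```python
import numpy as np


def card_for_action(agent_cards, action):
    """Return the (color, type) pair stored at slot `action`, or None if the slot is empty."""
    color, card_type = agent_cards[action]
    if color == 0 and card_type == 0:
        return None  # assumption: (0, 0) means there is no card in this slot
    return int(color), int(card_type)


# A hand with a single red 7 (color 1, type 8) in slot 0; all other slots are empty.
agent_cards = np.zeros((108, 2), dtype=np.int32)
agent_cards[0] = (1, 8)

print(card_for_action(agent_cards, 0))
print(card_for_action(agent_cards, 107))
```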
My main question is: how should I encode the card data for the agent? Is my current method a good fit for machine learning, or is there another encoding scheme that would fit my agent better?
In case it matters, my preprocessing layers are:
preprocessing_layers = {
    'agent_cards': keras.Sequential([
        keras.layers.Reshape((216,)),
        TypeCastLayer(dtype="float32")
    ]),
    'top_discard_card': keras.Sequential([
        keras.layers.Concatenate(axis=-1, name="con2"),
        TypeCastLayer()
    ]),
    'enemy_card_amounts': keras.Sequential([TypeCastLayer()]),
    'turn_order_reversed': keras.Sequential([TypeCastLayer()])
}
(Note that TypeCastLayer simply converts int32 to float32.)
And my preprocessing combiner is keras.layers.Concatenate(axis=-1, name="con3")
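For reference, this is roughly what the preprocessing and combiner amount to for a single, unbatched observation, written in plain NumPy rather than Keras. The function name flatten_observation is hypothetical, and the order in which the real network concatenates the parts may differ; the sketch is only meant to show the resulting vector size.

```python
import numpy as np

# A dummy observation with the shapes from the observation spec.
obs = {
    "agent_cards": np.zeros((108, 2), dtype=np.int32),
    "top_discard_card": (np.zeros(2, dtype=np.int32), np.zeros(2, dtype=np.int32)),
    "enemy_card_amounts": np.zeros(3, dtype=np.int32),
    "turn_order_reversed": np.zeros(1, dtype=np.int32),
}


def flatten_observation(obs):
    """Mimic the preprocessing layers and combiner for one unbatched observation."""
    parts = [
        obs["agent_cards"].reshape(216),                   # Reshape((216,))
        np.concatenate(obs["top_discard_card"], axis=-1),  # Concatenate ("con2")
        obs["enemy_card_amounts"],
        obs["turn_order_reversed"],
    ]
    # Cast every part to float32 (the TypeCastLayer step), then join them ("con3").
    return np.concatenate([p.astype(np.float32) for p in parts], axis=-1)


flat = flatten_observation(obs)
print(flat.shape)  # (224,) = 216 + 4 + 3 + 1
```

So the network sees a 224-element float vector in which 216 entries are the raw integer color and type codes of the hand.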
A link to the colab can be found here: https://colab.research.google.com/drive/1qZuIXChYpNJ4a9nKZDNIL_tdU2hF359N#scrollTo=V5AbtOYLakmk
(It currently fails with the error "'NoneType' object has no attribute 'is_first'", but that is because I have not yet implemented the actual _step method.)