I want to create a simple game that basically consists of 2D circles shooting smaller circles at each other (to keep hitbox detection easy at the start). My goal is to create an AI that adapts its behaviour to the player's. For that, I want to use a neural network as its brain. Every frame, the NN is fed the same inputs as the player, and its output is compared to the player's output (outputs in this case are pressed keys, like the up arrow). As inputs, I want to use a couple of important factors: for example, the direction of the enemy player as a number from 0 to 1.

I also want to input the direction, size and speed of the enemy's and my own projectiles, and this is where my problem lies. If there were only one bullet per player, it would be easy, but I want the number of bullets to be variable, so the number of input neurons would have to be variable too.

My approaches so far:

  1. Use a large number of input neurons and set the unused ones to 0 (not elegant at all).

  2. Instead of specific values, use all the pixels' RGB values as inputs (this would limit the game, since colours would have to carry all the information, and factors like speed and direction would probably have no direct impact).

Is there a more promising approach to this problem? I hope you can give me some inspiration.

Also, is there a difference between scaling input values to [0, 1] versus [-1, 1]?

Thank you in advance, Mo

Edit: In case there aren't enough questions for you already: is there a way to make the NN remember things? For example, if I added a mechanic to the game that involves holding a key, I would add an input neuron that is 1 while the key is pressed and 0 while it isn't, but I doubt that would work.

Cr3ative

2 Answers

The most generic approach is to input all the pixels, as you suggested. A CNN would be the best architecture for that. To provide information like speed and direction, you can feed more than one frame to the CNN (e.g. the last 5 frames, or however many provide enough information). The CNN can then learn movement information by comparing those images.
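To make that concrete, here is a minimal PyTorch sketch of such a network. The frame size (84×84), the number of stacked frames (5) and the number of actions (4) are illustrative assumptions, not values from the question:

```python
import torch
import torch.nn as nn

# Minimal sketch: a CNN that takes the last 5 grayscale frames stacked
# as channels, so motion can be inferred by comparing frames.
# Frame size (84x84), frame count (5) and action count (4) are
# illustrative assumptions.
class FrameStackCNN(nn.Module):
    def __init__(self, n_frames=5, n_actions=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(n_frames, 16, kernel_size=8, stride=4),  # 84 -> 20
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),        # 20 -> 9
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),  # one output per discrete action
        )

    def forward(self, x):  # x: (batch, n_frames, 84, 84)
        return self.head(self.conv(x))
```

Stacking the frames as input channels lets the first convolution compare consecutive frames directly, which is how the network can pick up speed and direction.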

If you want to store additional information (like an inventory item), an input neuron for each value would be an option. You can also look into LSTM (long short-term memory) models, but for your specific situation a hardcoded input neuron would be the easier solution.
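For the hardcoded-neuron option, one common pattern is to concatenate the extra scalar values with the flattened CNN features before the final layer. A rough sketch, with all sizes hypothetical:

```python
import torch
import torch.nn as nn

# Sketch: combining CNN image features with a few hardcoded scalar
# inputs (e.g. an "is key held" flag or an inventory count).
# All sizes here are illustrative assumptions.
class MixedInputHead(nn.Module):
    def __init__(self, image_features=256, n_scalars=1, n_actions=4):
        super().__init__()
        self.fc = nn.Linear(image_features + n_scalars, n_actions)

    def forward(self, img_feat, scalars):
        # img_feat: (batch, image_features) from a CNN body
        # scalars:  (batch, n_scalars) extra game state
        return self.fc(torch.cat([img_feat, scalars], dim=1))
```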

Demento

I recommend preprocessing the frames and feeding the network the pixel values of a single image built from several combined frames. Some ideas:

  1. Preprocess all images to grayscale if possible. This reduces the number of input neurons (as long as this step doesn't introduce a large overhead).

  2. Select some $\gamma$ value such that $0 < \gamma < 1$. Generate (i.e. capture from your game) $n$ sequential images. For the $k$-th image in the sequence, multiply every pixel value by $\gamma^{n-k-1}$. This assumes we index $k$ starting at zero, so the newest frame keeps full brightness.

  3. Sum the pixel values of all processed images and clip the result to $[0, 255]$ so the values stay in a valid range.

This yields a single image in which stationary pixels sum to brighter, more saturated spots, while moving objects leave "shadows" or "tails" that fade with each time step ($\gamma$ is the "fading factor", so to speak).
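A minimal NumPy sketch of steps 2 and 3, assuming the frames have already been converted to grayscale (step 1) and arrive oldest-first; the default $\gamma$ is just an example:

```python
import numpy as np

def composite_frames(frames, gamma=0.8):
    """Combine n sequential grayscale frames into one "motion trail" image.

    frames: list of n 2-D uint8 arrays, oldest first.
    The k-th frame (k starting at 0) is weighted by gamma**(n - k - 1),
    so the newest frame keeps full brightness and older frames fade.
    """
    n = len(frames)
    acc = np.zeros(frames[0].shape, dtype=np.float64)
    for k, frame in enumerate(frames):
        acc += (gamma ** (n - k - 1)) * frame.astype(np.float64)
    # Clip to [0, 255] so the result is a valid image again.
    return np.clip(acc, 0, 255).astype(np.uint8)
```

Dividing the result by 255.0 then gives inputs in $[0, 1]$, as discussed below.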

Image input: as long as all values are on a comparable scale, it shouldn't really matter whether the inputs are in the range $[-1, 1]$ or $[0, 1]$. Since you'll be using pixel values, they will all be positive, so normalizing them yields the range $[0, 1]$.

Note: this kind of processing can be done more efficiently in an iterative, online fashion: at each time step, multiply the running sum by $\gamma$ and then add the new frame.
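A sketch of that online variant, under the same assumptions as above:

```python
import numpy as np

class OnlineCompositor:
    """Keeps a running gamma-decayed sum of frames, updated once per frame."""

    def __init__(self, shape, gamma=0.8):
        self.gamma = gamma
        self.acc = np.zeros(shape, dtype=np.float64)

    def update(self, frame):
        # Decay everything already in the buffer, then add the new frame;
        # this reproduces the gamma**(n - k - 1) weighting incrementally.
        self.acc = self.gamma * self.acc + frame.astype(np.float64)
        return np.clip(self.acc, 0, 255).astype(np.uint8)
```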

Now consider what you want the OUTPUT of the network to be. If you want the agent to take an action after processing the inputs, your output layer should consist of one neuron per discrete action (i.e. each "button" that can be pushed). I will limit my answer to discrete actions, since that is likely the most useful answer for this question.

Finally, you asked whether the network can "remember things," like "holding down a key." This question is a bit vague, but let me try to answer. It sounds like you were considering using this as an INPUT to the network. In theory, you could use an implementation similar to the image compositing: at every time step, record whether the button was pressed (1 if pressed, 0 otherwise), decay the running sum by $\gamma$, and add the new value. Over $n$ time steps, the sum has a maximum value of $\sum_{k=0}^{n-1} \gamma^{k} = \frac{1 - \gamma^{n}}{1 - \gamma}$. As before, the $k$-th measurement (with $k$ starting at zero) is weighted by $\gamma^{n-k-1}$. You don't have to decay this value at all, but decaying by a factor of $\gamma$ helps the network know whether, for instance, the button was pressed near the first frame or the last frame.
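A sketch of that decayed "key history" input (the $\gamma$ value is again just an example):

```python
def held_key_feature(presses, gamma=0.8):
    """Decayed sum of a key's press history, normalized to [0, 1].

    presses: sequence of 0/1 flags, one per time step, oldest first.
    The k-th flag (k starting at 0) is weighted by gamma**(n - k - 1);
    the maximum possible raw value is the geometric sum
    (1 - gamma**n) / (1 - gamma), used here for normalization.
    """
    n = len(presses)
    raw = sum((gamma ** (n - k - 1)) * p for k, p in enumerate(presses))
    max_val = (1 - gamma ** n) / (1 - gamma)
    return raw / max_val
```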

That said, I don't know whether you want to use this as an input. If the AI is meant to have more information than the opponent, then I suppose you can, but the agent will then not be learning from the same information as the opponent. Also, if holding the button produces a clearly visible effect, that information is already captured in the images, so it might be a redundant input.

These ideas are not the only possible implementation, but they can get you going. It sounds like you'll need a measure of reward, and will likely need to structure this as an RL problem. The details of that are beyond the scope of this post, and I don't want to stray too far afield of your original question. Just note that comparing to the player's output may not give you the results you want, and even if it did, your network would be limited to learning only to mimic the other player. Using a measure of reward allows your agent, in theory, to advance beyond the skill of its opponent by taking actions that maximize reward, even if the opponent would not have thought to take those actions.

I hope this helps.

Hanzy