I love this question because I grew up with Creatures. I'm not sure if you've played Creatures, but from my memory the creatures were relatively simplistic AI. I would not have asked a Norn to play Pong, much less anything requiring higher-level strategy. For anything more complex than reacting to the environment that wasn't algorithm-based (see The Sims for algorithmic AI in the pre-2010s era), the methods and compute just weren't there.
There are some big names (Sony, notably) working toward RL in modern video games, so we're getting to that age, but we still aren't there in a few ways.
But hey, I'm not here to crush dreams, so let's make it realistic:
- Ensure there is no simpler method for how you want the NPCs to react to the player. It's fun to jump into deep RL, but if, for example, the desired interaction is "oh, the player goes B every 3rd round, the AI should learn to go B in those cases," it's much easier to maintain a table of counts/probabilities and choose a predefined strategy based on it (first sketch after this list).
- Make the inputs/outputs as simple as possible. The landmark DQN paper downsampled the Atari(!) screen. We've gotten to where YouTubers train Mario Kart AI on downsampled N64 screens, but the point is we still struggle with really rich inputs. Keeping the input space as small as possible is your friend, both for training time and for keeping the AI from overfitting (second sketch below).
- Learn to transfer-train. Players probably won't be happy with NPCs who do little to nothing for the first 12 hours of gameplay. Establish a reasonable model at engineering time and allow a subset of the model to adjust at playtime from interaction with the player. The reality is that training even a few layers of a neural net won't happen in real time, but it can run between rounds or as a background task (third sketch below).
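To make the first point concrete, here's a minimal sketch of that "table of probabilities" idea. Everything in it (the `HabitTracker` name, the route labels, the 3-round cycle) is made up for illustration, not from any engine:

```python
from collections import defaultdict
import random

class HabitTracker:
    """Counts how often the player picks each route at each phase of a
    repeating cycle (round number mod cycle length), then predicts the
    most frequent choice. No neural nets required."""

    def __init__(self, cycle=3, routes=("A", "B")):
        self.cycle = cycle
        self.routes = routes
        # counts[phase][route] -> times the player took that route
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, round_num, player_route):
        self.counts[round_num % self.cycle][player_route] += 1

    def predict(self, round_num):
        phase = self.counts[round_num % self.cycle]
        if not phase:
            return random.choice(self.routes)  # no data yet, guess
        return max(phase, key=phase.get)       # most frequent route

# Usage: if predict() says the player will go B this round, have the
# NPC run its predefined "cover B" strategy.
```

Something like this is trivial to debug, tune, and reset, which matters far more in a shipped game than squeezing out optimal play.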
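On the second point, here's roughly what that preprocessing looks like, as a plain-NumPy sketch (the 84x84 target matches the DQN paper; the nearest-neighbor resize is my simplification of their pipeline):

```python
import numpy as np

def preprocess(frame, out_h=84, out_w=84):
    """Shrink an RGB screen capture to a small grayscale observation.
    frame: (H, W, 3) uint8 array; returns (out_h, out_w) float32 in [0, 1]."""
    gray = frame.mean(axis=2)                      # collapse color channels
    h, w = gray.shape
    ys = np.linspace(0, h - 1, out_h).astype(int)  # nearest-neighbor
    xs = np.linspace(0, w - 1, out_w).astype(int)  # row/column picks
    return (gray[np.ix_(ys, xs)] / 255.0).astype(np.float32)
```

Better yet, skip pixels entirely and feed the agent a handful of game-state values (positions, health, cooldowns) straight from the engine; unlike the Atari researchers, you already have them.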
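And for the third point, a sketch of the freeze-most-of-the-net idea in PyTorch (the network shape and where I split backbone from head are arbitrary placeholders):

```python
import torch
import torch.nn as nn

# Hypothetical policy net: a "backbone" trained at engineering time plus
# a small head that keeps adapting to the individual player.
policy = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),   # backbone layers: ship these frozen
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 4),               # head: fine-tune this at playtime
)

# Freeze everything, then unfreeze only the final layer.
for param in policy.parameters():
    param.requires_grad = False
for param in policy[-1].parameters():
    param.requires_grad = True

# The optimizer only sees the head's parameters, so the update you run
# between rounds (or as a background task) touches a tiny weight count.
optimizer = torch.optim.Adam(
    (p for p in policy.parameters() if p.requires_grad), lr=1e-3
)
```

Collect player interactions into a buffer during play, then run a few gradient steps on the head during loading screens or between rounds.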