4

Are there any algorithms to use reinforcement learning to learn optimal policies in partially observable Markov decision process (POMDP) i.e. when the state is not perfectly observed? More specifically, how does one update the belief state using Bayes' rule when the update Q kernel is not known?

nbro
  • 42,615
  • 12
  • 119
  • 217

0 Answers0