
Recently, some work has been done on planning and learning in non-Markovian decision processes, that is, decision-making with temporally extended rewards. In these settings, a particular reward is received only when a particular temporal logic formula (for example, an LTL or CTL formula) is satisfied. However, I cannot find any work on learning which rewards correspond to which temporally extended behaviors.
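To make the setting concrete, here is a minimal sketch (an illustration only, not taken from any of the work mentioned above) of a temporally extended reward. It assumes a hypothetical labelling of each step with atomic propositions `a` and `b`, and pays a reward the first time `b` is observed strictly after `a` has been observed, roughly the LTL formula F(a ∧ X F b). The class name `SequenceRewardMonitor` and the function `rewards_for_trace` are made up for this example.

```python
from typing import Iterable, List, Set


class SequenceRewardMonitor:
    """Hypothetical monitor for 'a at some step, then b at a strictly later step'."""

    def __init__(self, reward: float = 1.0) -> None:
        self.reward = reward
        # 0: still waiting for a, 1: a seen and waiting for b, 2: reward already paid
        self.state = 0

    def step(self, labels: Set[str]) -> float:
        """Consume the propositions true at this step and return the reward."""
        if self.state == 0:
            if "a" in labels:
                self.state = 1
            return 0.0
        if self.state == 1 and "b" in labels:
            self.state = 2
            return self.reward
        return 0.0


def rewards_for_trace(trace: Iterable[Set[str]]) -> List[float]:
    """Run a fresh monitor over a whole trace and collect the per-step rewards."""
    monitor = SequenceRewardMonitor()
    return [monitor.step(labels) for labels in trace]


if __name__ == "__main__":
    # The label set {"b"} occurs three times but is rewarded only once,
    # so the reward depends on the history and not just the current step.
    print(rewards_for_trace([{"b"}, {"a"}, set(), {"b"}, {"b"}]))
```

In this sketch the reward function is given. The question is about the reverse direction: learning from experience which such monitor (or formula) each reward corresponds to.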

In my searches, I came across k-order MDPs (which are non-Markovian), but I did not find any RL research on them.
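For reference, a k-order process is one where the dynamics and rewards depend on the last k states rather than only the current one, so it becomes Markovian again if the window of the last k states is treated as the state. A minimal sketch of that augmentation, assuming states are hashable values and using a hypothetical helper name `history_states`:

```python
from collections import deque
from typing import Deque, Hashable, Iterable, Iterator, Tuple


def history_states(states: Iterable[Hashable], k: int) -> Iterator[Tuple[Hashable, ...]]:
    """Yield, at each step, the tuple of the last (up to) k states seen so far."""
    if k < 1:
        raise ValueError("k must be at least 1")
    window: Deque[Hashable] = deque(maxlen=k)  # oldest state is dropped automatically
    for state in states:
        window.append(state)
        yield tuple(window)


if __name__ == "__main__":
    for augmented in history_states(["s0", "s1", "s2", "s3"], k=2):
        print(augmented)
```

A standard RL algorithm could in principle be run on these augmented states, at the cost of a state space that grows exponentially in k.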

nbro
Gavin Rens

0 Answers