I see that discounted-reward reinforcement learning has been studied extensively in the literature. The average-reward criterion, however, has received less attention, and there appear to be fewer algorithms for it (R-learning, H-learning, SMART, etc.) than for the discounted criterion. Could you suggest any algorithms for average-reward reinforcement learning in continuous or general state-action spaces?

k2pctdn

0 Answers