I see that discounted-reward reinforcement learning has been studied extensively in the literature. The average-reward metric, however, has received less attention, and there appear to be fewer algorithms for it (R-learning, H-learning, SMART, etc.) than for the discounted metric. Could you suggest any algorithms for average-reward reinforcement learning in continuous or general state-action spaces?