
I'd like to ask whether it is generally better to model a problem that naturally appears as a contextual multi-armed bandit, such as a recommender system, as a Markov decision process (MDP) with a non-zero discount factor (with a zero discount factor it is effectively just an MDP with one-step episodes), or whether it is better to treat it as what it is: a contextual multi-armed bandit (an MDP with a zero discount factor).

I'm thinking of problems like recommender systems, where we can't define the dynamics of the environment well. In that setting, a non-zero discount factor wouldn't seem to make much sense, because the discounted return would combine rewards from recommendations made to users who are independent of each other.
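To make the two options concrete, here is a minimal sketch of how the learning target differs. The reward values and the helper name `discounted_return` are made up for illustration and do not come from any particular library; the rewards are assumed to be clicks from three unrelated users in arrival order.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k], accumulated from the last reward backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical click rewards from three unrelated users, in arrival order.
rewards = [1.0, 0.0, 1.0]

# gamma = 0: the target for the first recommendation is its own reward only,
# which is the contextual bandit view.
bandit_target = discounted_return(rewards, gamma=0.0)

# gamma > 0: the target for the first recommendation also includes discounted
# rewards from later, unrelated users.
mdp_target = discounted_return(rewards, gamma=0.9)

print(bandit_target, mdp_target)
```

With a zero discount factor the target depends only on the immediate reward, while with a non-zero discount factor it also includes rewards generated by other users' interactions. That is the source of my doubt.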

nbro
Daviiid

0 Answers