Is it better to model a Contextual Multi-Armed Bandit problem as an MDP with a non-zero discount factor than treating it as it is?

Asked Jul 29 '21 at 20:07

Active Dec 21 '21 at 10:07


Is it generally better to model a problem that naturally appears as a contextual multi-armed bandit, such as a recommender system, as a Markov decision process (MDP) with a non-zero discount factor? Or is it better to treat it as what it is: a contextual multi-armed bandit, which is equivalent to an MDP with a zero discount factor (or, put differently, an MDP with one-step episodes)?
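To make the equivalence concrete, here is how I understand it in terms of the standard discounted return. The symbols $G_t$, $R_t$, and $\gamma$ follow the usual textbook convention and are my own notation, not tied to any particular system:

$$G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \quad\Longrightarrow\quad G_t \big|_{\gamma = 0} = R_{t+1}$$

(using the convention $0^0 = 1$). With $\gamma = 0$ the agent optimizes only the immediate reward for the current context, which is the contextual bandit objective.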

I'm thinking of problems like recommender systems, where we can't define the dynamics of the environment well. There, a non-zero discount factor doesn't seem to make much sense: the return would then include rewards from recommendations made to other users, and those recommendations are independent of each other.
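A minimal sketch of my concern, assuming a one-step Q-learning style target. The function name `td_target` and all numbers are made up for illustration; this isn't taken from any real recommender:

```python
def td_target(reward, next_q_values, gamma):
    """One-step bootstrapped target: r + gamma * max_a' Q(s', a')."""
    return reward + gamma * max(next_q_values)

# Hypothetical numbers: the current user clicked the recommendation (reward 1.0).
reward = 1.0
# Value estimates for the *next* context, which may belong to an unrelated user.
next_q = [0.2, -0.1, 0.7]

bandit_target = td_target(reward, next_q, gamma=0.0)  # contextual bandit view
mdp_target = td_target(reward, next_q, gamma=0.9)     # discounted MDP view

print(bandit_target)  # only the immediate reward
print(mdp_target)     # immediate reward plus discounted value of the next context
```

With `gamma=0.0` the target depends only on the current user's reward. With `gamma=0.9` it also depends on the value of the next context, which only makes sense if the current recommendation actually influences that next context.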

edited Dec 21 '21 at 10:07

nbro


asked Jul 29 '21 at 20:07

Daviiid
