Could Softmax Action Selection be useful to solve an episodic task with more than 100000 possible states and 2000 actions?

Question

I am new to the field of RL. I am using a tabular method, Q-learning, to solve a problem, but the computation takes a very long time, so I would like to know whether there are more efficient methods.

Why are tabular methods not useful in large state spaces? Is it because there are too many possible state-action combinations? Could softmax action selection be better than epsilon-greedy?

score 1 · Accepted Answer · answered May 18 '22 at 15:54

Your question contains the answer: use value function approximation. Tabular methods must store and learn a separate value for every state (or state-action pair), which becomes infeasible in large state spaces. Function approximators can generalize, and can perform well without ever having seen every state.
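To make this concrete, here is a minimal sketch of Q-learning with a linear function approximator, combined with softmax (Boltzmann) action selection. It is only an illustration under assumptions, not a tuned implementation: the feature vector `phi`, the class `LinearQ`, and the values of `alpha`, `gamma`, and `temperature` are hypothetical choices, not something given in the question. Note that the exploration rule (softmax or epsilon-greedy) does not change how many values a table has to hold; the approximator is what shrinks that number.

```python
import math
import random

# Sizes taken from the question: a full Q-table needs one entry per (state, action).
N_STATES = 100_000
N_ACTIONS = 2_000
TABLE_ENTRIES = N_STATES * N_ACTIONS  # 200,000,000 Q-values


def softmax_probs(q_values, temperature=1.0):
    """Boltzmann action probabilities, shifted by the max for numerical stability."""
    scaled = [q / temperature for q in q_values]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, temperature=1.0, rng=random):
    """Sample an action index in proportion to its softmax probability."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


class LinearQ:
    """Q(s, a) is approximated by w[a] . phi(s): one weight vector per action.

    phi(s) is an assumed, problem-specific feature vector for state s.
    """

    def __init__(self, n_features, n_actions):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def q_values(self, phi):
        """Return the estimated Q-value of every action for features phi."""
        return [sum(wi * xi for wi, xi in zip(w_a, phi)) for w_a in self.w]

    def update(self, phi, action, reward, phi_next, done, alpha=0.1, gamma=0.99):
        """One semi-gradient Q-learning step; returns the TD error."""
        target = reward
        if not done:
            target += gamma * max(self.q_values(phi_next))
        td_error = target - self.q_values(phi)[action]
        w_a = self.w[action]
        for i, xi in enumerate(phi):
            w_a[i] += alpha * td_error * xi
        return td_error
```

With `d` features, this model learns `d * 2000` weights instead of 200,000,000 table entries, and an update for one state also changes the estimates for every other state that shares its features. In a training loop you would call `select_action(model.q_values(phi))` to pick an action and `model.update(...)` after each transition.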
