I am looking for ways to apply reinforcement learning to problems with a discrete state space and a continuous action space — specifically, the algorithms and common methods used to approach this type of problem.
I have tried applying a version of Dyna-Q to two environments, Pendulum-v1 and Swimmer-v2. For Pendulum-v1 I discretise the state space into bins between the lowest and highest values, and I seem to achieve optimisation results close to what is expected. This confirms my method works to a degree, but I wanted to ask whether there are any known algorithms/methods that are typical for this kind of problem (discrete observations / continuous actions), as I would ideally like to try multiple methods and compare them.
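For context, the binning step I am using looks roughly like the sketch below. It is a minimal illustration, not my exact code: the bounds, the bin count, and the `discretise` helper are all illustrative choices for Pendulum-v1's 3-dimensional observation (cos θ, sin θ, angular velocity).

```python
import numpy as np

# Illustrative bounds for Pendulum-v1's observation space
# (cos(theta), sin(theta), angular velocity).
OBS_LOW = np.array([-1.0, -1.0, -8.0])
OBS_HIGH = np.array([1.0, 1.0, 8.0])
N_BINS = 10  # bins per dimension (arbitrary choice)

# Interior bin edges for each observation dimension.
bin_edges = [np.linspace(lo, hi, N_BINS + 1)[1:-1]
             for lo, hi in zip(OBS_LOW, OBS_HIGH)]

def discretise(obs):
    """Map a continuous observation to a single integer state index."""
    idx = [np.digitize(x, edges) for x, edges in zip(obs, bin_edges)]
    # Combine the per-dimension bin indices into one flat index
    # (mixed-radix encoding, base N_BINS).
    state = 0
    for i in idx:
        state = state * N_BINS + int(i)
    return state
```

With this encoding the tabular model for Dyna-Q needs `N_BINS ** 3` states (1000 here), which is what makes the discretised approach tractable for Pendulum.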
(As an optional aside: I have attempted the same for Swimmer-v2, where the lower and upper bounds of the state space are ±inf. I chose somewhat arbitrary boundaries of roughly ±1000 and 7 or 8 bins per dimension; since the state has 8 components, I end up with ~1e7 states. I achieve a reward of ~30, but according to this I should expect closer to 100 — can anyone offer any solutions?)
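To make the aside concrete, here is a sketch of the clip-then-bin approach for unbounded observations, using the ±1000 boundary and 7 bins per dimension mentioned above. The helper name and shared edges are illustrative assumptions, not the actual implementation:

```python
import numpy as np

CLIP = 1000.0   # arbitrary clipping bound (from the question)
N_BINS = 7      # bins per dimension
N_DIMS = 8      # Swimmer-v2 has 8 observation components

# Shared interior bin edges for every dimension.
edges = np.linspace(-CLIP, CLIP, N_BINS + 1)[1:-1]

def discretise(obs):
    """Clip an unbounded observation, bin each dimension, flatten to one index."""
    clipped = np.clip(obs, -CLIP, CLIP)
    idx = np.digitize(clipped, edges)  # per-dimension indices in [0, N_BINS-1]
    state = 0
    for i in idx:
        state = state * N_BINS + int(i)
    return state

# Table size: 7**8 = 5,764,801 states, i.e. on the order of 1e7.
```

One thing worth noting about this setup: with ±1000 bounds and only 7 bins, each bin spans ~286 units, while typical Swimmer observations are much smaller in magnitude, so most visited states may fall into the central bin and become indistinguishable — which could partly explain the reward gap.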