
Suppose $x_{t+1} \sim \mathbb{P}(\cdot | x_t, a_t)$ denotes the state transition dynamics in a reinforcement learning (RL) problem, and let $y_{t+1} \sim \mathbb{P}(\cdot | x_{t+1})$ denote the noisy observation, i.e., the imperfect state information. Let $H_{t+1} = \{b_0, y_0, a_0, \cdots, a_t, y_{t+1}\}$ denote the history of actions and observations up to time $t+1$.

For the RL Partially Observed Markov Decision Process (RL-POMDP), the summary of the history is contained in the "belief state" $b_{t+1}(i) = \mathbb{P}(x_{t+1} = i | H_{t+1})$, which is the posterior distribution over the states conditioned on the history.
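For concreteness, when the model is known, the belief state can be propagated with the usual Bayes filter update (written here for discrete states for notational simplicity):

$$
b_{t+1}(j) \;=\; \frac{\mathbb{P}(y_{t+1} \mid x_{t+1} = j)\, \sum_{i} \mathbb{P}(x_{t+1} = j \mid x_t = i, a_t)\, b_t(i)}{\sum_{j'} \mathbb{P}(y_{t+1} \mid x_{t+1} = j')\, \sum_{i} \mathbb{P}(x_{t+1} = j' \mid x_t = i, a_t)\, b_t(i)},
$$

which explicitly requires both the transition kernel $\mathbb{P}(x_{t+1} | x_t, a_t)$ and the observation kernel $\mathbb{P}(y_{t+1} | x_{t+1})$.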

Now, suppose the model is NOT known, i.e., the transition and observation kernels above are unavailable. Then the belief state clearly cannot be computed exactly.

Can we use a Gaussian Process (GP) to approximate the belief distribution $b_{t}$ at every instant $t$?

Can a variational GP be adapted to such a situation? Can the universal approximation property of GPs be invoked here?
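To make the question concrete, below is the kind of construction I have in mind. It is purely illustrative: the fixed-length history encoding, the window length `k`, the use of scikit-learn's `GaussianProcessRegressor`, and the availability of true states at training time (e.g., from a simulator) are all assumptions on my part, not a claimed method.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Illustrative sketch only: assume logged trajectories from a simulator in
# which the true state is visible at training time, so one-hot state targets
# can be formed. The "history encoding" here is just a placeholder feature
# vector standing in for the last k observation/action pairs.
k = 3                     # history window length (arbitrary choice)
n_states = 4              # number of discrete states (assumed known)
rng = np.random.default_rng(0)

# Fake logged data standing in for (history encoding, true state) pairs.
X_train = rng.normal(size=(200, 2 * k))          # encoded histories
states = rng.integers(0, n_states, size=200)     # true states (simulator only)
Y_train = np.eye(n_states)[states]               # one-hot targets

# One GP output per state; the posterior mean is normalized to give an
# approximate belief vector. This is a crude stand-in for a proper
# (variational) GP treatment, just to make the question concrete.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
gp.fit(X_train, Y_train)

x_new = rng.normal(size=(1, 2 * k))              # encoding of a new history
raw = gp.predict(x_new)[0]
belief = np.clip(raw, 1e-9, None)
belief /= belief.sum()                           # approximate b_t over states
print(belief)
```

In other words, the GP would be asked to map a history encoding to (an approximation of) the posterior over states; my question is whether this kind of construction, or a proper variational GP version of it, can be justified by a universal approximation argument or by existing results.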

Are there such results in the literature?

Any references and insights into this problem would be much appreciated.
