
I am stuck on the proof of the contraction of variance for the distributional Bellman operator from the paper, in which it is defined as

[image: definition of the distributional Bellman operator from the paper]

and the proof is stated as

[image: the variance contraction proof from the paper]

In its second part, how is the variance of the target distribution equal to the expectation of the variance terms over the next state-action pairs?

Magi Feeney

2 Answers


The confusing part is how the variance is defined in terms of $P^{\pi}Z(x, a)$, which includes three sources of randomness: $X'$, $A'$, and $Z$. However, if we define the variance as

\begin{align*} \mathbb{V}(P^{\pi}Z(x, a)) & \triangleq \int P(X' = x', A' = a')\, \mathbb{V}(Z \mid x', a')\, dx'\, da' \\ & = \mathbb{E}_{X', A'}[\mathbb{V}(Z \mid X', A')] \end{align*}

Then the result clearly follows, as both $P^{\pi}Z_{1}$ and $P^{\pi}Z_{2}$ evolve under the same dynamics, i.e. the same distribution over $(X', A')$; by linearity of expectation, the claim follows.
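To make this concrete, here is a minimal numeric sketch (my own, with made-up transition probabilities, policy, and conditional variances, not from the paper): under the definition above, both variances are weighted averages with the same weights $P(x'|x,a)\,\pi(a'|x')$, so their difference passes through the expectation exactly.

```python
# Minimal sketch: with V(P^pi Z(x,a)) := E_{X',A'}[V(Z|X',A')], the
# variance of the target is a fixed weighted average of per-(x',a')
# variances, so linearity of expectation gives
#   V(P^pi Z1) - V(P^pi Z2) = E[ V(Z1|X',A') - V(Z2|X',A') ].
# All numbers below (dynamics, policy, variances) are invented.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P(x'|x,a)
pi = rng.dirichlet(np.ones(n_actions), size=n_states)             # pi(a'|x')

var_Z1 = rng.uniform(0.0, 2.0, size=(n_states, n_actions))  # V(Z1|x',a')
var_Z2 = rng.uniform(0.0, 2.0, size=(n_states, n_actions))  # V(Z2|x',a')

x, a = 0, 1  # an arbitrary current state-action pair

# Joint weight on each (x', a') under the dynamics and the policy.
w = P[x, a][:, None] * pi                       # shape (n_states, n_actions)

lhs = (w * var_Z1).sum() - (w * var_2 := var_Z2).sum() if False else \
      (w * var_Z1).sum() - (w * var_Z2).sum()   # V(P^pi Z1) - V(P^pi Z2)
rhs = (w * (var_Z1 - var_Z2)).sum()             # E[ V(Z1|.) - V(Z2|.) ]
assert np.isclose(lhs, rhs)
print(lhs, rhs)
```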

Magi Feeney

This is a common trick in the reinforcement learning literature: by the law of large numbers, one can work with the sampled variables $X$ and $A$ instead of $x$ and $a$. Say we know the probability density $p(x)$ of a continuous variable $x$; then we can compute its expected value as \begin{align} \mathbb{E}_{x \sim p}[x] &= \int p(x)\, x \, dx \\ &= \lim_{N \to \infty}\frac{1}{N}\sum_{N} X \end{align} In the last line we did not assume any knowledge of $p(x)$; we simply averaged over samples drawn from $p(x)$. This is often the only way to estimate the expected value of an unknown system, e.g. the average speed of a driver whose exact driving behaviour we do not know. We can likewise express the variance in terms of samples: \begin{align} \mathbb{V}(x) = \int p(x)\, (x - \mathbb{E}_{x \sim p}[x])^2 \, dx = \lim_{N \to \infty} \frac{1}{N} \sum_{N} (X-\mathbb{E}[X])^2 \end{align} From here, two things could happen, since the notation in the paper is not quite clear (a small numeric check follows the list).

  1. Define $\mathbb{V}[X] = (X-\mathbb{E}[X])^2$; then \begin{equation} \lim_{N \to \infty} \frac{1}{N} \sum_{N} (X-\mathbb{E}[X])^2 = \lim_{N \to \infty} \frac{1}{N} \sum_{N} \mathbb{V}[X] = \mathbb{E}[\mathbb{V}[X]]. \end{equation} But this would be rather confusing notation, since $(X-\mathbb{E}[X])^2$ is only a single sample whose expected value is the variance.
  2. Since the variance is itself an expected value (the expected squared deviation from the mean), we can wrap it in an additional expectation, which changes nothing: \begin{equation} \lim_{N \to \infty} \frac{1}{N} \sum_{N} (X-\mathbb{E}[X])^2 = \mathbb{V}[X] = \mathbb{E}[\mathbb{V}[X]] = \lim_{N \to \infty} \frac{1}{N} \sum_{N} \mathbb{V}[X] = \lim_{N \to \infty} \frac{1}{N}\, N\, \mathbb{V}[X] = \mathbb{V}[X]. \end{equation} I lean towards this explanation, although introducing a redundant expectation is admittedly strange.
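Here is the small numeric check promised above (my own sketch, with an arbitrarily chosen Gaussian $p$): the sample average of $(X - \mathbb{E}[X])^2$ converges to $\mathbb{V}[X]$ (reading 1), while averaging the constant $\mathbb{V}[X]$ over samples (reading 2) trivially returns $\mathbb{V}[X]$.

```python
# Sanity check of the two readings above (my own sketch, not from the
# paper): for X ~ N(mu, sigma^2), the sample average of (X - E[X])^2
# converges to V(X), while averaging the constant V[X] over samples
# trivially returns V[X] itself.
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, N = 1.0, 2.0, 1_000_000

X = rng.normal(mu, sigma, size=N)

# Reading 1: treat each (X - E[X])^2 as one sample; its average
# estimates the variance.
reading_1 = np.mean((X - mu) ** 2)

# Reading 2: the "extra" expectation over the constant V[X] = sigma^2
# changes nothing.
reading_2 = np.mean(np.full(N, sigma ** 2))

print(reading_1)  # ~ 4.0 = sigma^2, up to Monte Carlo error
print(reading_2)  # exactly 4.0
```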

Now let's use this to explain why \begin{align} & \mathbb{V}(P^{\pi}Z_{1}(x, a)) - \mathbb{V}(P^{\pi}Z_{2}(x, a)) \\ &= \int \pi(a')\, p(x') \left(Z_{1}(x', a') - \mathbb{E}[Z_{1}(X', A')]\right)^{2} dx'\, da' - \int \pi(a')\, p(x') \left(Z_{2}(x', a') - \mathbb{E}[Z_{2}(X', A')]\right)^{2} dx'\, da' \\ &= \lim_{N \to \infty} \frac{1}{N} \sum_{N} \left[\left(Z_{1}(X', A') - \mathbb{E}[Z_{1}(X', A')]\right)^{2} - \left(Z_{2}(X', A') - \mathbb{E}[Z_{2}(X', A')]\right)^{2}\right] \\ &= \mathbb{E}\left[\mathbb{V}(Z_{1}(X', A')) - \mathbb{V}(Z_{2}(X', A'))\right], \end{align} where in the last step one can use either of the two possibilities given above. Here I abbreviated $p(x') = p(x'|x, a)$ and $\pi(a') = \pi(a'|x')$.
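As a sanity check, the same chain can be run numerically (my own sketch, under the convention $\mathbb{V}(P^{\pi}Z(x,a)) = \mathbb{E}_{X',A'}[\mathbb{V}(Z|X',A')]$ from the other answer; the dynamics, policy, and conditional variances are invented). Sampling $(X', A')$ and averaging the conditional variance differences is the Monte Carlo counterpart of the limit in the derivation above.

```python
# Monte Carlo counterpart of the derivation: sample (X', A') from
# p(x'|x,a) and pi(a'|x'), then average V(Z1|X',A') - V(Z2|X',A');
# this converges to the exact weighted sum over (x', a').
import numpy as np

rng = np.random.default_rng(7)
n_states, n_actions, N = 3, 2, 100_000

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # p(x'|x,a)
pi = rng.dirichlet(np.ones(n_actions), size=n_states)             # pi(a'|x')

var_Z1 = rng.uniform(0.5, 2.0, size=(n_states, n_actions))  # V(Z1|x',a')
var_Z2 = rng.uniform(0.5, 2.0, size=(n_states, n_actions))  # V(Z2|x',a')

x, a = 0, 0
Xp = rng.choice(n_states, size=N, p=P[x, a])                   # X' ~ p(.|x,a)
Ap = np.array([rng.choice(n_actions, p=pi[xp]) for xp in Xp])  # A' ~ pi(.|X')

# Sample average of V(Z1|X',A') - V(Z2|X',A') over the drawn (X', A').
mc = np.mean(var_Z1[Xp, Ap] - var_Z2[Xp, Ap])

# Exact value E_{X',A'}[V(Z1|X',A') - V(Z2|X',A')] from the joint weights.
w = P[x, a][:, None] * pi
exact = (w * (var_Z1 - var_Z2)).sum()

print(mc, exact)  # agree up to Monte Carlo error
```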

pythonic833