
The value function for which convergence was proved in the original GAN paper is

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log (1 - D(G(z)))]$$

and the loss functions used in training are

$$\max L(D) = \frac{1}{m} \sum_{i=1}^{m}\left[\log D\left(\boldsymbol{x}^{(i)}\right)+\log \left(1-D\left(G\left(\boldsymbol{z}^{(i)}\right)\right)\right)\right]$$

$$\min L(G) = \frac{1}{m} \sum_{i=1}^{m}\left[\log \left(1-D\left(G\left(\boldsymbol{z}^{(i)}\right)\right)\right)\right]$$

where $\{z^{(1)}, z^{(2)}, z^{(3)}, \cdots, z^{(m)}\}$ and $\{x^{(1)}, x^{(2)}, x^{(3)}, \cdots, x^{(m)}\}$ are the noise samples and the data samples for a mini-batch, respectively.

After analyzing questions 1 and 2 on our main site, I found that the loss functions used for training are just approximations of the value function and are not the same in a formal sense.

Is this true? If yes, what is the reason behind the disparity? Does the loss function used in implementations also ensure convergence?

hanugm

1 Answer


The expected value can be thought of as a weighted average of outcomes. Expectation and sample mean coincide when each outcome has the same probability (here $\frac{1}{m}$), so we can replace the expectation with a sum divided by $m$. We start from the value function: $$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log (1 - D(G(z)))]$$

First, we sample a minibatch of size $m$ with $\boldsymbol{x} \sim P_{data}$ and $\boldsymbol{z} \sim \mathcal{N}(0, 1)$. Now we can replace the expectations with sums:

$$ \begin{align*} \min_G \max_D V(D, G) &= \sum_{i=1}^{m}\left[p(\boldsymbol{x}^{(i)})\log D(\boldsymbol{x}^{(i)})\right] + \sum_{i=1}^{m}\left[p(\boldsymbol{z}^{(i)})\log (1 - D(G(\boldsymbol{z}^{(i)})))\right] \\ &= \sum_{i=1}^{m}\left[\frac{1}{m}\log D(\boldsymbol{x}^{(i)})\right] + \sum_{i=1}^{m}\left[\frac{1}{m}\log (1 - D(G(\boldsymbol{z}^{(i)})))\right]\\ &=\frac{1}{m}\sum_{i=1}^{m}\left[\log D(\boldsymbol{x}^{(i)}) + \log (1 - D(G(\boldsymbol{z}^{(i)})))\right] \end{align*} $$
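As a quick numerical sanity check, here is a minimal NumPy sketch (the fixed discriminator `D` and the sample distribution are hypothetical, chosen only for illustration) showing that the minibatch mean is exactly this equally weighted sum, i.e. a Monte Carlo estimate of the expectation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical fixed discriminator, for illustration only.
def D(x):
    return sigmoid(0.5 * x)

m = 100_000
x = rng.normal(loc=1.0, scale=1.0, size=m)  # minibatch of "real" samples

# The sample mean is the Monte Carlo estimate of E_{x ~ P_data}[log D(x)]:
# each sample gets equal weight 1/m.
estimate = np.mean(np.log(D(x)))
```

With a large $m$ the estimate approaches the true expectation; in training, $m$ is small, so each loss evaluation is only a noisy approximation of the value function.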


Binary cross-entropy is defined as follows:

$$H(p, q) = \operatorname{E}_p[-\log q] = H(p) + D_{\mathrm{KL}}(p \| q)=-\sum_x p(x)\log q(x)$$

Since we have a binary classification problem (fake/real), we can define $p \in \{y,1-y\}$ and $q \in \{\hat{y}, 1-\hat{y}\}$ and rewrite the cross-entropy as follows:

$$H(p, q)=-\sum_x p_x \log q_x =-y\log \hat{y}-(1-y)\log (1-\hat{y})$$

which is nothing but the logistic loss. Since we know the source of each sample, we set $y = 1$ for real samples (only the first term survives) and $y = 0$ for fake samples (only the second term survives, with coefficient $1 - y = 1$). We then get: $$\min_G\max_D L = \frac{1}{m} \sum_{i=1}^{m}\left[1\cdot\log D\left(\boldsymbol{x}^{(i)}\right)+1\cdot\log \left(1-D\left(G\left(\boldsymbol{z}^{(i)}\right)\right)\right)\right] $$

This is the original loss. The first term always receives real images, while the second receives only generated ones, so each term is paired with its correct label. Read this article for more details.
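To make the reduction concrete, here is a small sketch (the discriminator output $\hat{y}$ is a hypothetical value) showing that binary cross-entropy with $y=1$ keeps only the $\log\hat{y}$ term and with $y=0$ only the $\log(1-\hat{y})$ term:

```python
import numpy as np

def bce(y, y_hat):
    # Binary cross-entropy: H(p, q) = -y log(y_hat) - (1 - y) log(1 - y_hat)
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

y_hat = 0.8  # hypothetical discriminator output D(x)

# Real sample (y = 1): only the -log(y_hat) term survives.
loss_real = bce(1, y_hat)

# Fake sample (y = 0): only the -log(1 - y_hat) term survives.
loss_fake = bce(0, y_hat)
```

Note the sign convention: cross-entropy is minimized, which corresponds to the discriminator maximizing $\log D(x) + \log(1 - D(G(z)))$ above.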

Since the first term does not depend on $G$, we can split the objective into the two separate losses used in training:

$$\max L(D) = \frac{1}{m} \sum_{i=1}^{m}\left[\log D\left(\boldsymbol{x}^{(i)}\right)+\log \left(1-D\left(G\left(\boldsymbol{z}^{(i)}\right)\right)\right)\right]$$

$$\min L(G) = \frac{1}{m} \sum_{i=1}^{m}\left[\log \left(1-D\left(G\left(\boldsymbol{z}^{(i)}\right)\right)\right)\right]$$
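The two losses can be sketched over a single minibatch as follows (the toy `D` and `G` below are hypothetical stand-ins for the networks, chosen only to make the computation concrete):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Toy stand-ins for the networks (hypothetical, for illustration):
D = lambda x: sigmoid(2.0 * x)   # discriminator outputs a probability
G = lambda z: z + 0.5            # generator shifts the noise

m = 64
x = rng.normal(1.0, 1.0, size=m)  # real data minibatch
z = rng.normal(0.0, 1.0, size=m)  # noise minibatch

# Discriminator objective (to be maximized): both terms.
L_D = np.mean(np.log(D(x)) + np.log(1.0 - D(G(z))))

# Generator objective (to be minimized): only the term that depends on G.
L_G = np.mean(np.log(1.0 - D(G(z))))
```

Dropping the first term from $L(G)$ changes nothing about the generator's gradients, since that term is constant with respect to $G$.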

Aray Karjauv