
To train the discriminator network in GANs, we set the label to $1$ for true samples and $0$ for fake ones, and then train with the binary cross-entropy loss.
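For example, a typical discriminator update looks roughly like this (a minimal PyTorch-style sketch; the tiny networks `D` and `G` are just placeholders so the snippet runs):

```python
import torch
import torch.nn as nn

# Toy discriminator and generator, just so the snippet runs end to end;
# real architectures would of course be much bigger.
D = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
G = nn.Sequential(nn.Linear(100, 128), nn.ReLU(), nn.Linear(128, 784))

bce = nn.BCELoss()                        # binary cross-entropy loss

real = torch.randn(64, 784)               # stand-in for a batch of true samples
fake = G(torch.randn(64, 100)).detach()   # fake samples; detached so only D gets gradients here

real_labels = torch.ones(64, 1)           # label 1 for true samples
fake_labels = torch.zeros(64, 1)          # label 0 for fake samples

# Discriminator loss: BCE on true samples against 1, plus BCE on fake samples against 0
loss_D = bce(D(real), real_labels) + bce(D(fake), fake_labels)
loss_D.backward()
```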

Since we set the label $1$ for true samples, that means $P_{data}(x) = 1$, and the binary cross-entropy loss becomes: $$L_1 = \sum_{i=1}^{N} P_{data}(x_i)\log(D(x_i)) + (1-P_{data}(x_i))\log(1-D(x_i))$$ $$L_1 = \sum_{i=1}^{N} P_{data}(x_i)\log(D(x_i))$$ $$L_1 = E_{x \sim P_{data}(x)}[\log(D(x))]$$

For the second part, since we set the label $0$ for fake samples, that means $P_{z}(z) = 0$, and the binary cross-entropy loss becomes: $$L_2 = \sum_{i=1}^{N} P_{z}(z_i)\log(D_{G}(z_i)) + (1-P_{z}(z_i))\log(1-D_{G}(z_i))$$ $$L_2 = \sum_{i=1}^{N} (1-P_{z}(z_i))\log(1-D_{G}(z_i))$$ $$L_2 = E_{z \sim \bar{P_{z}(z)}}[\log(1-D_{G}(z))]$$

Now we combine those two losses and get: $$L_D = E_{x \sim P_{data}(x)}[\log(D(x))] + E_{z \sim \bar{P_{z}(z)}}[\log(1-D_{G}(z))]$$

When I was reading about GANs, I saw that the loss function for the discriminator is defined as: $$L_D = E_{x \sim P_{data}(x)}[\log(D(x))] + E_{z \sim P_{z}(z)}[\log(1-D_{G}(z))]$$

Shouldn't it be $E_{z \sim \bar{P_{z}(z)}}$ instead of $E_{z \sim P_{z}(z)}$?


1 Answer


I doubt whether your derivation is correct. You are trying to apply binary cross-entropy to the data of each label separately, which is not the correct way to do it.

The procedure for calculating binary cross-entropy is as follows:

  1. Pass the input $x$, whose label is $y \in \{0, 1\}$, to your model $M$.
  2. Obtain the prediction $\hat{y} \in [0, 1]$ as the output of the model $M$ (in place of the actual label $y$).
  3. Calculate the binary cross-entropy loss using the equation

$$L_{CE} = y \log \hat{y} + (1-y) \log (1 - \hat{y})$$
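As a quick numerical check of this formula (a plain-Python sketch; note that standard library implementations of binary cross-entropy carry an extra minus sign, since they are written as losses to be minimised, whereas the discriminator maximises the quantity above):

```python
import math

def bce_term(y, y_hat):
    # The per-sample term  y*log(y_hat) + (1 - y)*log(1 - y_hat)  from the equation above.
    return y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat)

# Genuine sample (y = 1): only the log(y_hat) term survives.
print(bce_term(1, 0.9))   # log(0.9) ≈ -0.105
# Fake sample (y = 0): only the log(1 - y_hat) term survives.
print(bce_term(0, 0.2))   # log(0.8) ≈ -0.223
```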

It is true that there are two types of inputs to a discriminator: genuine and fake. Genuine data is labelled $1$ and fake data is labelled $0$. Use the variable $x'$ to represent the input to the discriminator $D$: if the input $x'$ is genuine then its label is $1$, and if the input $x'$ is fake then its label is $0$. Note that it is better to avoid unnecessary details about the generator or the noise vector while formulating the binary cross-entropy loss of the discriminator; just view the discriminator as a module that takes two classes of inputs, genuine and fake. Suppose the discriminator outputs $\hat{y} \in [0, 1]$ for the input $x'$, whose actual label is $y \in \{0, 1\}$. Then the binary cross-entropy loss is given by

$$L_{CE} = y \log \hat{y} + (1-y) \log(1-\hat{y})$$ $$\implies L_{CE} = y \log D(x') + (1-y) \log(1 - D(x'))$$

Suppose the input $x'$ is a genuine sample $x$; then $y = 1$ and

$$\implies L_{CE} = \log D(x)$$

Suppose the input $x'$ is a fake sample $G(z)$; then $y = 0$ and

$$\implies L_{CE} = \log (1-D(G(z)))$$

Since the label is clear from the type of input to the discriminator $D$, we can write the binary cross-entropy loss over $2m$ samples, the $m$ genuine samples $\{x_1, x_2, \cdots, x_m\}$ and the $m$ fake samples $\{G(z_1), G(z_2), \cdots, G(z_m)\}$, as

$$L_{CE}^{2m} = \dfrac{1}{2m} \left( \sum\limits_{i = 1}^{m} \log D(x_i) + \sum\limits_{i = 1}^{m} \log (1-D(G(z_i))) \right)$$
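Written as code, this minibatch estimate is just the scaled sum of the per-sample terms over the $m$ genuine and $m$ fake samples (a NumPy sketch with made-up discriminator outputs):

```python
import numpy as np

m = 4
D_out_real = np.array([0.9, 0.8, 0.95, 0.7])    # hypothetical D(x_i) for genuine samples
D_out_fake = np.array([0.2, 0.1, 0.3, 0.25])    # hypothetical D(G(z_i)) for fake samples

# L_CE^{2m} = (1 / 2m) * ( sum_i log D(x_i) + sum_i log(1 - D(G(z_i))) )
L_2m = (np.log(D_out_real).sum() + np.log(1.0 - D_out_fake).sum()) / (2 * m)
print(L_2m)
```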

Later, for the purposes of mathematical analysis, we equate these sample means with the actual expectations over the underlying probability distributions (I am not sure whether this step is justified by the law of large numbers or by some other result; see 1, 2), and hence

$$ L_{CE}^{2m} = \dfrac{1}{2} \left( \sum\limits_{i = 1}^{m} \dfrac{1}{m} \log D(x_i) + \sum\limits_{i = 1}^{m} \dfrac{1}{m} \log (1-D(G(z_i))) \right)$$

$$ = \dfrac{1}{2} \left( \mathbb{E}_{x \sim P_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log (1 - D(G(z)))] \right)$$

Since dropping the constant factor $\dfrac{1}{2}$ does not change the optimum of the loss function, the final loss function is given by

$$ L_{D} = \mathbb{E}_{x \sim P_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log (1 - D(G(z)))]$$
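As a sanity check, the negation of a minibatch estimate of $L_D$ is exactly the sum of the two standard binary cross-entropy losses from the question, with label $1$ for genuine and $0$ for fake samples (again a NumPy sketch with made-up discriminator outputs):

```python
import numpy as np

D_out_real = np.array([0.9, 0.8, 0.95, 0.7])    # hypothetical D(x) on genuine samples
D_out_fake = np.array([0.2, 0.1, 0.3, 0.25])    # hypothetical D(G(z)) on fake samples

# Minibatch estimate of L_D = E[log D(x)] + E[log(1 - D(G(z)))]
L_D = np.log(D_out_real).mean() + np.log(1.0 - D_out_fake).mean()

# The same quantity, negated, is the sum of the two standard BCE losses:
# genuine samples against label 1 and fake samples against label 0.
bce_real = -np.log(D_out_real).mean()           # BCE with target 1
bce_fake = -np.log(1.0 - D_out_fake).mean()     # BCE with target 0
print(np.isclose(-L_D, bce_real + bce_fake))    # True
```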
