From Goodfellow et al. (2014), we have the adversarial loss:
$$ \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)} [\log D(x)] + \mathbb{E}_{z \sim p_z(z)} [\log (1 - D(G(z)))]. $$
In practice, the expectation is computed as a mean over the minibatch. For example, the discriminator is updated by ascending the stochastic gradient of the minibatch average:
$$ \nabla_{\theta_{d}} \frac{1}{m} \sum_{i=1}^{m}\left[\log D\left(\boldsymbol{x}^{(i)}\right)+\log \left(1-D\left(G\left(\boldsymbol{z}^{(i)}\right)\right)\right)\right] $$
My question is: why is the mean used to compute the expectation? Does this imply that $p_{data}$ is uniformly distributed, since every sample must be drawn from $p_{data}$ with equal probability?
The expectation, expressed as an integral, is:
$$ \begin{aligned} V(G, D) &=\int_{\boldsymbol{x}} p_{\text{data}}(\boldsymbol{x}) \log (D(\boldsymbol{x})) \, d\boldsymbol{x}+\int_{\boldsymbol{z}} p_{\boldsymbol{z}}(\boldsymbol{z}) \log (1-D(G(\boldsymbol{z}))) \, d\boldsymbol{z} \\ &=\int_{\boldsymbol{x}} p_{\text{data}}(\boldsymbol{x}) \log (D(\boldsymbol{x}))+p_{g}(\boldsymbol{x}) \log (1-D(\boldsymbol{x})) \, d\boldsymbol{x} \end{aligned} $$
So, how do we go from an integral over a continuous distribution to a sum over discrete samples, and why does every term in that sum get the same probability?
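Put in generic notation (the symbols $f$, $p$, and $m$ are my own placeholders, not the paper's), the step I am asking about seems to be the replacement
$$ \mathbb{E}_{x \sim p}[f(x)] = \int p(x) \, f(x) \, dx \quad \longrightarrow \quad \frac{1}{m} \sum_{i=1}^{m} f\left(x^{(i)}\right), \qquad x^{(i)} \sim p, $$
where the density $p(x)$ no longer appears explicitly on the right-hand side and every term carries the same weight $1/m$.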
The best I could find from other StackExchange posts is that the mean is just an approximation, but I'd really like a more rigorous explanation.
This question isn't specific to GANs. It applies to any loss function that is written mathematically as an expectation over some distribution but is implemented with samples rather than directly in integral form.
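To make the question concrete, here is a minimal numerical sketch (my own, not from the paper). It assumes a standard normal as the sampled distribution and $f(x) = x^2$ as a stand-in for the loss term. Both choices are arbitrary and purely illustrative. It compares the integral form, approximated on a grid, against a plain mean over samples:

```python
import numpy as np

rng = np.random.default_rng(0)


def f(x):
    # Illustrative stand-in for a loss term such as log D(x).
    return x ** 2


# Sample-mean form: draw from a clearly non-uniform p (standard normal)
# and average f over the draws with equal weight 1/m.
m = 200_000
samples = rng.normal(loc=0.0, scale=1.0, size=m)
mc_estimate = f(samples).mean()

# Integral form: Riemann sum of p(x) * f(x) over a fine grid.
grid = np.linspace(-10.0, 10.0, 200_001)
pdf = np.exp(-0.5 * grid ** 2) / np.sqrt(2.0 * np.pi)
dx = grid[1] - grid[0]
integral = np.sum(pdf * f(grid)) * dx

print(f"sample mean: {mc_estimate:.4f}")
print(f"integral:    {integral:.4f}")
```

For this choice the exact expectation is 1 (the variance of a standard normal), and both numbers should land close to it even though the density is far from uniform. That agreement is the behaviour I would like to see justified rigorously.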
(All equations are from the Goodfellow paper.)