Questions tagged [wasserstein-metric]

For questions about the Wasserstein metric/distance (or Kantorovich–Rubinstein metric), which is a metric (distance function) defined between probability distributions.
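For reference, here is the usual definition. Let $\mu, \nu$ be probability measures with finite $p$-th moments on a metric space $(X, d)$, and let $\Pi(\mu, \nu)$ be the set of couplings, i.e. joint distributions whose marginals are $\mu$ and $\nu$. The Kantorovich–Rubinstein metric mentioned above is the case $p = 1$.

$$W_p(\mu,\nu) = \left( \inf_{\gamma \in \Pi(\mu,\nu)} \int_{X \times X} d(x,y)^p \,\mathrm{d}\gamma(x,y) \right)^{1/p}$$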

7 questions
8 votes, 2 answers

Why is KL divergence used so often in Machine Learning?

The KL divergence is quite easy to compute in closed form for simple distributions, such as Gaussians, but it has some not-very-nice properties. For example, it is not symmetric (thus it is not a metric) and it does not respect the triangular…
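The closed form for univariate Gaussians makes both properties easy to check numerically. This is a minimal sketch; the helper `kl_gauss` is a hypothetical name, and it uses the standard formula $D_{KL}(\mathcal{N}(\mu_1,\sigma_1^2)\,\Vert\,\mathcal{N}(\mu_2,\sigma_2^2)) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$.

```python
import math


def kl_gauss(mu1: float, sigma1: float, mu2: float, sigma2: float) -> float:
    """Closed-form KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) for univariate Gaussians."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2)
            - 0.5)


# Asymmetry: swapping the two arguments changes the value.
p_to_q = kl_gauss(0.0, 1.0, 1.0, 2.0)  # log 2 - 0.25, about 0.443
q_to_p = kl_gauss(1.0, 2.0, 0.0, 1.0)  # 2 - log 2, about 1.307

# Triangle inequality fails: with unit variances KL = (mu1 - mu2)^2 / 2,
# so KL(a||c) = 2 while KL(a||b) + KL(b||c) = 0.5 + 0.5 = 1.
kl_ac = kl_gauss(0.0, 1.0, 2.0, 1.0)
kl_ab = kl_gauss(0.0, 1.0, 1.0, 1.0)
kl_bc = kl_gauss(1.0, 1.0, 2.0, 1.0)
print(p_to_q, q_to_p, kl_ac, kl_ab + kl_bc)
```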
4 votes, 1 answer

What is the reason for mode collapse in GAN as opposed to WGAN?

In this article I am reading: $D_{KL}$ gives us infinity when two distributions are disjoint. The value of $D_{JS}$ has a sudden jump and is not differentiable at $\theta=0$. Only the Wasserstein metric provides a smooth measure, which is super helpful for a…
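A simplified 1-D version of the usual toy example shows the behaviour described in the quote. It assumes two point masses, $P_0$ at $0$ and $P_\theta$ at $\theta$, which are disjoint whenever $\theta \neq 0$. KL is infinite, JS is stuck at $\log 2$, while $W_1 = |\theta|$ changes continuously with $\theta$ and so carries information about how far apart the distributions are. The helper names are hypothetical.

```python
import numpy as np


def kl(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) for discrete distributions on a shared support."""
    mask = p > 0
    if np.any(q[mask] == 0):
        return float("inf")  # p puts mass where q has none
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def js(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence via the mixture m = (p + q) / 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def w1(support: np.ndarray, p: np.ndarray, q: np.ndarray) -> float:
    """1-D Wasserstein-1: integral of |F_p - F_q| over a sorted support."""
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]
    return float(np.sum(cdf_gap * np.diff(support)))


def point_masses(theta: float):
    """P_0 = point mass at 0, P_theta = point mass at theta, on one sorted support."""
    support = np.array(sorted({0.0, float(theta)}))
    p = (support == 0.0).astype(float)
    q = (support == float(theta)).astype(float)
    return support, p, q


for theta in [-1.0, 0.0, 0.5]:
    s, p, q = point_masses(theta)
    print(theta, kl(p, q), js(p, q), w1(s, p, q))
```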
3 votes, 1 answer

Are there some notions of distance between two policies?

I want to determine some distance between two policies $\pi_1 (a \mid s)$ and $\pi_2 (a \mid s)$, i.e. something like $\vert \vert \pi_1 (a \mid s) - \pi_2(a \mid s) \vert \vert$, where $\pi_i (a\mid s)$ is the vector $(\pi_i (a_1 \mid s), \dots,…
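For discrete actions, a few common choices can be computed per state and then aggregated over states. This is a sketch with hypothetical tabular policies. Total variation, the $L_2$ norm from the question, and KL are all standard. Taking the worst case over states or averaging under some state distribution are two usual aggregations; trust-region methods such as TRPO, for example, constrain a KL averaged over visited states. The uniform average below is purely illustrative.

```python
import numpy as np


def total_variation(p: np.ndarray, q: np.ndarray) -> float:
    """TV distance between two action distributions: 0.5 * ||p - q||_1, in [0, 1]."""
    return 0.5 * float(np.sum(np.abs(p - q)))


def kl(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q); assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


# Hypothetical tabular policies: rows are states, columns are actions.
pi1 = np.array([[0.7, 0.2, 0.1],
                [0.1, 0.1, 0.8]])
pi2 = np.array([[0.5, 0.3, 0.2],
                [0.1, 0.1, 0.8]])

per_state_tv = np.array([total_variation(a, b) for a, b in zip(pi1, pi2)])
per_state_l2 = np.linalg.norm(pi1 - pi2, axis=1)  # ||pi1(.|s) - pi2(.|s)||_2
per_state_kl = np.array([kl(a, b) for a, b in zip(pi1, pi2)])

# Aggregate over states: worst case, or a (here uniform) average.
print(per_state_tv.max(), per_state_tv.mean())
print(per_state_l2, per_state_kl)
```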
2 votes, 1 answer

Why do we use a linear interpolation of fake and real data to penalize the gradient of the discriminator in WGAN-GP?

I'm trying to better frame/summarize the formulations and motivations behind Wasserstein GAN with gradient penalty, based on my understanding. For the basic GAN we are trying to optimize the following quantity: $$\min_\theta \max_\phi \mathbb{E}_{x…
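The sampling step that the title asks about can be sketched in NumPy. WGAN-GP evaluates the penalty at points $\hat{x} = \epsilon x + (1-\epsilon)\tilde{x}$, where $x$ is real, $\tilde{x}$ is generated and $\epsilon \sim U[0,1]$ is drawn once per example. The function name `interpolate` and the image-like batch shape are assumptions made for illustration.

```python
import numpy as np


def interpolate(x_real: np.ndarray, x_fake: np.ndarray, rng: np.random.Generator):
    """Sample x_hat = eps * x_real + (1 - eps) * x_fake, one eps ~ U[0, 1) per example."""
    batch = x_real.shape[0]
    # Shape (batch, 1, ..., 1) so that eps broadcasts over all non-batch axes.
    eps = rng.uniform(0.0, 1.0, size=(batch,) + (1,) * (x_real.ndim - 1))
    x_hat = eps * x_real + (1.0 - eps) * x_fake
    return x_hat, eps


rng = np.random.default_rng(0)
x_real = rng.normal(size=(4, 3, 8, 8))  # e.g. a batch of 4 small 3-channel images
x_fake = rng.normal(size=(4, 3, 8, 8))
x_hat, eps = interpolate(x_real, x_fake, rng)
```

Each $\hat{x}$ lies on the straight segment between a real sample and a generated sample. The penalty $(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2$ is then computed per example, with the gradient flattened over all non-batch axes, and averaged over the batch.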
1 vote, 0 answers

How exactly do you backpropagate the gradient penalty in WGAN-GP?

I am trying to implement WGANs from scratch. I implement the loss function for the critic in my code as L = average(real output) - average(fake output) + lambda*GP. To calculate GP, I just backpropagated ones in the critic to…
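What backpropagating a gradient penalty involves can be shown with a deliberately tiny hypothetical critic $D(x) = v^\top \tanh(Wx + b)$ in NumPy. This is a sketch under that assumption, not the asker's network; `gp_and_grads` and `numerical_grads` are made-up names. Backpropagating a 1 through the critic gives $\nabla_x D$, but that is only the first step. The penalty must then be differentiated *through that computation* with respect to the parameters ("double backprop"). A finite-difference check confirms the hand-derived gradients.

```python
import numpy as np


def gp_and_grads(params, x):
    """Penalty (||grad_x D(x)|| - 1)^2 for D(x) = v . tanh(W x + b), plus dGP/dW, db, dv."""
    W, b, v = params
    # Forward pass through the critic, keeping intermediates.
    h = np.tanh(W @ x + b)
    dh = 1.0 - h ** 2             # tanh'(a)
    s = v * dh                    # dD/da
    g = W.T @ s                   # grad_x D(x): backprop of 1 through the critic
    n = np.linalg.norm(g)         # assumed nonzero here
    gp = (n - 1.0) ** 2
    # Backward pass through the computation of g itself.
    u = 2.0 * (n - 1.0) * g / n   # dGP/dg
    ds = W @ u                    # since g = W^T s
    dv = ds * dh                  # since s = v * dh
    da = ds * v * (-2.0 * h * dh)  # d(dh)/da = -2 h (1 - h^2)
    dW = np.outer(s, u) + np.outer(da, x)  # W enters via g = W^T s and via a = W x + b
    db = da
    return gp, (dW, db, dv)


def numerical_grads(params, x, step=1e-6):
    """Central finite differences of the penalty w.r.t. each parameter array."""
    grads = []
    for P in params:
        G = np.zeros_like(P)
        for idx in np.ndindex(P.shape):
            old = P[idx]
            P[idx] = old + step
            plus = gp_and_grads(params, x)[0]
            P[idx] = old - step
            minus = gp_and_grads(params, x)[0]
            P[idx] = old  # restore the parameter
            G[idx] = (plus - minus) / (2 * step)
        grads.append(G)
    return grads


rng = np.random.default_rng(0)
params = (rng.normal(size=(5, 3)), rng.normal(size=5), rng.normal(size=5))
x = rng.normal(size=3)
gp, analytic = gp_and_grads(params, x)
numeric = numerical_grads(params, x)
for a, num in zip(analytic, numeric):
    print(np.max(np.abs(a - num)))
```

In a full implementation, $x$ would be an interpolated point $\hat{x}$, the penalty would be averaged over the batch, and its parameter gradients (scaled by lambda) would be added to those of the other loss terms. Autodiff frameworks do the double backprop for you, for example `torch.autograd.grad(..., create_graph=True)` in PyTorch.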
1 vote, 0 answers

WGAN-GP Loss formalization

I have to write the formalization of the loss function of my network, built following the WGAN-GP model. The discriminator takes 3 consecutive images as input (such as 3 consecutive frames of a video) and must evaluate if the intermediate image is a…
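One way to write such a loss is sketched below, purely for illustration, since the excerpt is truncated. It assumes the critic $D$ sees the outer frames $x_{t-1}, x_{t+1}$ as conditioning and scores a candidate middle frame, and that a generator $G$ produces $\tilde{x}_t = G(x_{t-1}, x_{t+1})$. Penalizing the gradient only with respect to the middle frame is one design choice among several. The WGAN-GP paper uses $\lambda = 10$.

$$\mathcal{L}_D = \mathbb{E}_{(x_{t-1},x_t,x_{t+1})\sim\mathbb{P}_r}\Big[D(x_{t-1},\tilde{x}_t,x_{t+1}) - D(x_{t-1},x_t,x_{t+1})\Big] + \lambda\,\mathbb{E}\Big[\big(\lVert\nabla_{\hat{x}_t} D(x_{t-1},\hat{x}_t,x_{t+1})\rVert_2 - 1\big)^2\Big]$$

$$\hat{x}_t = \epsilon\,x_t + (1-\epsilon)\,\tilde{x}_t,\quad \epsilon\sim U[0,1],\qquad \mathcal{L}_G = -\mathbb{E}\big[D(x_{t-1},\tilde{x}_t,x_{t+1})\big]$$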
1 vote, 0 answers

Under what conditions can one find the optimal critic in WGAN?

The Kantorovich-Rubinstein duality for the optimal transport problem implies that the Wasserstein distance between two distributions $\mu_1$ and $\mu_2$ can be computed as (equation 2 in section 3 in the WGAN paper) $$W(\mu_1,\mu_2)=\underset{f\in…
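One case where an optimal critic can be written down explicitly is one dimension. Integrating by parts, $\mathbb{E}_{\mu_1}[f] - \mathbb{E}_{\mu_2}[f] = -\int f'(t)\,(F_1(t) - F_2(t))\,dt$, so a 1-Lipschitz $f$ with slope $\operatorname{sign}(F_2 - F_1)$ attains $\int |F_1 - F_2|\,dt = W_1(\mu_1, \mu_2)$. The sketch below assumes two empirical distributions with the same number of samples, which the sorted-matching primal formula requires. It builds that critic and checks that the dual value matches the primal one. The function names are hypothetical.

```python
import numpy as np


def w1_primal(x: np.ndarray, y: np.ndarray) -> float:
    """W1 between equal-size empirical distributions: match sorted samples."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))


def optimal_critic_1d(x: np.ndarray, y: np.ndarray):
    """1-Lipschitz critic with slope sign(F_y - F_x) between consecutive sample points."""
    z = np.sort(np.concatenate([x, y]))
    F_x = np.searchsorted(np.sort(x), z, side="right") / len(x)  # empirical CDFs at z
    F_y = np.searchsorted(np.sort(y), z, side="right") / len(y)
    slopes = np.sign(F_y - F_x)[:-1]  # CDFs are constant on each [z_k, z_{k+1})
    f_z = np.concatenate([[0.0], np.cumsum(slopes * np.diff(z))])
    # Piecewise-linear interpolation, constant outside [z_0, z_last]: still 1-Lipschitz.
    return lambda t: np.interp(t, z, f_z)


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=500)  # samples from mu_1
y = rng.normal(1.5, 0.5, size=500)  # samples from mu_2
f = optimal_critic_1d(x, y)
dual = float(np.mean(f(x)) - np.mean(f(y)))
print(w1_primal(x, y), dual)  # agree up to floating-point error
```

This covers only the 1-D, finite-sample case. In higher dimensions the optimal critic generally has no such closed form, which is why WGAN approximates it with a constrained network.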