For questions about AI theory that relies on knowledge of a probability distribution over one or more variables. Such a distribution may be given in discrete buckets, such as quartiles, octiles, or percentiles, or as a continuous function with a closed form (an algebraic formula). Probability distributions are key in planning, natural language processing, and other AI objectives.
Questions tagged [probability-distribution]
89 questions
8
votes
2 answers
Why is KL divergence used so often in Machine Learning?
The KL divergence is quite easy to compute in closed form for simple distributions, such as Gaussians, but it has some not-very-nice properties. For example, it is not symmetric (thus it is not a metric) and it does not respect the triangular…
Federico Taschin
- 253
- 2
- 8
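For the question above, here is a minimal NumPy sketch of the closed-form KL divergence between two univariate Gaussians (the helper name `kl_gaussian` is illustrative, not from the answer); evaluating it with the arguments swapped shows the asymmetry the excerpt mentions.

```python
# Closed-form KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ) for univariate Gaussians.
import numpy as np

def kl_gaussian(mu1, sigma1, mu2, sigma2):
    """KL divergence between two univariate Gaussians, in closed form."""
    return (np.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2)
            - 0.5)

print(kl_gaussian(0.0, 1.0, 1.0, 2.0))  # KL(p || q) ~ 0.443
print(kl_gaussian(1.0, 2.0, 0.0, 1.0))  # KL(q || p) ~ 1.307 -- a different value
```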
8
votes
1 answer
What are the main benefits of using Bayesian networks?
I have some trouble understanding the benefits of Bayesian networks.
Am I correct that the key benefit of the network is that one does not need to use the chain rule of probability in order to calculate joint distributions?
So, using the chain…
Sebastian Dine
- 181
- 1
7
votes
1 answer
What loss function to use when labels are probabilities?
What loss function is most appropriate when training a model with target values that are probabilities? For example, I have a 3-output model. I want to train it with a feature vector $x=[x_1, x_2, \dots, x_N]$ and a target $y=[0.2, 0.3, 0.5]$.
It…
Thomas Johnson
- 173
- 4
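One common choice for targets that are themselves probabilities, as in the question above, is cross-entropy against the soft target distribution (equivalent, up to an additive constant, to minimising KL between target and prediction). A hedged NumPy sketch with illustrative names:

```python
# Cross-entropy between a probability target and softmax(logits).
import numpy as np

def soft_cross_entropy(y_true, logits):
    """Cross-entropy of softmax(logits) against a target distribution y_true."""
    logits = logits - logits.max()                      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())   # log-softmax
    return -(y_true * log_probs).sum()

y = np.array([0.2, 0.3, 0.5])        # target probabilities from the question
logits = np.array([0.1, 0.4, 1.0])   # hypothetical 3-output model logits
print(soft_cross_entropy(y, logits))
```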
5
votes
1 answer
Many of the best probabilistic models represent probability distributions only implicitly
I am currently studying Deep Learning by Goodfellow, Bengio, and Courville. In chapter 5.1.2 The Performance Measure, P, the authors say the following:
The choice of performance measure may seem straightforward and objective, but it is often…
The Pointer
- 611
- 5
- 22
5
votes
2 answers
What is a probability distribution in machine learning?
If we are learning or working in the machine learning field, then we frequently come across the term "probability distribution". I know what probability, conditional probability, and probability distribution/density mean in math, but what is its…
Eka
- 1,106
- 8
- 24
5
votes
1 answer
Why is the Jensen-Shannon divergence preferred over the KL divergence in measuring the performance of a generative network?
I have read articles on how the Jensen-Shannon divergence is preferred over the Kullback-Leibler divergence for measuring how well a distribution mapping is learned in a generative network, because JS-divergence better measures distribution similarity…
ashenoy
- 1,419
- 6
- 19
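For readers of the question above, a minimal NumPy sketch of the Jensen-Shannon divergence for discrete distributions (helper names are illustrative); it shows two properties often cited in this context: JS is symmetric and stays finite even when the supports differ, where KL would diverge.

```python
import numpy as np

def kl(p, q):
    """KL divergence D_KL(p || q) for discrete distributions (0 log 0 = 0)."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def js(p, q):
    """Jensen-Shannon divergence: average KL to the mixture m = (p + q) / 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([1.0, 0.0, 0.0])
q = np.array([0.5, 0.5, 0.0])
print(js(p, q), js(q, p))   # equal and finite, even though KL(q || p) diverges
```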
4
votes
1 answer
How can I make an MNIST digit recognizer that rejects out-of-distribution data?
I've done an MNIST digit recognition neural network.
When you put images in that are completely unlike its training data, it still tries to classify them as digits. Sometimes it strongly classifies nonsense data as being a specific digit.
I am…
river
- 143
- 6
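One simple (and admittedly imperfect) baseline for the rejection problem in the question above is to treat the maximum softmax probability as a confidence score and refuse to classify below a threshold. This is a hedged sketch, not the asker's network; the threshold value is an assumption.

```python
import numpy as np

def classify_or_reject(probs, threshold=0.9):
    """probs: softmax output of shape (10,) for one image; None means rejected."""
    if probs.max() < threshold:
        return None                      # reject as out-of-distribution
    return int(probs.argmax())

probs_in = np.array([0.01] * 9 + [0.91])   # confident digit prediction
print(classify_or_reject(probs_in))        # 9
probs_ood = np.full(10, 0.1)               # flat distribution for nonsense input
print(classify_or_reject(probs_ood))       # None -> rejected
```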
4
votes
1 answer
How does the VAE learn a joint distribution?
I found that the following paragraph from An Introduction to Variational Autoencoders sounds relevant, but I do not fully understand it.
A VAE learns stochastic mappings between an observed $\mathbf{x}$-space, whose empirical distribution…
a12345
- 243
- 1
- 7
4
votes
1 answer
Why do we sample vectors from a standard normal distribution for the generator?
I am new to GANs. I noticed that everybody generates a random vector (usually 100 dimensional) from a standard normal distribution $N(0, 1)$. My question is: why? Why don't they sample these vectors from a uniform distribution $U(0, 1)$? Does the…
dato nefaridze
- 882
- 10
- 22
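As a minimal sketch of the two sampling choices discussed in the question above (the 100-dimensional latent size follows the question; `generator` is a hypothetical model and appears only in a comment):

```python
import numpy as np

z_normal = np.random.randn(100)                # z ~ N(0, I), the common default
z_uniform = np.random.uniform(0.0, 1.0, 100)   # z ~ U(0, 1), also seen in practice
# fake_image = generator(z_normal)             # either vector could be fed to a GAN generator
```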
4
votes
2 answers
When should one prefer using Total Variational Divergence over KL divergence in RL
In RL, both the KL divergence (DKL) and Total variational divergence (DTV) are used to measure the distance between two policies. I'm most familiar with using DKL as an early stopping metric during policy updates to ensure the new policy doesn't…
mugoh
- 549
- 4
- 21
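For the question above, a hedged NumPy sketch computing both quantities for the discrete action distributions of two policies (helper names are illustrative):

```python
import numpy as np

def d_tv(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(p - q).sum()

def d_kl(p, q):
    """KL divergence D_KL(p || q) for discrete distributions (0 log 0 = 0)."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

pi_old = np.array([0.7, 0.2, 0.1])   # hypothetical action distributions
pi_new = np.array([0.6, 0.3, 0.1])
print(d_tv(pi_old, pi_new))          # 0.1
print(d_kl(pi_old, pi_new))          # ~0.027; Pinsker: D_TV <= sqrt(D_KL / 2)
```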
4
votes
1 answer
What is the difference between model and data distributions?
Is there any difference between the model distribution and data distribution, or are they the same?
Bhuwan Bhatt
- 404
- 2
- 13
4
votes
1 answer
In deep learning, do we learn a continuous distribution based on the training dataset?
At least at some level, maybe not always end-to-end, deep learning always learns a function, essentially a mapping from a domain to a range. The domain and range, at least in most cases, would be multivariate.
So, when a model learns a…
ashenoy
- 1,419
- 6
- 19
4
votes
1 answer
How are the parameters of the Bernoulli distribution learned?
In the paper Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask, they learn a mask for the network by setting up the mask parameters as $M_i = \text{Bern}(\sigma(v_i))$, where $M$ is the parameter mask ($f(x;\theta, M) = f(x; M \odot \theta)$),…
mshlis
- 2,399
- 9
- 23
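A hedged PyTorch-style sketch of the construction described in the question above: each mask entry is sampled as $M_i \sim \text{Bern}(\sigma(v_i))$ and gradients reach $v_i$ through a straight-through estimator. This is an illustration, not the paper's exact code.

```python
import torch

v = torch.zeros(10, requires_grad=True)      # learnable mask logits v_i
probs = torch.sigmoid(v)                     # Bernoulli parameters sigma(v_i)
m_hard = torch.bernoulli(probs).detach()     # sampled 0/1 mask, no gradient path
m = m_hard + probs - probs.detach()          # straight-through: forward uses m_hard,
                                             # backward flows through sigma(v_i)
theta = torch.randn(10)                      # frozen network weights theta
effective_weights = m * theta                # corresponds to M ⊙ θ in the question
```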
4
votes
1 answer
What does $x,y \sim \hat{p}_{data}$ mean in the Deep Learning book by Goodfellow
In chapter 5 of the Deep Learning book by Ian Goodfellow, some notation in the loss function below really confuses me.
I tried to understand whether $x,y \sim p_{data}$ means a sample $(x, y)$ sampled from the original dataset distribution (or $y$ is the…
David Ng
- 143
- 4
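A worked spelling-out of the notation (not a quote from the book) may help readers of the question above: taking an expectation with $x, y \sim \hat{p}_{\text{data}}$ just means averaging over the $m$ training pairs drawn from the empirical distribution,
$$\mathbb{E}_{x, y \sim \hat{p}_{\text{data}}}\, L\bigl(f(x;\theta), y\bigr) \;=\; \frac{1}{m} \sum_{i=1}^{m} L\bigl(f(x^{(i)};\theta),\, y^{(i)}\bigr),$$
where $L$ is a per-example loss and $f(x;\theta)$ the model's prediction (both symbols here are illustrative, not the book's exact choices).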
3
votes
0 answers
Relation between SDE diffusion and DDPM/DDIM
Background & Definitions
In DDPM, the diffusion backward step is described as follows (where $z\sim \mathcal{N}(0,I)$ and $x_{T}\sim \mathcal{N}(0,I)$):
and in DDIM we have
while in the SDE formulation (from the Fokker-Planck equation) the step…
snatchysquid
- 89
- 6