For questions related to maximum likelihood estimation (MLE), which is a frequentist approach for estimating the parameters of an assumed probability distribution given some observed data. This is done by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The resulting estimate is known as the maximum likelihood estimate.
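As a minimal, self-contained sketch of the idea (the data, parameter names, and use of `scipy` are illustrative assumptions, not tied to any question below): the MLE for a Gaussian can be found by maximizing the log-likelihood numerically, and the optimum matches the closed-form sample mean and (biased) sample variance.

```python
# Minimal MLE sketch (illustrative): fit a Gaussian to i.i.d. data by maximizing
# the log-likelihood numerically. The closed-form MLE is the sample mean and the
# (biased) sample variance, so the numerical optimum should agree with them.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=500)    # observed data (true mu=2, sigma=1.5)

def neg_log_likelihood(params):
    mu, log_sigma = params                         # optimize log(sigma) to keep sigma > 0
    sigma = np.exp(log_sigma)
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0])
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)                           # ~ data.mean(), data.std() (ddof=0)
```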
Questions tagged [maximum-likelihood]
19 questions
4
votes
1 answer
Likelihood function for Gaussian Discriminant Analysis
I'm trying to understand how the likelihood function for Gaussian discriminant analysis is derived. I am self-studying Murphy's Probabilistic Machine Learning, and in it, he states the likelihood function as follows:
$$P(D|\theta) = \prod_{i=1:n}…
turtle_in_mind
- 143
- 3
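For readers skimming the listing, the likelihood this question refers to typically takes the standard generative-classifier form below (a hedged reconstruction in generic notation, not necessarily Murphy's exact equation): with class priors $\boldsymbol{\pi}$ and class-conditional Gaussians,

$$
p(\mathcal{D}\mid\boldsymbol{\theta})=\prod_{i=1}^{n} p(y_i\mid\boldsymbol{\pi})\, p(\mathbf{x}_i\mid y_i,\boldsymbol{\theta})
=\prod_{i=1}^{n}\prod_{c} \pi_c^{\,\mathbb{I}(y_i=c)}\,\mathcal{N}\!\left(\mathbf{x}_i\mid\boldsymbol{\mu}_c,\boldsymbol{\Sigma}_c\right)^{\,\mathbb{I}(y_i=c)}.
$$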
3
votes
1 answer
How can a probability density value be used for the likelihood calculation?
Consider our parametric model $p_\theta$ for an underlying probability distribution $p_{data}$.
Now, the likelihood of an observation $x$ is generally defined as $L(\theta|x) = p_{\theta}(x)$.
The purpose of the likelihood is to quantify how good…
hanugm
- 4,102
- 3
- 29
- 63
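A tiny sketch of the point at issue (the observation and the two candidate means are made up): the density value is not itself a probability, but evaluating the density at the observed data is exactly how the likelihood ranks parameter settings.

```python
# Sketch: a density value is not a probability, but evaluating the density
# at the observed data is how the likelihood compares parameter settings.
from scipy.stats import norm

x = 0.2                                  # a single observation (made-up)
for mu in (0.0, 3.0):
    L = norm.pdf(x, loc=mu, scale=1.0)   # likelihood L(mu | x) = p_mu(x)
    print(f"mu={mu}: likelihood={L:.4f}")
```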
3
votes
0 answers
Is maximum likelihood estimation meaningless for a dataset of only outliers?
From my understanding, maximum likelihood estimation chooses the set of parameters for the estimator that maximizes the likelihood with respect to the ground-truth distribution.
I always interpreted it as the training set having a tendency to have most examples…
ashenoy
- 1,419
- 6
- 19
3
votes
1 answer
What is the relationship between MLE and naive Bayes?
I have found various references describing naive Bayes, and they all demonstrate that it uses MLE for the calculation. However, this is my understanding:
$$P(y=c \mid x) \propto P(x \mid y=c)\,P(y=c),$$
where $c$ is the class the model may classify $y$ as.
And…
Shrike Danny
- 31
- 1
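To illustrate the connection being asked about, here is a hedged sketch with made-up binary data: for a Bernoulli naive Bayes model, MLE reduces to frequency counts for the class prior and the class-conditional probabilities, and prediction then applies the proportionality quoted in the question.

```python
# Sketch: MLE for a Bernoulli naive Bayes model reduces to counting frequencies
# (no smoothing here, so unseen feature/class combinations get probability zero).
import numpy as np

X = np.array([[1, 0], [1, 1], [0, 1], [0, 0], [1, 1]])   # made-up binary features
y = np.array([1, 1, 0, 0, 1])                             # made-up class labels

classes = np.unique(y)
prior = {c: np.mean(y == c) for c in classes}             # MLE of P(y=c)
cond = {c: X[y == c].mean(axis=0) for c in classes}       # MLE of P(x_j=1 | y=c)

def predict(x):
    # argmax_c P(y=c) * prod_j P(x_j | y=c), i.e. the proportionality above
    scores = {c: prior[c] * np.prod(np.where(x == 1, cond[c], 1 - cond[c]))
              for c in classes}
    return max(scores, key=scores.get)

print(predict(np.array([1, 0])))                          # -> 1
```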
2
votes
1 answer
Understanding the math behind using maximum likelihood for linear regression
I understand both terms, linear regression and maximum likelihood, but when it comes to the math, I am totally lost. So I am reading the article The Principle of Maximum Likelihood (by Suriyadeepan Ramamoorthy). It is really well written, but, as…
xava
- 433
- 1
- 4
- 10
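Independent of the linked article, here is a minimal sketch of the result such derivations usually reach (the data, noise level, and known-variance assumption are illustrative): with Gaussian noise, maximizing the likelihood over the weights is equivalent to minimizing squared error, so the numerical MLE matches the ordinary least-squares fit.

```python
# Sketch: under y = Xw + Gaussian noise, maximizing the likelihood over w
# is equivalent to minimizing squared error, so MLE recovers least squares.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])   # intercept + one feature
w_true = np.array([1.0, 2.0])
y = X @ w_true + rng.normal(scale=0.5, size=200)

def neg_log_likelihood(w):
    return -np.sum(norm.logpdf(y, loc=X @ w, scale=0.5))    # noise std assumed known

w_mle = minimize(neg_log_likelihood, x0=np.zeros(2)).x
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(w_mle, w_ols)                                         # should match closely
```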
2
votes
1 answer
What is the empirical distribution in MLE?
I was reading the book Deep Learning by Ian Goodfellow. I have a doubt about the maximum likelihood estimation section (p. 131). I understand up to Eq. 5.58, which describes what is being maximized in the problem.
$$
\theta_{\text{ML}} =…
ANIRUDH BUVANESH
- 23
- 3
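A hedged paraphrase of the step this question is about (the standard derivation, not a verbatim quote of the book): starting from

$$\theta_{\text{ML}} = \arg\max_\theta \sum_{i=1}^{m} \log p_{\text{model}}\!\left(x^{(i)}; \theta\right),$$

dividing by $m$ does not change the $\arg\max$, so the sum can be rewritten as an expectation with respect to the empirical distribution $\hat{p}_{\text{data}}$, which places probability mass $1/m$ on each of the $m$ training points:

$$\theta_{\text{ML}} = \arg\max_\theta \, \mathbb{E}_{x \sim \hat{p}_{\text{data}}} \log p_{\text{model}}(x; \theta).$$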
2
votes
0 answers
Can the cross-entropy loss be used for an NLP task with an LSTM?
I am trying to build an LSTM model to generate Shakespeare-like poems. I have a training set $\{s_1, s_2, \dots, s_m\}$ of sentences from Shakespeare's poems, and each sentence contains words $\{w_1, w_2, \dots, w_n\}$.
To my understanding, each…
Leey
- 43
- 3
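As a small illustration of the loss in question (the vocabulary, probabilities, and targets are made up; this is not the asker's model): for next-word prediction, the cross-entropy loss is the average negative log-probability the model assigns to the word that actually occurred, which is exactly maximum likelihood training of the sequence model.

```python
# Sketch: cross-entropy for next-word prediction is the average negative
# log-probability assigned to the word that actually occurred (i.e. MLE).
import numpy as np

vocab = ["the", "rose", "by", "any", "other", "name"]      # toy vocabulary
# Hypothetical model outputs: one distribution over the vocabulary per position.
probs = np.array([
    [0.05, 0.70, 0.05, 0.05, 0.10, 0.05],                  # prediction at position 1
    [0.10, 0.05, 0.60, 0.10, 0.10, 0.05],                  # prediction at position 2
])
targets = [1, 2]                                            # observed words: "rose", "by"

cross_entropy = -np.mean(np.log(probs[np.arange(len(targets)), targets]))
print(cross_entropy)
```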
1
vote
1 answer
How can gradient descent optimize a loss surface that's never fully computed?
In gradient descent for neural networks, we optimize over a loss surface defined by our loss function $L(W)$, where $W$ represents the network weights. However, since there are infinitely many possible weight configurations, we can never compute or store…
semahaissa
- 11
- 1
1
vote
1 answer
Connection between sparse autoencoders and maximum likelihood from Goodfellow's Deep Learning book
I am struggling to understand how a sparse autoencoder can be thought of as "approximating maximum likelihood training of a generative model that has latent variables", from section 14.2.1 of Goodfellow's Deep Learning book (p. 502). I understand the…
hainabaraka
- 111
- 1
1
vote
2 answers
"a good model (with low loss) is one that assigns a high probability to the true output $y$ for each corresponding input $\mathbf{x}$"?
Section 1.2.1.6 (Maximum likelihood estimation) of Probabilistic Machine Learning: An Introduction by Kevin P. Murphy says the following:
When fitting probabilistic models, it is common to use the negative log probability as our loss…
The Pointer
- 611
- 5
- 22
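Making the quoted statement concrete (standard notation, not a verbatim quote from the book): with the negative log probability as the loss, the empirical risk over a dataset $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ is

$$\mathcal{L}(\boldsymbol{\theta}) = -\frac{1}{N}\sum_{i=1}^{N} \log p(y_i \mid \mathbf{x}_i, \boldsymbol{\theta}),$$

which is small exactly when the model assigns high probability to each true output $y_i$, and minimizing it over $\boldsymbol{\theta}$ is maximum likelihood estimation.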
1
vote
1 answer
I am confused about the derivation steps of MAP for linear regression
I am taking an ML course and I am confused about some of the mathematical derivations.
Could you explain the two steps I marked on the slides? For the first step, I thought $P(\beta \mid X, y) = \frac{P(X, y \mid \beta)P(\beta)}{P(X, y)}$, but I don't know the further steps to…
tesio
- 205
- 1
- 4
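Without seeing the slides, here is a hedged sketch of the standard result such derivations lead to (the data and the assumed known variances are illustrative): with a Gaussian likelihood and a zero-mean Gaussian prior on the weights, the MAP estimate coincides with ridge regression.

```python
# Sketch: MAP for linear regression with a zero-mean Gaussian prior on the weights
# coincides with ridge regression (L2-regularized least squares).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=1.0, size=100)

sigma2, tau2 = 1.0, 0.5               # noise variance and prior variance (assumed known)
lam = sigma2 / tau2                   # the equivalent ridge penalty

def neg_log_posterior(beta):
    # -log P(y|X,beta) - log P(beta), dropping constants that don't depend on beta
    return np.sum((y - X @ beta) ** 2) / (2 * sigma2) + np.sum(beta ** 2) / (2 * tau2)

beta_map = minimize(neg_log_posterior, x0=np.zeros(3)).x
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print(beta_map, beta_ridge)           # the two estimates should agree
```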
1
vote
0 answers
Is a VAE the same as the E-step of the EM algorithm?
EM (Expectation-Maximization)
Target: maximize $p_\theta(x)$
$$p_\theta(x)=\frac{p_\theta(x, z)}{p_\theta(z \mid x)}$$
Take the log of both sides:
$$\log p_\theta(x)=\log p_\theta(x, z)-\log p_\theta(z \mid x)$$
Introduce a distribution $q_\phi(z)$:
$…
Garfield
- 11
- 1
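For reference, a hedged completion of where this style of derivation usually goes (the standard ELBO decomposition, not necessarily the asker's exact next line): taking the expectation of both sides under $q_\phi(z)$ gives

$$\log p_\theta(x) = \underbrace{\mathbb{E}_{q_\phi(z)}\!\left[\log \frac{p_\theta(x, z)}{q_\phi(z)}\right]}_{\text{ELBO}} + \mathrm{KL}\!\left(q_\phi(z)\,\|\,p_\theta(z \mid x)\right),$$

so the E-step (and, analogously, a VAE's encoder update) can be read as making the KL term small.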
1
vote
0 answers
Optimize parametric Log-Likelihood with a Decision Tree
Suppose there are some objects with features, and the goal is parametric density estimation. The density estimation is model-based, and the parameters are obtained by maximizing the log-likelihood.
$LL = \sum_{i \in I_1} \log \left( \sum_{j \in K_i} \theta_j…
nekrald
- 11
- 2
1
vote
0 answers
Why can't a recurrent neural network handle a large corpus for obtaining embeddings?
In order to learn the embeddings, we need to train a model based on some objective function. The model can be an RNN and the objective function can be the likelihood. We learn the embeddings by calculating the likelihood, and the embeddings are…
hanugm
- 4,102
- 3
- 29
- 63
1
vote
0 answers
Estimating $\sigma_i$ according to the maximum likelihood method
Consider a Bayesian multivariate normal classifier with a distinct, isotropic covariance matrix for each class, i.e. with equal values along the diagonal and zeros elsewhere, $\mathbf{\Sigma}_i=\sigma_i^2\mathbf{I},~\forall…
David
- 113
- 3
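A hedged sketch of the closed form such a setup usually yields (the standard result for an isotropic Gaussian, stated without the asker's full constraints): for class $i$ with $N_i$ samples in $d$ dimensions, maximizing the likelihood under $\mathbf{\Sigma}_i=\sigma_i^2\mathbf{I}$ gives

$$\hat{\boldsymbol{\mu}}_i = \frac{1}{N_i}\sum_{n:\, y_n = i} \mathbf{x}_n, \qquad \hat{\sigma}_i^2 = \frac{1}{d\,N_i}\sum_{n:\, y_n = i} \lVert \mathbf{x}_n - \hat{\boldsymbol{\mu}}_i \rVert^2.$$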