Questions tagged [maximum-likelihood]

For questions related to maximum likelihood estimation (MLE), which is a frequentist approach for estimating the parameters of an assumed probability distribution given some observed data. This is done by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The resulting estimate is known as the maximum likelihood estimate.
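As a quick orientation for the tag (a minimal sketch, not tied to any question below): for a univariate Gaussian, the MLE has a closed form, and the log-likelihood can be checked directly.

```python
import numpy as np

# Toy observations standing in for "observed data".
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)

# For a univariate Gaussian, maximizing the likelihood gives closed forms:
# the sample mean and the (biased, 1/n) sample variance.
mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()

# Log-likelihood of the data under the fitted parameters.
n = len(x)
log_lik = -0.5 * n * np.log(2 * np.pi * sigma2_hat) \
          - 0.5 * ((x - mu_hat) ** 2).sum() / sigma2_hat
print(mu_hat, sigma2_hat, log_lik)
```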

19 questions
4
votes
1 answer

Likelihood function for Gaussian Discriminant Analysis

I'm trying to understand how the likelihood function for Gaussian discriminant analysis is derived. I am self-studying Murphy's Probabilistic Machine Learning, and in it, he states the likelihood function as follows: $$P(D|\theta) = \prod_{i=1:n}…
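The truncated product above is left as written; for orientation, the joint likelihood in this generative setting has the standard form (class prior times class-conditional Gaussian):
$$P(D \mid \theta) = \prod_{i=1}^{n} \mathrm{Cat}(y_i \mid \boldsymbol{\pi})\, \mathcal{N}(\mathbf{x}_i \mid \boldsymbol{\mu}_{y_i}, \boldsymbol{\Sigma}_{y_i}),$$
so the log-likelihood splits into a term for $\boldsymbol{\pi}$ and one Gaussian term per class, each maximized separately.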
3
votes
1 answer

How can a probability density value be used for the likelihood calculation?

Consider our parametric model $p_\theta$ for an underlying probability distribution $p_{\text{data}}$. Now, the likelihood of an observation $x$ is generally defined as $L(\theta|x) = p_{\theta}(x)$. The purpose of the likelihood is to quantify how good…
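A sketch of the point at issue (assuming scipy is available): for continuous models, the likelihood is built from density values, which are not probabilities and can exceed 1.

```python
import numpy as np
from scipy.stats import norm

x = np.array([0.1, -0.2, 0.05])  # toy observations

# Density values, not probabilities: for a narrow Gaussian they exceed 1.
print(norm.pdf(x, loc=0.0, scale=0.1))

# The likelihood L(theta | x) is the product of these densities; in
# practice one sums log-densities for numerical stability.
print(norm.logpdf(x, loc=0.0, scale=0.1).sum())
```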
3
votes
0 answers

Is maximum likelihood estimation meaningless for a dataset of only outliers?

From my understanding, maximum likelihood estimation chooses the set of parameters for the estimator that maximizes the likelihood with respect to the ground-truth distribution. I always interpreted it as the training set tending to have most examples…
3
votes
1 answer

What is the relationship between MLE and naive Bayes?

I have found various references describing naive Bayes, and they all demonstrated that it uses MLE for the calculation. However, this is my understanding: $P(y=c|x) \propto P(x|y=c)P(y=c)$, where $c$ is the class the model may assign to $y$. And…
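For context, the usual connection (a standard result, not specific to the references mentioned): naive Bayes specifies the model, and MLE fits its parameters from counts:
$$\hat P(y=c) = \frac{N_c}{N}, \qquad \hat P(x_j = v \mid y = c) = \frac{N_{c,j,v}}{N_c},$$
where $N_c$ counts training examples of class $c$ and $N_{c,j,v}$ counts those whose feature $j$ equals $v$; these ratios are exactly the maximizers of the joint likelihood under the naive (conditional independence) factorization.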
2
votes
1 answer

Understanding the math behind using maximum likelihood for linear regression

I understand both terms, linear regression and maximum likelihood, but when it comes to the math, I am totally lost. So I am reading the article The Principle of Maximum Likelihood (by Suriyadeepan Ramamoorthy). It is really well written, but, as…
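The identity such derivations build toward (standard, stated here as a pointer): with Gaussian noise $y_i = \mathbf{w}^\top \mathbf{x}_i + \epsilon_i$, $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$, the log-likelihood is
$$\log p(\mathbf{y} \mid X, \mathbf{w}, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2,$$
so maximizing over $\mathbf{w}$ is exactly minimizing the sum of squared errors.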
2
votes
1 answer

What is the empirical distribution in MLE?

I was reading the book Deep Learning by Ian Goodfellow and had a question about the maximum likelihood estimation section (p. 131). I understand up to Eq. 5.58, which describes what is being maximized in the problem. $$ \theta_{\text{ML}} =…
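For orientation (the truncated Eq. 5.58 is left as-is above): the book's next step rewrites the MLE objective as an expectation under the empirical distribution $\hat{p}_{\text{data}}$, which places mass $1/m$ on each of the $m$ training points:
$$\theta_{\text{ML}} = \arg\max_{\theta}\, \mathbb{E}_{\mathbf{x} \sim \hat{p}_{\text{data}}} \log p_{\text{model}}(\mathbf{x}; \theta).$$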
2
votes
0 answers

Can the cross-entropy loss be used for an NLP task with an LSTM?

I am trying to build an LSTM model to generate Shakespeare-like poems. I have a training set $\{s_1,s_2, \dots,s_m\}$, which are sentences of Shakespeare poems, and each sentence contains words $\{w_1,w_2, \dots,w_n\}$. To my understanding, each…
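A minimal sketch of the usual setup (PyTorch assumed; all names and sizes here are illustrative): cross-entropy over the vocabulary at each position is the negative log-likelihood of the next token.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10000, 128, 256

# Toy next-word model: embed tokens, run an LSTM, project to vocab logits.
embed = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
head = nn.Linear(hidden_dim, vocab_size)

tokens = torch.randint(0, vocab_size, (4, 20))   # (batch, seq_len)
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict the next token

hidden, _ = lstm(embed(inputs))
logits = head(hidden)                            # (batch, seq-1, vocab)

# CrossEntropyLoss expects (N, C) logits and (N,) class indices;
# minimizing it maximizes the log-likelihood of the target tokens.
loss = nn.CrossEntropyLoss()(logits.reshape(-1, vocab_size),
                             targets.reshape(-1))
loss.backward()
```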
1
vote
1 answer

How can gradient descent optimize a loss surface that's never fully computed?

In gradient descent for neural networks, we optimize over a loss surface defined by our loss function $L(W)$, where $W$ represents the network weights. However, since there are infinitely many possible weight configurations, we can never compute or store…
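A minimal numpy sketch of the resolution the question circles: gradient descent never needs the whole surface, only the gradient evaluated at the current point.

```python
import numpy as np

# Toy loss surface L(w) = ||Xw - y||^2; never materialized over all w.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)

def grad(w):
    # Gradient of the loss at this single point w; nothing else is stored.
    return 2 * X.T @ (X @ w - y)

w = np.zeros(5)
for _ in range(500):
    w -= 1e-3 * grad(w)  # each step touches only the current w

print(np.sum((X @ w - y) ** 2))  # loss after descent
```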
1
vote
1 answer

Connection between sparse autoencoders and maximum likelihood from Goodfellow's Deep Learning book

I am struggling to understand how a sparse autoencoder can be thought of as "approximating maximum likelihood training of a generative model that has latent variables", from section 14.2.1 of Goodfellow's Deep Learning book (p. 502). I understand the…
1
vote
2 answers

"a good model (with low loss) is one that assigns a high probability to the true output $y$ for each corresponding input $\mathbf{x}$"?

Section 1.2.1.6 (Maximum likelihood estimation) of Probabilistic Machine Learning: An Introduction by Kevin P. Murphy says the following: When fitting probabilistic models, it is common to use the negative log probability as our loss…
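The quoted statement corresponds to the standard negative log-likelihood loss (given here for orientation):
$$\ell(\theta) = -\frac{1}{N}\sum_{n=1}^{N} \log p(y_n \mid \mathbf{x}_n; \theta),$$
which is small exactly when the model assigns high probability to each true output $y_n$ given its input $\mathbf{x}_n$.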
1
vote
1 answer

I am confused about the derivation steps of MAP for linear regression

I am taking an ML course and I am confused about some of the mathematical derivations. Could you explain the two steps I marked on the slides? For the first step, I thought $P(\beta|X,y) = \frac{P(X,y|\beta)P(\beta)}{P(X,y)}$, but I don't know the further steps to…
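For orientation (a standard derivation, not the asker's slides, and conditioning on $X$ as is usual in regression): Bayes' rule plus dropping the $\beta$-independent evidence gives
$$\hat{\beta}_{\text{MAP}} = \arg\max_{\beta}\, P(\beta \mid X, \mathbf{y}) = \arg\max_{\beta}\, \log P(\mathbf{y} \mid X, \beta) + \log P(\beta),$$
and with a Gaussian likelihood and a Gaussian prior $\beta \sim \mathcal{N}(0, \tau^2 I)$ this reduces to ridge regression: least squares plus an $\ell_2$ penalty on $\beta$.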
1
vote
0 answers

Is VAE the same as the E-step of the EM algorithm?

EM (Expectation Maximization). Target: maximize $p_\theta(x)$. $p_\theta(x)=\frac{p_\theta(x, z)}{p_\theta(z \mid x)}$. Take the log on both sides: $\log p_\theta(x)=\log p_\theta(x, z)-\log p_\theta(z \mid x)$. Introduce a distribution $q_\phi(z)$: $…
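For orientation, the standard decomposition this derivation is heading toward (stated generically, not completing the asker's truncated step):
$$\log p_\theta(x) = \underbrace{\mathbb{E}_{q_\phi(z)}\left[\log \frac{p_\theta(x, z)}{q_\phi(z)}\right]}_{\text{ELBO}} + \mathrm{KL}\left(q_\phi(z) \,\Vert\, p_\theta(z \mid x)\right),$$
so the E-step sets $q$ to the exact posterior (driving the KL term to zero), whereas a VAE only approximates the posterior with an amortized encoder $q_\phi(z \mid x)$.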
1
vote
0 answers

Optimize parametric Log-Likelihood with a Decision Tree

Suppose there are some objects with features, and the target is parametric density estimation. Density estimation is model-based; parameters are obtained by maximizing the log-likelihood. $LL = \sum_{i \in I_1} \log \left( \sum_{j \in K_i} \theta_j…
1
vote
0 answers

Why can't a recurrent neural network handle a large corpus for obtaining embeddings?

In order to learn the embeddings, we need to train a model based on some objective function. The model can be an RNN and the objective function can be the likelihood. We learn the embeddings by calculating the likelihood, and the embeddings are…
1
vote
0 answers

Estimating $\sigma_i$ according to the maximum likelihood method

Consider a Bayesian multivariate normal distribution classifier with a distinct covariance matrix for each class, each isotropic, i.e. with equal values along the entire diagonal and zero elsewhere: $\mathbf{\Sigma}_i=\sigma_i^2\mathbf{I},~\forall…
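For reference, the closed form this setup leads to (the textbook result, obtained by setting the derivative of the log-likelihood with respect to $\sigma_i^2$ to zero):
$$\hat{\sigma}_i^2 = \frac{1}{d\, N_i} \sum_{n:\, y_n = i} \left\lVert \mathbf{x}_n - \hat{\boldsymbol{\mu}}_i \right\rVert^2,$$
where $d$ is the input dimension, $N_i$ the number of training examples in class $i$, and $\hat{\boldsymbol{\mu}}_i$ their sample mean.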