Questions tagged [loss]
For questions related to the concept of loss (or cost) in machine learning or other AI sub-fields.
90 questions
11
votes
3 answers
Should I choose a model with the smallest loss or highest accuracy?
I have two Machine Learning models (both LSTMs) that give different results on the validation set (~100 samples):
Model A: Accuracy: ~91%, Loss: ~0.01
Model B: Accuracy: ~83%, Loss: ~0.003
The size and the speed of both models are almost the…
malioboro
- 2,859
- 3
- 23
- 47
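A quick illustration of why the two metrics can rank models differently (a minimal NumPy sketch with made-up probabilities, not the asker's models): accuracy only checks which side of the 0.5 threshold a prediction lands on, while cross-entropy loss also rewards confidence.

import numpy as np

y_true = np.ones(5)  # five positive samples, for simplicity

p_a = np.array([0.55, 0.55, 0.55, 0.55, 0.55])     # always right, never confident
p_b = np.array([0.999, 0.999, 0.999, 0.49, 0.49])  # very confident, twice barely wrong

def binary_cross_entropy(y, p):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def accuracy(y, p):
    return np.mean((p >= 0.5) == y)

for name, p in [("A", p_a), ("B", p_b)]:
    print(name, accuracy(y_true, p), binary_cross_entropy(y_true, p))
# A: accuracy 1.0, loss ~0.60; B: accuracy 0.6, loss ~0.29.
# Ranking by accuracy picks A; ranking by loss picks B.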
4
votes
0 answers
GAN: Why does a perfect discriminator mean no gradient for the generator?
In the training of a Generative Adversarial Network (GAN), a perfect discriminator (D) is one which outputs 1 ("true image") for all images of the training dataset and 0 ("false image") for all images created by the generator (G).
I've read…
Soltius
- 311
- 1
- 2
- 10
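For reference, the standard argument, assuming the original minimax generator loss $\log(1 - D(G(z)))$ and a sigmoid discriminator $D = \sigma(f)$ as in Goodfellow et al. (2014): since $\sigma'(f) = \sigma(f)(1 - \sigma(f))$, the chain rule gives
$$\nabla_{\theta_G} \log\bigl(1 - D(G(z))\bigr) = -\frac{\sigma'(f(G(z)))}{1 - D(G(z))}\,\nabla_{\theta_G} f(G(z)) = -D(G(z))\,\nabla_{\theta_G} f(G(z)),$$
so a perfect discriminator, with $D(G(z)) = 0$ on every generated sample, makes the generator's gradient exactly zero (provided $\nabla_{\theta_G} f$ stays bounded).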
3
votes
1 answer
Regression loss conditioned by the ground-truth values
I'm working on a regression problem with a CNN in which the input is a single image, and the output is an angle in degrees (which determines a specific measure related to the image).
Sometimes, the model fails to retrieve the output accurately (for…
Cezoz08
- 53
- 3
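One common way to condition a regression loss on the target is to weight each sample's error by a function of its ground-truth value. A minimal PyTorch sketch; the weighting function here is hypothetical, purely to show the mechanics:

import torch

def conditioned_mse(pred, target):
    # Hypothetical weighting: emphasise samples whose true angle is near 0
    # degrees. The weights depend only on the target, so no gradient flows
    # through them.
    weights = 1.0 + torch.exp(-torch.abs(target) / 10.0)
    return torch.mean(weights * (pred - target) ** 2)

pred = torch.tensor([3.0, 95.0], requires_grad=True)
target = torch.tensor([0.0, 90.0])
conditioned_mse(pred, target).backward()
print(pred.grad)  # the sample with target near 0 is weighted ~2x per unit error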
3
votes
1 answer
Has anyone tried to train a GPT model predicting the next N tokens instead of the next one token?
I have been thinking about how learning from text works for humans: we read words, and often we need to read ahead a few words to understand more clearly the ideas we read before. Most of the time, just reading the next word in a sentence is not…
bruno
- 33
- 2
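For what it's worth, one straightforward way to set up such an objective is to give the model $n$ output heads per position and average their cross-entropy losses against the next $n$ tokens. A minimal PyTorch sketch of the loss only (the head layout is a hypothetical choice, not taken from any particular paper):

import torch
import torch.nn.functional as F

def next_n_token_loss(logits, tokens, n=4):
    # logits: (batch, seq, n, vocab) -- one softmax head per lookahead step
    # tokens: (batch, seq)
    losses = []
    for k in range(1, n + 1):
        head = logits[:, :-k, k - 1, :]   # head k-1 at position t predicts token t+k
        target = tokens[:, k:]
        losses.append(F.cross_entropy(head.reshape(-1, head.size(-1)),
                                      target.reshape(-1)))
    return torch.stack(losses).mean()

logits = torch.randn(2, 16, 4, 100)      # batch=2, seq=16, n=4, vocab=100
tokens = torch.randint(0, 100, (2, 16))
print(next_n_token_loss(logits, tokens))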
3
votes
0 answers
Focal Loss vs Weighted Cross Entropy Loss
Weighted Focal Loss is defined like so:
$FL(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t)$
whereas weighted Cross-Entropy Loss is defined like so:
$CE(p_t) = -\alpha_t \log(p_t)$
Some blog posts try to explain the core difference, but I still fail to…
Gulzar
- 789
- 1
- 10
- 27
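Evaluating the two formulas side by side makes the modulating factor concrete (a small sketch using the definitions above, with $\alpha_t = 1$ and $\gamma = 2$):

import torch

p_t = torch.tensor([0.9, 0.5, 0.1])  # probability assigned to the true class
alpha_t, gamma = 1.0, 2.0

ce = -alpha_t * torch.log(p_t)
fl = -alpha_t * (1 - p_t) ** gamma * torch.log(p_t)
print(ce)  # tensor([0.1054, 0.6931, 2.3026])
print(fl)  # tensor([0.0011, 0.1733, 1.8651])
# The (1 - p_t)^gamma factor barely changes hard examples (small p_t) but
# strongly suppresses easy ones (p_t near 1), so hard cases dominate the loss.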
3
votes
0 answers
How to interpret the training loss curves in Soft Actor-Critic (SAC)?
I am using the Stable-Baselines3 implementation of the Soft Actor-Critic (SAC) algorithm. The plotted training curves look promising. However, I am not fully sure how to interpret the actor and critic losses. The entropy coefficient $\alpha$ is…
Manuel
- 45
- 5
3
votes
1 answer
How to perform back-propagation in Decoupled Neural Interfaces?
I am attempting to create a fully decoupled feed-forward neural network by using decoupled neural interfaces (DNIs) as explained in the paper Decoupled Neural Interfaces using Synthetic Gradients (2017) by Max Jaderberg et al. As in the paper, the…
Brian Sharp
- 41
- 1
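As a rough illustration of the paper's mechanism (a minimal PyTorch sketch, not the authors' code): each module gets a small side network M that predicts $\partial L / \partial h$ from the module's activation $h$, so the module can update immediately; M itself is later regressed onto the true gradient when it arrives.

import torch
import torch.nn as nn

layer = nn.Linear(10, 10)
synth = nn.Linear(10, 10)  # M: activation -> predicted dL/dh (hypothetical sizes)

x = torch.randn(4, 10)
h = layer(x)
grad_hat = synth(h.detach())   # synthetic gradient for h
h.backward(grad_hat.detach())  # update `layer` now, without the true backward pass

# Later, once the true gradient grad_true for h is available, M is trained by
# regression: ((synth(h.detach()) - grad_true) ** 2).mean().backward()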
2
votes
1 answer
Do we plug in the old values or the new values during the gradient descent update?
I have a scenario where I am trying to optimize a vector of D dimensions. Every component of the vector depends on the other components through a function such as $\sum_{(i,j)} \bigl(1 - e\,x_i x_j\bigr)/2$, where $e$ is a constant and the $x$ are embeddings…
Darkmoon Chief
- 31
- 3
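For plain gradient descent the answer is the old values: the gradient is evaluated once at the current iterate and all components are updated simultaneously. A small NumPy sketch contrasting that with the in-place alternative on a toy coupled quadratic (the matrix is made up for illustration):

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # objective 0.5 * x^T A x, gradient A x
grad = lambda x: A @ x

x, lr = np.array([1.0, -1.0]), 0.1

# Gradient descent: every component uses the SAME old iterate.
x_simultaneous = x - lr * grad(x)

# In-place variant: each component sees the components already updated this
# sweep. This is closer to coordinate descent (Gauss-Seidel style); it can
# also converge, but it is a different algorithm from plain gradient descent.
x_inplace = x.copy()
for i in range(x_inplace.size):
    x_inplace[i] -= lr * grad(x_inplace)[i]

print(x_simultaneous, x_inplace)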
2
votes
1 answer
Fluctuations in loss during in-epoch evaluation of GRU
I am training a one-layer unidirectional vanilla GRU on a next-item prediction task over the last 10 interacted items. In my original experiment, where I trained on approx. 5.5M samples and validated on around 1M samples, I saw periodic…
PatrickSVM
- 53
- 3
2
votes
0 answers
Periodical fluctuations in loss curves
I am training a neural network (specifically a GRU based architecture but I think this is not too relevant for the question). My loss curves, especially the training loss but also the validation loss, show periodic fluctuations and I try to…
PatrickSVM
- 53
- 3
2
votes
2 answers
Does the MSE loss function work in NN training for predicting values between 0 and 1?
In a NN regression problem, considering that MSE squares the error and the error is between 0 and 1, would it be pointless to use MSE as our loss function during model training?
For example:
MSE = (y_pred - y_true) ^ 2
@ Expected model output…
Darren Rahnemoon
- 27
- 5
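Squaring an error that is below 1 does make the number smaller, but that does not make MSE pointless: the loss is still minimised at zero error and its gradient $2(\hat{y} - y)$ still points the right way; squaring only down-weights small residuals relative to, say, MAE. A quick NumPy check:

import numpy as np

err = np.array([0.5, 0.1, 0.01])  # y_pred - y_true, all below 1
print(err ** 2)     # [2.5e-01 1.0e-02 1.0e-04] -- MSE terms shrink fast...
print(2 * err)      # ...but the gradient 2*err stays nonzero until err is 0
print(np.abs(err))  # MAE terms, for comparison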
2
votes
2 answers
Val loss doesn’t decrease after a certain number of epochs
I’m working on a classification problem (500 classes). My NN has 3 fully connected layers, followed by an LSTM layer. I use nn.CrossEntropyLoss() as my loss function. This is my network’s configuration:
Model(
  (fc): Sequential(
    (0): …
helloworld
- 65
- 1
- 6
2
votes
2 answers
Why do we subtract logsumexp from the outputs of this neural network?
I'm trying to understand this tutorial for JAX.
Here's an excerpt. It's for a neural net that is designed to classify MNIST images:
import jax.numpy as jnp
from jax.scipy.special import logsumexp

def relu(x):
    return jnp.maximum(0, x)

def predict(params, image):
    # …
Foobar
- 153
- 6
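The subtraction turns the raw outputs into log-probabilities (a log-softmax), and routing it through logsumexp keeps the computation numerically stable, since $\log(e^{x_i} / \sum_j e^{x_j}) = x_i - \mathrm{logsumexp}(x)$. A quick demonstration in JAX:

import jax.numpy as jnp
from jax.scipy.special import logsumexp

logits = jnp.array([1000.0, 1001.0, 1002.0])  # large enough to overflow exp

naive = jnp.log(jnp.exp(logits) / jnp.sum(jnp.exp(logits)))  # nan: exp overflows
stable = logits - logsumexp(logits)                          # finite log-probs
print(naive, stable)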
2
votes
1 answer
What is being optimized with WGAN loss? Is the generator maximizing or minimizing the critic value?
I am kind of new to the field of GANs and decided to develop a WGAN. The information online seems to contradict itself; the more I read, the more confused I become, so I'm hoping y'all can clarify my misunderstanding with WGAN…
Gabriel Mongaras
- 31
- 4
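For reference, the WGAN objective as given by Arjovsky et al. (2017) is
$$\min_G \max_{D \in \mathcal{D}} \; \mathbb{E}_{x \sim p_\text{data}}\bigl[D(x)\bigr] - \mathbb{E}_{z \sim p_z}\bigl[D(G(z))\bigr],$$
where $\mathcal{D}$ is the set of 1-Lipschitz functions. The critic maximizes this gap, while the generator minimizes it, which amounts to maximizing the critic's value $\mathbb{E}[D(G(z))]$ on generated samples.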
2
votes
2 answers
Why does triplet loss allow to learn a ranking whereas contrastive loss only allows to learn similarity?
I am looking at this lecture, which states (link to exact time):
What the triplet loss allows us, in contrast to the contrastive loss, is that we can learn a ranking. So it's not only about similarity, being closer together or being further apart,…
Gulzar
- 789
- 1
- 10
- 27
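Writing the two losses side by side (in their common forms, with $d$ an embedding distance, $m$ a margin, and $y = 1$ for similar pairs) makes the distinction concrete:
$$L_\text{contrastive}(x_1, x_2, y) = y\, d(x_1, x_2)^2 + (1 - y)\,\max\bigl(0,\; m - d(x_1, x_2)\bigr)^2$$
$$L_\text{triplet}(a, p, n) = \max\bigl(0,\; d(a, p) - d(a, n) + m\bigr)$$
The contrastive loss constrains each pair's distance in absolute terms, whereas the triplet loss only constrains the relative order $d(a, p) + m \le d(a, n)$, i.e. it ranks positives ahead of negatives for each anchor without fixing absolute distances.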