Questions tagged [loss]

For questions related to the concept of loss (or cost) in machine learning or other AI sub-fields.

90 questions
11
votes
3 answers

Should I choose a model with the smallest loss or highest accuracy?

I have two Machine Learning models (I use LSTM) that produce different results on the validation set (~100 samples): Model A: Accuracy ~91%, Loss ~0.01. Model B: Accuracy ~83%, Loss ~0.003. The size and the speed of both models are almost the…
malioboro
  • 2,859
  • 3
  • 23
  • 47
4
votes
0 answers

GAN: Why does a perfect discriminator mean no gradient for the generator?

In the training of a Generative Adversarial Network (GAN), a perfect discriminator (D) is one that outputs 1 ("true image") for all images of the training dataset and 0 ("false image") for all images created by the generator (G). I've read…
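A sketch of the standard argument (my own summary using the original minimax loss of Goodfellow et al., not taken from the question): the generator minimizes $\log(1 - D(G_\theta(z)))$, whose gradient with respect to the generator parameters $\theta$ is

    $\nabla_\theta \log(1 - D(G_\theta(z))) = -\dfrac{\nabla_\theta D(G_\theta(z))}{1 - D(G_\theta(z))}$

If the discriminator is a saturated sigmoid that confidently outputs 0 on every generated image, then $\nabla_\theta D(G_\theta(z)) \approx 0$ in that region, so the numerator kills the whole gradient even though the denominator stays near 1.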
3
votes
1 answer

Regression loss conditioned by the ground-truth values

I'm working on a regression problem with a CNN in which the input is a single image, and the output is an angle in degrees (which determines a specific measure related to the image). Sometimes, the model fails to retrieve the output accurately (for…
3
votes
1 answer

Has anyone tried to train a GPT model predicting the next N tokens instead of the next one token?

I have been thinking about how learning via text works in humans: we read words, and often we need to read ahead a few words to understand more clearly the ideas we read before. Most of the time, just reading the next word in a sentence is not…
bruno
  • 33
  • 2
3
votes
0 answers

Focal Loss vs Weighted Cross Entropy Loss

Weighted Focal Loss is defined as $FL(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t)$, whereas weighted Cross-Entropy Loss is defined as $CE(p_t) = -\alpha_t \log(p_t)$. Some blog posts try to explain the core difference, but I still fail to…
Gulzar
  • 789
  • 1
  • 10
  • 27
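A minimal numeric sketch (my own illustration, not from the question) of what the extra $(1 - p_t)^\gamma$ factor does, assuming $\alpha_t = 1$ and the common choice $\gamma = 2$:

    import numpy as np

    def weighted_ce(p_t, alpha=1.0):
        # Weighted cross-entropy: -alpha * log(p_t)
        return -alpha * np.log(p_t)

    def focal_loss(p_t, alpha=1.0, gamma=2.0):
        # Focal loss: down-weights well-classified examples via (1 - p_t)^gamma
        return -alpha * (1.0 - p_t) ** gamma * np.log(p_t)

    for p in [0.1, 0.5, 0.9, 0.99]:
        print(f"p_t={p:.2f}  CE={weighted_ce(p):.4f}  FL={focal_loss(p):.4f}")

At $p_t = 0.99$ the focal factor shrinks the loss by a factor of $10^4$, while at $p_t = 0.1$ it shrinks it by less than a factor of two: hard examples keep nearly their full cross-entropy, easy ones all but vanish.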
3
votes
0 answers

How to interpret the training loss curves in Soft-Actor-Critic (SAC)?

I am using stable-baseline3 implementation of the Soft-Actor-Critic (SAC) algorithm. The plotted training curves look promising. However, I am not fully sure how to interpret the actor and critic losses. The entropy coefficient $\alpha$ is…
3
votes
1 answer

How to perform back-propagation in Decoupled Neural Interfaces?

I am attempting to create a fully decoupled feed-forward neural network by using decoupled neural interfaces (DNIs) as explained in the paper Decoupled Neural Interfaces using Synthetic Gradients (2017) by Max Jaderberg et al. As in the paper, the…
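A minimal sketch of the core mechanism (my own simplification of the Jaderberg et al. setup, not code from the question): each decoupled layer gets a small auxiliary model that predicts $\partial L / \partial h$ from the layer's activation $h$, so the layer can update immediately instead of waiting for the true backward pass.

    import numpy as np

    rng = np.random.default_rng(0)

    W = rng.normal(scale=0.1, size=(4, 4))   # weights of one decoupled layer
    M = np.zeros((4, 4))                     # linear synthetic-gradient model: grad_hat = h @ M
    lr = 0.01

    x = rng.normal(size=(1, 4))
    h = np.tanh(x @ W)                       # forward pass through the layer

    # 1) Update the layer right away with the *predicted* gradient.
    grad_hat = h @ M
    dW = x.T @ (grad_hat * (1 - h ** 2))     # backprop grad_hat through tanh
    W -= lr * dW

    # 2) When the true dL/dh eventually arrives, regress the predictor onto it.
    true_grad = rng.normal(size=(1, 4))      # stand-in for the real dL/dh
    M -= lr * h.T @ (grad_hat - true_grad)   # gradient of 0.5 * ||h @ M - true_grad||^2

In the paper the "true" gradient can itself come from the next layer's synthetic module, which is what fully decouples the layers.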
2
votes
1 answer

Do we plug in the old values or the new values during the gradient descent update?

I have a scenario where I am trying to optimize a vector of D dimensions. Every component of the vector depends on the other components through a function such as $\sum_{(i,j)} (1 - e \, x_i x_j)/2$, where $e$ is a constant and the $x$ are embeddings…
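A minimal sketch of the usual answer (my own illustration; the coupled objective below is a stand-in, not the asker's exact function): vanilla gradient descent evaluates every partial derivative at the old iterate and only then overwrites the vector, rather than plugging freshly updated components into the remaining partials.

    import numpy as np

    def grad(x):
        # Gradient of the coupled objective f(x) = sum_{i<j} x_i * x_j,
        # where df/dx_i = sum_{j != i} x_j.
        return x.sum() - x

    x = np.array([1.0, 2.0, 3.0])
    lr = 0.1

    g = grad(x)      # all partials use the OLD values of x ...
    x = x - lr * g   # ... then every component is updated at once

Updating one component at a time with already-new values is a different, coordinate-descent-style scheme; it can also converge, but it is not the standard gradient descent update.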
2
votes
1 answer

Fluctuations in loss during in epoch evaluation of GRU

I am training a one-layer unidirectional vanilla GRU on a next-item prediction task conditioned on the last 10 interacted items. In my original experiment, where I trained on approx. 5.5M samples and validated on around 1M samples, I saw periodic…
PatrickSVM
  • 53
  • 3
2
votes
0 answers

Periodical fluctuations in loss curves

I am training a neural network (specifically a GRU based architecture but I think this is not too relevant for the question). My loss curves, especially the training loss but also the validation loss, show periodic fluctuations and I try to…
PatrickSVM
  • 53
  • 3
2
votes
2 answers

Does the MSE loss function work in NN training for predicting values between 0 and 1?

In an NN regression problem, considering that MSE squares the error and the error is between 0 and 1, would it be pointless to use MSE as our loss function during model training? For example: MSE = (y_pred - y_true)^2 @ Expected model output…
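A quick numeric check (my own illustration) of what squaring does to sub-unit errors, and why the gradient still behaves:

    # Squaring shrinks errors below 1, but the gradient 2 * (y_pred - y_true)
    # stays proportional to the raw error, so the update direction and its
    # relative scale survive.
    for y_pred, y_true in [(0.9, 0.1), (0.5, 0.4), (0.51, 0.5)]:
        err = y_pred - y_true
        print(f"error={err:+.2f}  MSE={err**2:.4f}  dMSE/dy_pred={2 * err:+.2f}")

The optimizer only sees the gradient, so a numerically small MSE is not pointless; choosing MAE or a cross-entropy-style loss instead is a matter of the error model, not of the 0-1 range.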
2
votes
2 answers

Val loss doesn’t decrease after a certain number of epochs

I’m working on a classification problem (500 classes). My NN has 3 fully connected layers, followed by an LSTM layer. I use nn.CrossEntropyLoss() as my loss function. This is my network’s configuration: Model( (fc): Sequential( (0):…
2
votes
2 answers

Why do we subtract logsumexp from the outputs of this neural network?

I'm trying to understand this tutorial for Jax. Here's an excerpt. It's for a neural net that is designed to classify MNIST images:

    from jax.scipy.special import logsumexp

    def relu(x):
        return jnp.maximum(0, x)

    def predict(params, image):
        #…
Foobar
  • 153
  • 6
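A minimal sketch of what that subtraction computes (my restatement of a standard identity, not the tutorial's full code): subtracting logsumexp from the logits is exactly log-softmax, i.e. it turns raw scores into numerically stable log-probabilities.

    import jax.numpy as jnp
    from jax.scipy.special import logsumexp

    logits = jnp.array([2.0, 1.0, -1.0])

    # log softmax: log( exp(logits) / sum(exp(logits)) )
    #            = logits - logsumexp(logits)
    log_probs = logits - logsumexp(logits)

    # The recovered probabilities sum to 1, and this route avoids the
    # overflow that a naive exp(logits) / sum(exp(logits)) can hit.
    print(jnp.exp(log_probs).sum())   # ~1.0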
2
votes
1 answer

What is being optimized with WGAN loss? Is the generator maximizing or minimizing the critic value?

I am fairly new to the field of GANs and decided to develop a WGAN. All of the information online seems to contradict itself. The more I read, the more confused I become, so I'm hoping you can clarify my misunderstanding of WGAN…
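A minimal sign-convention sketch (my own summary of the Arjovsky et al. formulation, not the asker's code): the critic $f$ maximizes $E[f(x)] - E[f(G(z))]$, and the generator maximizes $E[f(G(z))]$, which frameworks express as minimizing the negatives.

    import numpy as np

    def critic_loss(f_real, f_fake):
        # Critic maximizes E[f(x)] - E[f(G(z))]; minimize the negative.
        return -(np.mean(f_real) - np.mean(f_fake))

    def generator_loss(f_fake):
        # Generator maximizes E[f(G(z))]; minimize the negative.
        return -np.mean(f_fake)

So "maximizing or minimizing the critic value" is purely a sign convention: the generator pushes the critic's score on fakes up, written in code as minimizing its negative.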
2
votes
2 answers

Why does triplet loss allow to learn a ranking whereas contrastive loss only allows to learn similarity?

I am looking at this lecture, which states (link to exact time): What the triplet loss allows us, in contrast to the contrastive loss, is that we can learn a ranking. So it's not only about similarity, being closer together or being further apart,…
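A minimal sketch of why the triplet formulation encodes a ranking (my own illustration, assuming Euclidean distances and a margin): the triplet loss only compares two distances to each other, so it constrains their order, while the contrastive loss pins each distance to an absolute target.

    import numpy as np

    def triplet_loss(anchor, positive, negative, margin=1.0):
        # Penalizes only the *relative* order: d(a, n) must exceed
        # d(a, p) by at least the margin.
        d_pos = np.linalg.norm(anchor - positive)
        d_neg = np.linalg.norm(anchor - negative)
        return max(0.0, d_pos - d_neg + margin)

    def contrastive_loss(x1, x2, same, margin=1.0):
        # Penalizes *absolute* distances: pulls positive pairs toward 0,
        # pushes negative pairs past the margin.
        d = np.linalg.norm(x1 - x2)
        return d ** 2 if same else max(0.0, margin - d) ** 2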