Questions tagged [loss]
For questions related to the concept of loss (or cost) in machine learning or other AI sub-fields.
90 questions
11
votes
3 answers
Should I choose a model with the smallest loss or highest accuracy?
I have two Machine Learning models (both LSTMs) that give different results on the validation set (~100 samples):
Model A: Accuracy: ~91%, Loss: ~0.01
Model B: Accuracy: ~83%, Loss: ~0.003
The size and the speed of both models are almost the…
malioboro
- 2,859
- 3
- 23
- 47
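A quick illustration of why the two metrics can rank models differently (a minimal NumPy sketch with made-up probabilities, not the asker's models): accuracy only checks which side of the 0.5 threshold a prediction lands on, while cross-entropy loss also rewards confidence.

import numpy as np

y_true = np.ones(5)  # five positive samples, for simplicity

p_a = np.array([0.55, 0.55, 0.55, 0.55, 0.55])     # always right, never confident
p_b = np.array([0.999, 0.999, 0.999, 0.49, 0.49])  # very confident, twice barely wrong

def binary_cross_entropy(y, p):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def accuracy(y, p):
    return np.mean((p >= 0.5) == y)

for name, p in [("A", p_a), ("B", p_b)]:
    print(name, accuracy(y_true, p), binary_cross_entropy(y_true, p))
# A: accuracy 1.0, loss ~0.60; B: accuracy 0.6, loss ~0.29.
# Ranking by accuracy picks A; ranking by loss picks B.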
4
votes
0 answers
GAN: Why does a perfect discriminator mean no gradient for the generator?
In the training of a Generative Adversarial Network (GAN), a perfect discriminator (D) is one which outputs 1 ("true image") for all images of the training dataset and 0 ("false image") for all images created by the generator (G).
I've read…
Soltius
- 311
- 1
- 2
- 10
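For reference, the standard argument, assuming the original minimax generator loss $\log(1 - D(G(z)))$ and a sigmoid discriminator $D = \sigma(f)$ as in Goodfellow et al. (2014): since $\sigma'(f) = \sigma(f)(1 - \sigma(f))$, the chain rule gives
$$\nabla_{\theta_G} \log\bigl(1 - D(G(z))\bigr) = -\frac{\sigma'(f(G(z)))}{1 - D(G(z))}\,\nabla_{\theta_G} f(G(z)) = -D(G(z))\,\nabla_{\theta_G} f(G(z)),$$
so a perfect discriminator, with $D(G(z)) = 0$ on every generated sample, makes the generator's gradient exactly zero (provided $\nabla_{\theta_G} f$ stays bounded).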
3
votes
1 answer
Regression loss conditioned by the ground-truth values
I'm working on a regression problem with a CNN in which the input is a single image, and the output is an angle in degrees (which determines a specific measure related to the image).
Sometimes, the model fails to retrieve the output accurately (for…
Cezoz08
- 53
- 3
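One common way to condition a regression loss on the target is to weight each sample's error by a function of its ground-truth value. A minimal PyTorch sketch; the weighting function here is hypothetical, purely to show the mechanics:

import torch

def conditioned_mse(pred, target):
    # Hypothetical weighting: emphasise samples whose true angle is near 0
    # degrees. The weights depend only on the target, so no gradient flows
    # through them.
    weights = 1.0 + torch.exp(-torch.abs(target) / 10.0)
    return torch.mean(weights * (pred - target) ** 2)

pred = torch.tensor([3.0, 95.0], requires_grad=True)
target = torch.tensor([0.0, 90.0])
conditioned_mse(pred, target).backward()
print(pred.grad)  # the sample with target near 0 is weighted ~2x per unit error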
3
votes
1 answer
Has anyone tried to train a GPT model predicting the next N tokens instead of the next one token?
I have been thinking about how learning from text works for humans: we read words, and often we need to read ahead a few words to understand more clearly the ideas we read before. Most of the time, just reading the next word in a sentence is not…
bruno
- 33
- 2
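For what it's worth, one straightforward way to set up such an objective is to give the model $n$ output heads per position and average their cross-entropy losses against the next $n$ tokens. A minimal PyTorch sketch of the loss only (the head layout is a hypothetical choice, not taken from any particular paper):

import torch
import torch.nn.functional as F

def next_n_token_loss(logits, tokens, n=4):
    # logits: (batch, seq, n, vocab) -- one softmax head per lookahead step
    # tokens: (batch, seq)
    losses = []
    for k in range(1, n + 1):
        head = logits[:, :-k, k - 1, :]   # head k-1 at position t predicts token t+k
        target = tokens[:, k:]
        losses.append(F.cross_entropy(head.reshape(-1, head.size(-1)),
                                      target.reshape(-1)))
    return torch.stack(losses).mean()

logits = torch.randn(2, 16, 4, 100)      # batch=2, seq=16, n=4, vocab=100
tokens = torch.randint(0, 100, (2, 16))
print(next_n_token_loss(logits, tokens))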
3
votes
0 answers
Focal Loss vs Weighted Cross Entropy Loss
Weighted Focal Loss is defined like so:
$FL(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t)$
whereas weighted Cross-Entropy Loss is defined like so:
$CE(p_t) = -\alpha_t \log(p_t)$
Some blog posts try to explain the core difference, but I still fail to…
Gulzar
- 789
- 1
- 10
- 27
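Evaluating the two formulas side by side makes the modulating factor concrete (a small sketch using the definitions above, with $\alpha_t = 1$ and $\gamma = 2$):

import torch

p_t = torch.tensor([0.9, 0.5, 0.1])  # probability assigned to the true class
alpha_t, gamma = 1.0, 2.0

ce = -alpha_t * torch.log(p_t)
fl = -alpha_t * (1 - p_t) ** gamma * torch.log(p_t)
print(ce)  # tensor([0.1054, 0.6931, 2.3026])
print(fl)  # tensor([0.0011, 0.1733, 1.8651])
# The (1 - p_t)^gamma factor barely changes hard examples (small p_t) but
# strongly suppresses easy ones (p_t near 1), so hard cases dominate the loss.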
3
votes
0 answers
How to interpret the training loss curves in Soft Actor-Critic (SAC)?
I am using the Stable-Baselines3 implementation of the Soft Actor-Critic (SAC) algorithm. The plotted training curves look promising. However, I am not fully sure how to interpret the actor and critic losses. The entropy coefficient $\alpha$ is…
Manuel
- 45
- 5
3
votes
1 answer
How to perform back-propagation in Decoupled Neural Interfaces?
I am attempting to create a fully decoupled feed-forward neural network by using decoupled neural interfaces (DNIs) as explained in the paper Decoupled Neural Interfaces using Synthetic Gradients (2017) by Max Jaderberg et al. As in the paper, the…
Brian Sharp
- 41
- 1
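As a rough illustration of the paper's mechanism (a minimal PyTorch sketch, not the authors' code): each module gets a small side network M that predicts $\partial L / \partial h$ from the module's activation $h$, so the module can update immediately; M itself is later regressed onto the true gradient when it arrives.

import torch
import torch.nn as nn

layer = nn.Linear(10, 10)
synth = nn.Linear(10, 10)  # M: activation -> predicted dL/dh (hypothetical sizes)

x = torch.randn(4, 10)
h = layer(x)
grad_hat = synth(h.detach())   # synthetic gradient for h
h.backward(grad_hat.detach())  # update `layer` now, without the true backward pass

# Later, once the true gradient grad_true for h is available, M is trained by
# regression: ((synth(h.detach()) - grad_true) ** 2).mean().backward()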
2
votes
1 answer
Do we plug in the old values or the new values during the gradient descent update?
I have a scenario where I am trying to optimize a vector of D dimensions. Every component of the vector depends on the other components through a function such as $\sum_{(i,j)} \bigl(1 - e\,x_i x_j\bigr)/2$, where $e$ is a constant and the $x$ are embeddings…
Darkmoon Chief
- 31
- 3
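For plain gradient descent the answer is the old values: the gradient is evaluated once at the current iterate and all components are updated simultaneously. A small NumPy sketch contrasting that with the in-place alternative on a toy coupled quadratic (the matrix is made up for illustration):

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # objective 0.5 * x^T A x, gradient A x
grad = lambda x: A @ x

x, lr = np.array([1.0, -1.0]), 0.1

# Gradient descent: every component uses the SAME old iterate.
x_simultaneous = x - lr * grad(x)

# In-place variant: each component sees the components already updated this
# sweep. This is closer to coordinate descent (Gauss-Seidel style); it can
# also converge, but it is a different algorithm from plain gradient descent.
x_inplace = x.copy()
for i in range(x_inplace.size):
    x_inplace[i] -= lr * grad(x_inplace)[i]

print(x_simultaneous, x_inplace)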
2
votes
1 answer
Fluctuations in loss during in-epoch evaluation of GRU
I am training a one-layer unidirectional vanilla GRU on a next-item prediction task over the last 10 interacted items. In my original experiment, where I trained on approx. 5.5M samples and validated on around 1M samples, I saw periodic…
PatrickSVM
- 53
- 3
2
votes
0 answers
Periodical fluctuations in loss curves
I am training a neural network (specifically a GRU based architecture but I think this is not too relevant for the question). My loss curves, especially the training loss but also the validation loss, show periodic fluctuations and I try to…
PatrickSVM
- 53
- 3
2
votes
2 answers
Does the MSE loss function work in NN training for predicting values between 0 and 1?
In a NN regression problem, considering that MSE squares the error and the error is between 0 and 1, would it be pointless to use MSE as our loss function during model training?
For example:
MSE = (y_pred - y_true) ^ 2
@ Expected model output…
Darren Rahnemoon
- 27
- 5
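Squaring an error that is below 1 does make the number smaller, but that does not make MSE pointless: the loss is still minimised at zero error and its gradient $2(\hat{y} - y)$ still points the right way; squaring only down-weights small residuals relative to, say, MAE. A quick NumPy check:

import numpy as np

err = np.array([0.5, 0.1, 0.01])  # y_pred - y_true, all below 1
print(err ** 2)     # [2.5e-01 1.0e-02 1.0e-04] -- MSE terms shrink fast...
print(2 * err)      # ...but the gradient 2*err stays nonzero until err is 0
print(np.abs(err))  # MAE terms, for comparison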
2
votes
2 answers
Val loss doesn’t decrease after a certain number of epochs
I’m working on a classification problem (500 classes). My NN has 3 fully connected layers, followed by an LSTM layer. I use nn.CrossEntropyLoss() as my loss function. This is my network’s configuration:
Model(
  (fc): Sequential(
    (0): …
helloworld
- 65
- 1
- 6
2
votes
2 answers
Why do we subtract logsumexp from the outputs of this neural network?
I'm trying to understand this tutorial for JAX.
Here's an excerpt. It's for a neural net that is designed to classify MNIST images:
import jax.numpy as jnp
from jax.scipy.special import logsumexp

def relu(x):
    return jnp.maximum(0, x)

def predict(params, image):
    # …
Foobar
- 153
- 6
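The subtraction turns the raw outputs into log-probabilities (a log-softmax), and routing it through logsumexp keeps the computation numerically stable, since $\log(e^{x_i} / \sum_j e^{x_j}) = x_i - \mathrm{logsumexp}(x)$. A quick demonstration in JAX:

import jax.numpy as jnp
from jax.scipy.special import logsumexp

logits = jnp.array([1000.0, 1001.0, 1002.0])  # large enough to overflow exp

naive = jnp.log(jnp.exp(logits) / jnp.sum(jnp.exp(logits)))  # nan: exp overflows
stable = logits - logsumexp(logits)                          # finite log-probs
print(naive, stable)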
2
votes
1 answer
What is being optimized with WGAN loss? Is the generator maximizing or minimizing the critic value?
I am kind of new to the field of GANs and decided to develop a WGAN. The information online seems to contradict itself; the more I read, the more confused I become, so I'm hoping y'all can clarify my misunderstanding with WGAN…
Gabriel Mongaras
- 31
- 4
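For reference, the WGAN objective as given by Arjovsky et al. (2017) is
$$\min_G \max_{D \in \mathcal{D}} \; \mathbb{E}_{x \sim p_\text{data}}\bigl[D(x)\bigr] - \mathbb{E}_{z \sim p_z}\bigl[D(G(z))\bigr],$$
where $\mathcal{D}$ is the set of 1-Lipschitz functions. The critic maximizes this gap, while the generator minimizes it, which amounts to maximizing the critic's value $\mathbb{E}[D(G(z))]$ on generated samples.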
2
votes
2 answers
Why does triplet loss allow to learn a ranking whereas contrastive loss only allows to learn similarity?
I am looking at this lecture, which states (link to exact time):
What the triplet loss allows us, in contrast to the contrastive loss, is that we can learn a ranking. So it's not only about similarity, being closer together or being further apart,…
Gulzar
- 789
- 1
- 10
- 27
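Writing the two losses side by side (in their common forms, with $d$ an embedding distance, $m$ a margin, and $y = 1$ for similar pairs) makes the distinction concrete:
$$L_\text{contrastive}(x_1, x_2, y) = y\, d(x_1, x_2)^2 + (1 - y)\,\max\bigl(0,\; m - d(x_1, x_2)\bigr)^2$$
$$L_\text{triplet}(a, p, n) = \max\bigl(0,\; d(a, p) - d(a, n) + m\bigr)$$
The contrastive loss constrains each pair's distance in absolute terms, whereas the triplet loss only constrains the relative order $d(a, p) + m \le d(a, n)$, i.e. it ranks positives ahead of negatives for each anchor without fixing absolute distances.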