Questions tagged [objective-functions]
For questions related to the concept of loss (or cost) function in the context of machine learning.
262 questions
21 votes · 3 answers
How can we process the data from both the true distribution and the generator?
I'm struggling to understand the GAN loss function as provided in Understanding Generative Adversarial Networks (a blog post written by Daniel Seita).
In the standard cross-entropy loss, we have an output that has been run through a sigmoid function…
tryingtolearn (395)
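The standard sigmoid-plus-cross-entropy setup that the excerpt refers to can be sketched in plain Python. The logit values below are made up purely for illustration; the point is how the discriminator's loss sums a real-sample term (label 1) and a generated-sample term (label 0):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    # Binary cross-entropy for one prediction p in (0, 1) and label y in {0, 1}.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Discriminator output on a real sample (label 1) and a generated sample (label 0);
# the discriminator's loss is the sum of the two cross-entropy terms.
d_real = sigmoid(2.0)   # hypothetical logit on a real sample
d_fake = sigmoid(-1.0)  # hypothetical logit on a generated sample
loss_d = bce(d_real, 1) + bce(d_fake, 0)
print(round(loss_d, 4))  # 0.4402
```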
10 votes · 1 answer
Loss jumps abruptly when I decay the learning rate with Adam optimizer in PyTorch
I'm training an auto-encoder network with the Adam optimizer (with amsgrad=True) and an MSE loss for a single-channel audio source separation task. Whenever I decay the learning rate by a factor, the network loss jumps abruptly and then decreases until the…
imflash217 (499)
10 votes · 1 answer
What is the difference between the triplet loss and the contrastive loss?
They look the same to me. I don't understand the nuances between the two. I have the following queries:
When to use what?
What are the use cases and advantages or disadvantages…
Exploring (371)
8 votes · 2 answers
What is the difference between a loss function and reward/penalty in Deep Reinforcement Learning?
In Deep Reinforcement Learning (DRL), I am having difficulty understanding the difference between a loss function and a reward/penalty, and how the two are integrated in DRL.
Loss function: Given an output of the model and the ground truth,…
Theo Deep (195)
8 votes · 1 answer
How is the DQN loss derived from (or theoretically motivated by) the Bellman equation, and how is it related to the Q-learning update?
I'm doing a project on Reinforcement Learning. I programmed an agent that uses DDQN. There are a lot of tutorials on that, so the code implementation was not that hard.
However, I have problems understanding how one should come up with this kind of…
Yves Boutellier (183)
8 votes · 1 answer
What is the cost function of a transformer?
The paper Attention Is All You Need describes the transformer architecture that has an encoder and a decoder.
However, I wasn't clear on what the cost function to minimize is for such an architecture.
Consider a translation task, for example, where…
user3667125 (1,700)
8 votes · 2 answers
How should we interpret this figure that relates the perceptron criterion and the hinge loss?
I am currently studying the textbook Neural Networks and Deep Learning by Charu C. Aggarwal. Chapter 1.2.1.2, Relationship with Support Vector Machines, says the following:
The perceptron criterion is a shifted version of the hinge loss used in…
The Pointer (611)
8 votes · 1 answer
What's the advantage of log_softmax over softmax?
Previously I learned that softmax as the output layer, coupled with the log-likelihood cost function (the same as nll_loss in PyTorch), can solve the learning slowdown problem.
However, while I am learning the PyTorch MNIST tutorial,…
user1024 (181)
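A commonly cited advantage is numerical stability: computing log(softmax(x)) naively can overflow for large logits, while a fused log-softmax applies the log-sum-exp trick. A minimal sketch in plain Python (the logit values are arbitrary, picked to trigger the failure):

```python
import math

def log_softmax(logits):
    # Stable: subtract the max before exponentiating (log-sum-exp trick).
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def naive_log_softmax(logits):
    # Naive: math.exp() overflows for large logits.
    s = sum(math.exp(z) for z in logits)
    return [math.log(math.exp(z) / s) for z in logits]

logits = [1000.0, 0.0]
print(log_softmax(logits))  # [0.0, -1000.0] — finite values
try:
    naive_log_softmax(logits)
except OverflowError:
    print("naive version overflows")
```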
7 votes · 4 answers
Can the mean squared error be negative?
I'm new to machine learning. I was watching one of Prof. Andrew Ng's videos about gradient descent from the machine learning online course. It said that we want our cost function (in this case, the mean squared error) to have the minimum value, but that…
Borna Ghahnoosh (171)
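For reference, the mean squared error is an average of squared residuals, so it can never be negative; each square is ≥ 0, hence so is their mean. A minimal sketch with made-up numbers:

```python
def mse(y_pred, y_true):
    # Mean of squared residuals; every square is >= 0, so the mean is >= 0.
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)

print(round(mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]), 4))  # 0.4167
# Even with wildly wrong predictions, the MSE stays non-negative:
assert mse([1.0, 2.0], [5.0, -3.0]) >= 0.0
```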
7 votes · 1 answer
What is an objective function?
Local search algorithms are useful for solving pure optimization problems, in which the aim is to find the best state according to an objective function.
My question is: what is the objective function?
Abbas Ali (576)
7 votes · 1 answer
How to deal with losses on different scales in multi-task learning?
Say I'm training a model for multiple tasks by trying to minimize the sum of losses $L_1 + L_2$ via gradient descent.
If these losses are on different scales, the one whose range is greater will dominate the optimization. I'm currently trying to fix…
SpiderRico (1,040)
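One simple remedy the question alludes to is weighting the summed losses with fixed scalars so both terms end up on a comparable scale before summing. The loss values and weights below are hypothetical, chosen only to illustrate the rebalancing:

```python
def weighted_total_loss(l1, l2, w1=1.0, w2=1.0):
    # Fixed scalar weights rebalance losses that live on different scales.
    return w1 * l1 + w2 * l2

# If L2 is roughly 100x larger than L1, scaling it down by 1/100
# puts both terms on a comparable footing before summing.
l1, l2 = 0.5, 50.0
print(weighted_total_loss(l1, l2, w1=1.0, w2=0.01))  # 1.0
```

Fixed weights are the simplest option; more adaptive schemes (e.g. learning the weights from per-task uncertainty) exist, but the basic mechanism is the same weighted sum.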
7 votes · 2 answers
Why do the TensorFlow docs discourage using softmax as the activation for the last layer?
The beginner Colab example for TensorFlow states:
Note: It is possible to bake this tf.nn.softmax in as the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is…
galah92 (173)
7 votes · 1 answer
What loss function to use when labels are probabilities?
What loss function is most appropriate when training a model with target values that are probabilities? For example, I have a 3-output model. I want to train it with a feature vector $x=[x_1, x_2, \dots, x_N]$ and a target $y=[0.2, 0.3, 0.5]$.
It…
Thomas Johnson (173)
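For context, cross-entropy remains well defined when the target is a probability vector rather than a one-hot label. A sketch in plain Python; the logits here are contrived so that the predicted distribution exactly matches the target, in which case the loss equals the target's entropy:

```python
import math

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(target_probs, logits):
    # H(y, p) = -sum_i y_i * log p_i, valid for soft targets that sum to 1.
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(target_probs, probs))

y = [0.2, 0.3, 0.5]
logits = [math.log(0.2), math.log(0.3), math.log(0.5)]  # prediction == target
loss = cross_entropy(y, logits)
print(round(loss, 4))  # 1.0297 — the entropy of y, the minimum achievable loss
```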
6 votes · 2 answers
How to check whether my loss function is convex or not?
Loss functions quantify the error of a neural network's predictions, and that error is what we use to update the network's weights; the loss function is thus central to training neural networks.
Consider the following excerpt from this answer:
In principle, differentiability is…
hanugm (4,102)
6 votes · 1 answer
What is the impact of scaling the KL divergence and reconstruction loss in the VAE objective function?
Variational autoencoders have two components in their loss function. The first component is the reconstruction loss, which for image data is the pixel-wise difference between the input image and output image. The second component is the…
rich (171)
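The two components are typically combined with a scalar weight on the KL term (as in the beta-VAE literature: beta > 1 pushes the posterior toward the prior, beta < 1 favors reconstruction fidelity). The sketch below uses the closed-form KL between a diagonal Gaussian posterior and a standard normal prior; the reconstruction value is made up:

```python
import math

def gaussian_kl(mus, logvars):
    # KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions
    # (the closed form commonly used in VAEs).
    return -0.5 * sum(1 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mus, logvars))

def vae_loss(recon_loss, kl_div, beta=1.0):
    # Weighted VAE objective: reconstruction term plus beta-scaled KL term.
    return recon_loss + beta * kl_div

kl = gaussian_kl([0.0, 0.0], [0.0, 0.0])  # posterior equals the prior -> KL = 0
print(vae_loss(recon_loss=12.5, kl_div=kl, beta=4.0))  # 12.5
```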