
The gradient descent step is the following:

\begin{align} \mathbf{W}_i = \mathbf{W}_{i-1} - \alpha \nabla L(\mathbf{W}_{i-1}) \end{align}

where $L(\mathbf{W}_{i-1})$ is the value of the loss at the current weights $\mathbf{W}_{i-1}$, $\alpha$ is the learning rate, and $\nabla L(\mathbf{W}_{i-1})$ is the gradient of the loss.
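For reference, this is how I picture the update as code (a minimal NumPy sketch; the function name and argument names are just illustrative):

```python
import numpy as np

def gradient_descent_step(W, grad_L, alpha):
    """One update: W_i = W_{i-1} - alpha * grad L(W_{i-1})."""
    return W - alpha * grad_L
```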

So, how do we obtain $L(\mathbf{W}_{i-1})$ in order to calculate the gradient $\nabla L(\mathbf{W}_{i-1})$? As an example, suppose we initialize all the weights in $\mathbf{W}$ to 0.5. Can you explain this to me?


1 Answer


In your case, $L$ is the loss (or cost) function, which can be, for example, the mean squared error (MSE) or the cross-entropy, depending on the problem you want to solve. Given one training example $(\mathbf{x}_i, y_i) \in D$, where $\mathbf{x}_i \in \mathbb{R}^d$ is the input (for example, an image) and $y_i \in \mathbb{R}$ can either be a label (aka class) or a numerical value, and $D$ is your training dataset, then the MSE is defined as follows

$$L(\mathbf{W}) = \frac{1}{2} \left(f(\mathbf{x}_i) - y_i \right)^2,$$

where $f(\mathbf{x}_i) \in \mathbb{R}$ is the output of the neural network $f$ given the input $\mathbf{x}_i$.
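To make this concrete, here is a minimal sketch that evaluates $L(\mathbf{W})$ and its gradient for a single training example, using a simple linear model $f(\mathbf{x}) = \mathbf{W}^\top \mathbf{x}$ in place of a full neural network (the model, the data, and the learning rate are illustrative assumptions; the weights are initialized to 0.5 as in the question):

```python
import numpy as np

# Illustrative linear model f(x) = W^T x with all weights set to 0.5,
# as suggested in the question.
d = 3
W = np.full(d, 0.5)            # W_{i-1}: current weights
x = np.array([1.0, 2.0, 3.0])  # one training input x_i
y = 10.0                       # its target y_i

def f(W, x):
    return W @ x               # network output f(x_i)

def loss(W, x, y):
    return 0.5 * (f(W, x) - y) ** 2   # L(W) = 1/2 (f(x_i) - y_i)^2

# For this linear model, the chain rule gives grad L(W) = (f(x_i) - y_i) * x_i.
grad = (f(W, x) - y) * x

alpha = 0.01
W_new = W - alpha * grad       # the gradient descent step from the question
print(loss(W, x, y), loss(W_new, x, y))  # the loss decreases after the step
```

So the answer to "how do we get to $L(\mathbf{W}_{i-1})$" is: run the current weights forward through the network on a training example to get $f(\mathbf{x}_i)$, then plug that output and the target $y_i$ into the loss formula.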

If you have a mini-batch of $M$ training examples $\{(\mathbf{x}_i, y_i) \}_{i=1}^M$, then the loss will be an average of the MSE for each training example. For more info, have a look at this answer: https://ai.stackexchange.com/a/11675/2444. The answer at https://ai.stackexchange.com/a/8985/2444 may also be useful.
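As a sketch of the mini-batch case (again using the illustrative linear model from above), the loss is just the average of the per-example squared errors:

```python
import numpy as np

def batch_mse(W, X, y):
    """Mean squared error over a mini-batch of M examples.

    X has shape (M, d), one row per example; y has shape (M,).
    """
    residuals = X @ W - y              # f(x_i) - y_i for each example
    return 0.5 * np.mean(residuals ** 2)
```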

See the article Loss and Loss Functions for Training Deep Learning Neural Networks for more info regarding different losses used in deep learning and how to choose the appropriate loss for your problem.
