Questions tagged [derivative]
18 questions
5
votes
1 answer
Is PyTorch's `grad_fn` for a non-differentiable function that function's inverse?
What is grad_fn for a non-differentiable function like slicing (grad_fn=&lt;SliceBackward0&gt;), view (grad_fn=&lt;ViewBackward0&gt;), etc.? Is grad_fn simply the function's inverse operation?
Where in the source code can I see the implementation of…
Geremia
- 555
- 1
- 5
- 12
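A minimal PyTorch sketch of what `grad_fn` actually holds (my own illustration, not the asker's code): each `grad_fn` node stores the *backward* function of the operation — a vector-Jacobian product — not its inverse. For a slice, the backward scatters incoming gradients into the sliced positions and fills the rest with zeros, which is not an un-slicing.

```python
# grad_fn stores the backward (vector-Jacobian product) of the op,
# not the op's inverse. Assumes PyTorch is installed.
import torch

x = torch.arange(6.0, requires_grad=True)
y = x[2:5]  # slicing: produces a node like SliceBackward0

# The grad_fn name identifies the backward node for the op.
print(type(y.grad_fn).__name__)  # e.g. "SliceBackward0"

# Backward through the slice scatters gradients into the sliced
# positions; it does not reconstruct (invert) the original tensor.
y.sum().backward()
print(x.grad)  # tensor([0., 0., 1., 1., 1., 0.])
```

The implementations live in PyTorch's autograd engine; many backward formulas are generated from `derivatives.yaml` in the PyTorch source tree.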
4
votes
1 answer
Why is my derivation of the back-propagation equations inconsistent with Andrew Ng's slides from Coursera?
I am using the cross-entropy cost function and calculating its derivatives with respect to the different variables $Z, W$ and $b$ at different stages. Please refer to the image below for the calculation.
As far as I know, my derivation is correct for $dZ, dW, db$ and…
learner
- 151
- 5
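A finite-difference sanity check (my own sketch, using Andrew Ng's variable naming as an assumption) for the standard Coursera-style formulas with a single sigmoid unit and binary cross-entropy: $dZ = A - Y$, $dW = \frac{1}{m} dZ\, X^T$, $db = \text{mean}(dZ)$.

```python
# Numeric check of dZ = A - Y, dW = dZ @ X.T / m for a sigmoid unit
# with binary cross-entropy. Names follow Ng's convention (assumed).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))                       # 3 features, 5 examples
Y = rng.integers(0, 2, size=(1, 5)).astype(float)
W = rng.normal(size=(1, 3))
b = 0.1
m = X.shape[1]

def cost(W, b):
    A = 1.0 / (1.0 + np.exp(-(W @ X + b)))
    return -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

# Analytic gradients from the slide formulas
A = 1.0 / (1.0 + np.exp(-(W @ X + b)))
dZ = A - Y
dW = dZ @ X.T / m
db = dZ.mean()

# Finite-difference check on one weight
eps = 1e-6
Wp = W.copy(); Wp[0, 0] += eps
num = (cost(Wp, b) - cost(W, b)) / eps
print(abs(num - dW[0, 0]) < 1e-4)  # True
```

If a hand derivation disagrees with the slides, a check like this pins down which of $dZ, dW, db$ carries the error.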
3
votes
1 answer
How is the max function differentiable wrt multiple arguments?
I recently came across an answer on StackOverflow that mentioned the max function being differentiable with respect to its values.
From my current understanding of mathematics, I'm struggling to comprehend how this is possible.
Could someone help…
Peyman
- 624
- 1
- 6
- 14
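A short numerical illustration of the usual resolution (my own example): away from ties, $\max(x_1,\dots,x_n)$ locally equals its largest coordinate, so it is differentiable there with gradient 1 at the argmax and 0 elsewhere; at ties one picks a subgradient.

```python
# Finite-difference gradient of max: 1 at the argmax, 0 elsewhere
# (valid wherever the maximum is unique).
import numpy as np

x = np.array([0.3, 2.0, -1.0])
eps = 1e-6
grad = np.zeros_like(x)
for i in range(len(x)):
    xp = x.copy()
    xp[i] += eps                         # perturb one coordinate
    grad[i] = (xp.max() - x.max()) / eps # directional sensitivity
print(np.round(grad))  # [0. 1. 0.]
```

This is exactly what autodiff frameworks implement for `max`: the incoming gradient is routed to the winning element.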
3
votes
2 answers
Why are the terms critical points and stationary points used interchangeably?
Consider the following paragraph from the Numerical Computation chapter of the Deep Learning book.
When $f'(x) = 0$, the derivative provides no information about which direction to move. Points where $f'(x) = 0$ are known as critical points, or stationary…
hanugm
- 4,102
- 3
- 29
- 63
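A small worked example of why "no information" matters (my own sketch): for $f(x) = x^3$, the point $x = 0$ is a critical/stationary point ($f'(0) = 0$) yet is neither a minimum nor a maximum, so the vanishing derivative alone cannot tell you which way to move.

```python
# f(x) = x^3 has a stationary point at 0 that is not an extremum.
def f(x):
    return x ** 3

def fprime(x, eps=1e-6):
    # central finite difference for f'
    return (f(x + eps) - f(x - eps)) / (2 * eps)

print(abs(fprime(0.0)) < 1e-9)    # True: derivative vanishes at 0
print(f(-0.1) < f(0.0) < f(0.1))  # True: f still increases through 0
```

Both books' terms name the same set $\{x : f'(x) = 0\}$; "stationary" emphasizes that $f$ is momentarily flat, "critical" that the first-order test is inconclusive there.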
2
votes
0 answers
What is the dimensionality of these derivatives in the paper "Active Learning for Reward Estimation in Inverse Reinforcement Learning"?
I'm trying to implement in code part of the following paper: Active Learning for Reward Estimation in Inverse Reinforcement Learning.
I'm specifically referring to section 2.3 of the paper.
Let's define $\mathcal{X}$ as the set of states, and…
ИванКарамазов
- 141
- 5
1
vote
0 answers
Neural Networks that fit vector transforms
I have a CNN that is image-to-image and maps a binary image input to a binary image output. These are usually simple shapes, like a rectangle or a circle. Usually they become somewhat smoothed (the effect of lithography).
However, for my workflow I…
R S
- 11
- 1
1
vote
2 answers
Direct formula for calculating the optimum matrix which minimizes the perceptron error
Suppose we have a perceptron without bias, with $f(x) = x$ as the activation function, and matrices $X, Y, W$, where the training inputs are the columns of $X$, $Y$ is the target matrix (its columns ordered to match the corresponding inputs), and $W$ is the…
hasanghaforian
- 113
- 5
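A sketch of the closed form, assuming the error being minimized is the squared error $\|WX - Y\|_F^2$: with identity activation this is ordinary linear least squares, and the minimizer is given by the pseudoinverse, $W^* = Y X^+$.

```python
# Closed-form least-squares solution W* = Y X^+ for a linear,
# bias-free perceptron (assumes squared error as the loss).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 10))   # training inputs as columns
W_true = rng.normal(size=(2, 4))
Y = W_true @ X                 # noiseless targets for the check

W = Y @ np.linalg.pinv(X)      # direct formula, no gradient descent
print(np.allclose(W, W_true))  # True: X has full row rank here
```

With noisy targets the same formula returns the least-squares optimum rather than an exact fit; when $XX^T$ is invertible it equals $W^* = YX^T(XX^T)^{-1}$.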
1
vote
0 answers
In MLP, to calculate the delta, do I need to calculate the derivative of the cost function? Or can I just use the cost function result?
In Multi-Layer Perceptron networks, in the formula for the error in the output layer, some articles say it is "deltaOutput = (predict - expected) * derivativeOutput". But in other sources, I saw that they…
will The J
- 267
- 1
- 6
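A sketch of why both forms appear in the literature (my own check): with a sigmoid output $a = \sigma(z)$ and *cross-entropy* loss, the $\sigma'$ factor cancels and the delta is just $a - y$; with *squared error* the delta keeps the derivative factor, $(a - y)\,\sigma'(z)$. So which formula is right depends on the cost function.

```python
# Finite-difference check of dL/dz for a sigmoid output unit
# under two different losses.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z, y, eps = 0.7, 1.0, 1e-6
a = sigmoid(z)

# Cross-entropy: L = -(y log a + (1-y) log(1-a))  ->  dL/dz = a - y
ce = lambda z: -(y * np.log(sigmoid(z)) + (1 - y) * np.log(1 - sigmoid(z)))
num_ce = (ce(z + eps) - ce(z - eps)) / (2 * eps)
print(abs(num_ce - (a - y)) < 1e-6)  # True: no derivative factor

# Squared error: L = 0.5 (a - y)^2  ->  dL/dz = (a - y) * a * (1 - a)
se = lambda z: 0.5 * (sigmoid(z) - y) ** 2
num_se = (se(z + eps) - se(z - eps)) / (2 * eps)
print(abs(num_se - (a - y) * a * (1 - a)) < 1e-6)  # True: sigmoid' kept
```

So "(predict - expected)" alone is correct for sigmoid/softmax outputs trained with cross-entropy, while the version multiplied by the activation derivative is correct for squared error.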
1
vote
0 answers
Neural network learns to mimic distribution of classes in dataset instead of using signal from input
I'm trying to implement an example from a classic AI paper, "Learning representations by back-propagating errors" by Rumelhart, Hinton, and Williams.
The example aims at training a network able to predict the third term in triples of (person_0, relationship, person_1) across…
Jan Grzybek
- 11
- 1
1
vote
2 answers
How the vector-space isomorphism between $\mathbb{R}^{m \times n}$ and $\mathbb{R}^{mn}$ guarantees reshaping matrices to vectors?
Consider the following paragraph from section 5.4 Gradients of Matrices of the chapter Vector Calculus from the textbook Mathematics for Machine Learning by Marc Peter Deisenroth et al.
Since matrices represent linear mappings, we can…
hanugm
- 4,102
- 3
- 29
- 63
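A concrete numpy illustration of the isomorphism (my example, not the book's): `reshape` is an invertible linear map between $\mathbb{R}^{m \times n}$ and $\mathbb{R}^{mn}$, so flattening a matrix, working in the vector space, and reshaping back loses no information and preserves linear combinations — which is exactly what justifies computing a gradient in vectorized form and reshaping it to matrix shape.

```python
# reshape as a vector-space isomorphism between R^{2x3} and R^6.
import numpy as np

A = np.arange(6.0).reshape(2, 3)
v = A.reshape(-1)            # vec: R^{2x3} -> R^6
B = v.reshape(2, 3)          # inverse map: R^6 -> R^{2x3}
print(np.array_equal(A, B))  # True: no information is lost

# Linearity is preserved: vec(2A + 3C) = 2 vec(A) + 3 vec(C)
C = np.ones((2, 3))
lhs = (2 * A + 3 * C).reshape(-1)
rhs = 2 * A.reshape(-1) + 3 * C.reshape(-1)
print(np.array_equal(lhs, rhs))  # True
```

The only bookkeeping to watch is the flattening order (row-major vs column-major); the isomorphism holds either way as long as you reshape back with the same convention.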
1
vote
2 answers
What does it mean "having Lipschitz continuous derivatives"?
We can enforce some constraints on functions used in deep learning in order to gain optimization guarantees. You can find this in the Numerical Computation chapter of the Deep Learning book.
In the context of deep learning, we sometimes gain some guarantees…
hanugm
- 4,102
- 3
- 29
- 63
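A numerical illustration of the definition (my own example): $f$ has a Lipschitz continuous derivative if there is a constant $L$ with $|f'(x) - f'(y)| \le L\,|x - y|$ for all $x, y$. For $f = \sin$, $f' = \cos$ and $L = 1$ works; for $f(x) = |x|$, the derivative jumps at 0, so no finite $L$ exists.

```python
# Empirical check of |f'(x) - f'(y)| <= L |x - y| on a grid.
import numpy as np

xs = np.linspace(-3, 3, 1001)

# f = sin: the difference quotients of f' = cos stay below L = 1.
ratios = np.abs(np.diff(np.cos(xs))) / np.abs(np.diff(xs))
print(float(ratios.max()) <= 1.0)  # True

# f = |x|: f' = sign(x) jumps at 0, so the ratio blows up there
# (and keeps growing as the grid is refined -- no finite L).
ratios_abs = np.abs(np.diff(np.sign(xs))) / np.abs(np.diff(xs))
print(float(ratios_abs.max()))  # large
```

Intuitively, a Lipschitz continuous derivative bounds how fast the slope can change, which is what lets gradient-based methods pick safe step sizes.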
1
vote
0 answers
BlackOut - ICLR 2016: need help understanding the cost function derivative
In the ICLR 2016 paper BlackOut: Speeding up Recurrent Neural Network Language Models with very Large Vocabularies, on page 3, for eq. 4:
$$ J_{ml}^s(\theta) = \log p_{\theta}(w_i \mid s) $$
They have shown the gradient computation in the subsequent…
anurag
- 151
- 1
- 7
0
votes
1 answer
Why does "in-place" mutation cause automatic differentiation to fail, and how to write code to avoid this problem
I work with a few different automatic differentiation frameworks, including PyTorch, JAX, and Flux in Julia. Periodically I run some code and I get errors about mutations or operations occurring "in-place." These errors generally cause the program…
krishnab
- 207
- 2
- 8
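A minimal PyTorch reproduction of the failure mode (my own sketch): reverse-mode autodiff saves intermediate values for the backward pass, and mutating a tensor the engine still needs invalidates that saved value, so `backward()` raises a `RuntimeError`. The fix is to use the out-of-place operation, which allocates a fresh tensor.

```python
# In-place mutation of a tensor saved for backward breaks autograd.
import torch

x = torch.ones(3, requires_grad=True)
y = torch.exp(x)   # exp's backward needs its own output y
y += 1             # in-place: overwrites the saved value
try:
    y.sum().backward()
except RuntimeError:
    print("in-place op broke the backward pass")

# Out-of-place version: y's saved value is left untouched.
x = torch.ones(3, requires_grad=True)
y = torch.exp(x)
z = y + 1          # allocates a new tensor instead of mutating y
z.sum().backward()
print(x.grad)      # exp(1) in every entry
```

JAX sidesteps the issue differently: its arrays are immutable, and updates go through functional primitives such as `x.at[i].set(v)` that return a new array.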
0
votes
1 answer
What is the correct partial derivative of $Y^c$ with respect to $A_{ij}^{kc}$?
I have a question about the Grad-CAM++ paper. I do not understand how the following equation (10) for the alphas is obtained:
$$
\alpha_{ij}^{kc} =
\frac{\frac{\partial^2 Y^c}{(\partial A_{ij}^k)^2}}
{2\frac{\partial^2 Y^c}{(\partial A_{ij}^k)^2}
…
mlerma54
- 141
- 5
0
votes
1 answer
What is the rigorous and formal definition for the direction pointed by a gradient?
Consider the following definition of the derivative from the chapter named Vector Calculus from the textbook titled Mathematics for Machine Learning by Marc Peter Deisenroth et al.
Definition 5.2 (Derivative). More formally, for $h>0$ the…
hanugm
- 4,102
- 3
- 29
- 63
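A numerical illustration of the formal statement (my own example): the gradient's direction is the unit vector $u$ that maximizes the directional derivative $\lim_{h\to 0} \frac{f(x + hu) - f(x)}{h}$ — the rigorous sense in which the gradient "points in the direction of steepest ascent".

```python
# Among unit directions u, the finite-difference directional
# derivative of f at x is largest when u is parallel to grad f(x).
import numpy as np

def f(p):
    return p[0] ** 2 + 3 * p[1] ** 2

x = np.array([1.0, 1.0])
grad = np.array([2 * x[0], 6 * x[1]])   # analytic gradient: (2, 6)
h = 1e-6

best_u, best_d = None, -np.inf
for theta in np.linspace(0, 2 * np.pi, 720, endpoint=False):
    u = np.array([np.cos(theta), np.sin(theta)])  # unit direction
    d = (f(x + h * u) - f(x)) / h                 # directional derivative
    if d > best_d:
        best_d, best_u = d, u

# The maximizing direction matches the normalized gradient.
print(np.allclose(best_u, grad / np.linalg.norm(grad), atol=1e-2))  # True
```

Formally this follows from $D_u f(x) = \nabla f(x) \cdot u = \|\nabla f(x)\| \cos\theta$, which is maximized at $\theta = 0$, i.e. when $u$ is the unit vector along $\nabla f(x)$.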