I have a question about Multi-Layer Perceptron networks. In the formula for the error (delta) at the output layer, some articles write it as "deltaOutput = (predict - expected) * derivativeOutput". But in other sources, the authors seemed to differentiate the cost function explicitly (they computed the derivative in Wolfram), and the formula became "deltaOutput = derivativeError * derivativeOutput". This confused me, and I was left with this question: do we need to compute the derivative of the error (the cost function) to use in the delta formula? Or are there cases where I can simply use "deltaOutput = (predict - expected) * derivativeOutput" without differentiating the cost function myself? Could you please help me understand?
In an MLP, when calculating the delta, do I need to compute the derivative of the cost function (as in the case of MSE)? Or can I just use the cost function's result, so that the formula for the error at the output layer is simply "(predict - expected) * derivativeOutput"?
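To make the comparison concrete, here is a minimal Python sketch of the two formulations I mean. The sigmoid activation, the MSE cost E = 0.5 * (predict - expected)^2, and the specific numbers are just assumptions for illustration, not from any particular article:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values for a single output neuron, purely for illustration.
z = 0.4                    # pre-activation of the output neuron
predict = sigmoid(z)       # network output
expected = 1.0             # target value

# Derivative of the sigmoid activation at z, written via the output.
derivative_output = predict * (1.0 - predict)

# Formulation 1: delta = (predict - expected) * derivativeOutput
delta_1 = (predict - expected) * derivative_output

# Formulation 2: differentiate the cost first, then multiply.
# For MSE E = 0.5 * (predict - expected)^2, dE/dpredict = (predict - expected).
derivative_error = predict - expected
delta_2 = derivative_error * derivative_output

print(delta_1, delta_2)
```

Running this, both deltas come out identical for this choice of cost, which is exactly what I am unsure about: is formulation 1 just formulation 2 with the MSE derivative already substituted in, or are they genuinely different rules?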