
I've been using several resources to implement my own artificial neural network package in C++.

Some of the resources I've been using are

https://www.anotsorandomwalk.com/backpropagation-example-with-numbers-step-by-step/

https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/

https://cs.stanford.edu/people/karpathy/convnetjs/intro.html,

as well as several others.

My code manages to replicate the results in the first two resources exactly. However, these are fairly simple networks in terms of depth. Hence the following (detailed) question:

For my implementation, I've been working with the MNIST Database of handwritten digits (http://yann.lecun.com/exdb/mnist/).

Using the ANN package I wrote, I have created a simple ANN with 784 input neurons, one hidden layer with 16 neurons, and an output layer with ten neurons. I apply ReLU on the hidden layer and on the output layer, followed by a softmax on the output layer to get probabilities. The weights and biases are each individually initialized to random values in the range [-1, 1].

So the network is 784x16x10.
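For concreteness, here is a minimal C++ sketch of the forward pass described above (hypothetical names such as `Layer`, `makeLayer`, and `forward`, using plain `std::vector` math; this is not my actual package code): random initialization in [-1, 1], ReLU on the hidden layer, and ReLU followed by softmax on the output layer.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// One fully connected layer: weights[j][i] maps input i to neuron j.
struct Layer {
    std::vector<std::vector<double>> weights;
    std::vector<double> biases;
};

// Random initialization in [-1, 1], as described above.
Layer makeLayer(std::size_t inputs, std::size_t neurons, std::mt19937& rng) {
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    Layer layer;
    layer.weights.assign(neurons, std::vector<double>(inputs));
    layer.biases.assign(neurons, 0.0);
    for (auto& row : layer.weights)
        for (double& w : row) w = dist(rng);
    for (double& b : layer.biases) b = dist(rng);
    return layer;
}

// z = W * input + b
std::vector<double> affine(const Layer& layer, const std::vector<double>& in) {
    std::vector<double> out(layer.biases);
    for (std::size_t j = 0; j < out.size(); ++j)
        for (std::size_t i = 0; i < in.size(); ++i)
            out[j] += layer.weights[j][i] * in[i];
    return out;
}

std::vector<double> relu(std::vector<double> v) {
    for (double& x : v) x = std::max(0.0, x);
    return v;
}

std::vector<double> softmax(std::vector<double> v) {
    double maxv = *std::max_element(v.begin(), v.end());
    double sum = 0.0;
    for (double& x : v) { x = std::exp(x - maxv); sum += x; }
    for (double& x : v) x /= sum;
    return v;
}

// Forward pass for the 784x16x10 network: ReLU on the hidden layer,
// then ReLU followed by softmax on the output layer, as described above.
std::vector<double> forward(const Layer& hidden, const Layer& output,
                            const std::vector<double>& pixels) {
    std::vector<double> h = relu(affine(hidden, pixels));
    std::vector<double> o = relu(affine(output, h));
    return softmax(o);
}
```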

My backpropagation incorporates weight gradient and bias gradient logic.
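To make explicit what I mean by the weight and bias gradient logic, here is a minimal sketch of the per-layer recursion I am trying to follow (standard chain-rule backpropagation with hypothetical names like `backpropLayer`; again, not my actual code): the bias gradient is the layer's delta, the weight gradient is the delta times the layer's input, and the delta passed to the previous layer is obtained by going back through the weights and that layer's activation derivative.

```cpp
#include <cstddef>
#include <vector>

// Gradients for one fully connected layer, given:
//   weights - this layer's weights (weights[j][i] maps input i to neuron j)
//   input   - the activations fed into this layer during the forward pass
//   delta   - dCost/dz for this layer (from the cost for the output layer,
//             or propagated back from the layer after it)
struct LayerGradients {
    std::vector<std::vector<double>> dWeights; // same shape as weights
    std::vector<double> dBiases;               // one per neuron
    std::vector<double> deltaPrev;             // dCost/d(previous activations)
};

double reluDerivative(double z) { return z > 0.0 ? 1.0 : 0.0; }

LayerGradients backpropLayer(const std::vector<std::vector<double>>& weights,
                             const std::vector<double>& input,
                             const std::vector<double>& delta) {
    LayerGradients g;
    g.dWeights.assign(delta.size(), std::vector<double>(input.size(), 0.0));
    g.dBiases.assign(delta.size(), 0.0);
    g.deltaPrev.assign(input.size(), 0.0);

    for (std::size_t j = 0; j < delta.size(); ++j) {
        g.dBiases[j] = delta[j];                        // dCost/db_j = delta_j
        for (std::size_t i = 0; i < input.size(); ++i) {
            g.dWeights[j][i] = delta[j] * input[i];     // dCost/dw_ji = delta_j * a_i
            g.deltaPrev[i] += weights[j][i] * delta[j]; // sum over this layer's neurons
        }
    }
    // The caller multiplies deltaPrev element-wise by the previous layer's
    // activation derivative (e.g. reluDerivative of its pre-activations)
    // before recursing into that layer.
    return g;
}
```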

With this configuration, I repeatedly get about a 90% hit rate with a total average cost of ~0.07 on the MNIST training set comprising 60,000 digits, and a slightly higher hit rate of ~92.5% on the test set comprising 10,000 digits.

For my first implementation of an ANN, I am pretty happy with that. However, my next thought was:

"If I add another hidden layer, I should get even better results...?".

So I created another artificial network with the same configuration, except for the addition of another hidden layer of 16 neurons, which I also run through a ReLU. So this network is 784x16x16x10.

On this ANN, I get significantly worse results. The hit rate on the training set repeatedly comes out at ~45% with a total average error of ~0.35, and on the test set I also only get about 45%.

This leads me to either one or both of the following conclusions:

A) My implementation of the ANN in C++ is somehow faulty. If so, my bet would be it is somewhere in the backpropagation, as I am not 100% certain my weight gradient and bias gradient calculation is correct for any layers before the last hidden layer.

B) This is an expected effect. Something about adding another layer makes the ANN not suitable for this (digit classification) kind of problem.

Of course, A, B, or A and B could be true.

Could someone with more experience than me give me some input, especially on whether B) is true or not?

If B) is not true, then I know I have to look at my code again.

Chris

1 Answer


You probably got the backpropagation wrong. I ran a test on the effect of adding an extra layer, and the accuracy went up from 94% to 96% for me. See this for details:

https://colab.research.google.com/drive/17kAJ2KJ36grG9sz-KW10fZCQW9i2Tf2c

To run the notebook, click "Open in playground" and run the code. There is a commented-out line which adds one extra layer. The syntax should be easy to understand even though it is in Python.

For backpropagation, you can look at this Python implementation of multi-layer perceptron backpropagation:

https://github.com/enggen/Deep-Learning-Coursera/blob/master/Neural%20Networks%20and%20Deep%20Learning/Building%20your%20Deep%20Neural%20Network%20-%20Step%20by%20Step.ipynb
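Independently of that notebook, a generic way to test your own C++ backpropagation is a finite-difference gradient check: perturb one weight, recompute the cost, and compare the numerical slope with the gradient your backpropagation produces. A minimal sketch (hypothetical `checkOneGradient` helper; `costAt` is assumed to re-run your forward pass and cost with the modified weight value):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <functional>

// Compare the analytic gradient from backpropagation with the numerical
// estimate (cost(w + eps) - cost(w - eps)) / (2 * eps) for a single weight.
bool checkOneGradient(const std::function<double(double)>& costAt,
                      double weight, double analyticGrad,
                      double eps = 1e-5, double tol = 1e-4) {
    double numericGrad = (costAt(weight + eps) - costAt(weight - eps)) / (2.0 * eps);
    double denom = std::max(1.0, std::fabs(analyticGrad) + std::fabs(numericGrad));
    double relError = std::fabs(analyticGrad - numericGrad) / denom;
    if (relError > tol) {
        std::printf("gradient mismatch: analytic=%g numeric=%g\n",
                    analyticGrad, numericGrad);
        return false;
    }
    return true;
}
```

If the gradients for the last layer pass the check but the ones for earlier layers do not, the bug is almost certainly in how the deltas are propagated backwards.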

A network will not usually lose almost half its accuracy when you add an extra layer. It is possible for accuracy to drop after adding a layer due to overfitting, but even then the performance drop won't be that dramatic.

Hope this helps.

Clement