I'm coding an FNN in Rust using the nalgebra crate. I coded the backpropagation based on this article from Brilliant (the link points directly to the section with the formulas I used).
The issue
My network tends to over-predict one class: after a few iterations it assigns nearly every sample to the same class while the loss keeps growing. See the logs from some iterations:
Pred: [0.4957141833444244, 0.5042858166555757] Exp: [1.0, 0.0]
Pred: [0.1159099292782023, 0.8840900707217978] Exp: [0.0, 1.0]
Pred: [0.49812391110550713, 0.5018760888944928] Exp: [0.0, 1.0]
Pred: [0.785823559592914, 0.21417644040708617] Exp: [1.0, 0.0]
(0) Loss: 1.7553771084566945 accuracy: 75%
Pred: [0.921104553204082, 0.07889544679591799] Exp: [1.0, 0.0]
Pred: [0.43669346100767914, 0.5633065389923209] Exp: [0.0, 1.0]
Pred: [0.8390020708741205, 0.16099792912587946] Exp: [0.0, 1.0]
Pred: [0.9358457093202687, 0.0641542906797313] Exp: [1.0, 0.0]
(1) Loss: 2.5487814857735516 accuracy: 75%
Pred: [0.9938891101436632, 0.0061108898563368074] Exp: [1.0, 0.0]
Pred: [0.8117580010095468, 0.18824199899045324] Exp: [0.0, 1.0]
Pred: [0.9637103688436267, 0.03628963115637338] Exp: [0.0, 1.0]
Pred: [0.9836993037515837, 0.01630069624841633] Exp: [1.0, 0.0]
(2) Loss: 5.008814788847665 accuracy: 50%
Pred: [0.9995224418411457, 0.0004775581588542201] Exp: [1.0, 0.0]
Pred: [0.9640882324687131, 0.03591176753128681] Exp: [0.0, 1.0]
Pred: [0.9940401356203591, 0.005959864379640862] Exp: [0.0, 1.0]
Pred: [0.9967080323177023, 0.0032919676822977125] Exp: [1.0, 0.0]
(3) Loss: 8.453172874424054 accuracy: 50%
Pred: [0.9999695301335619, 3.0469866438138036e-5] Exp: [1.0, 0.0]
Pred: [0.9955011694391566, 0.004498830560843456] Exp: [0.0, 1.0]
Pred: [0.9993378187078414, 0.0006621812921585512] Exp: [0.0, 1.0]
Pred: [0.9995329830642827, 0.00046701693571723986] Exp: [1.0, 0.0]
(4) Loss: 12.724406571994546 accuracy: 50%
Pred: [0.9999986403638063, 1.3596361937582465e-6] Exp: [1.0, 0.0]
Pred: [0.9996152538707804, 0.0003847461292195687] Exp: [0.0, 1.0]
Pred: [0.9999496312441125, 5.036875588738215e-5] Exp: [0.0, 1.0]
Pred: [0.9999544427172501, 4.5557282750020274e-5] Exp: [1.0, 0.0]
(5) Loss: 17.759113261749448 accuracy: 50%
Pred: [0.9999999567632185, 4.323678145397258e-8] Exp: [1.0, 0.0]
Pred: [0.9999768122330864, 2.3187766913582424e-5] Exp: [0.0, 1.0]
Pred: [0.9999973081629956, 2.6918370044323987e-6] Exp: [0.0, 1.0]
Pred: [0.9999969459673318, 3.054032668285032e-6] Exp: [1.0, 0.0]
(6) Loss: 23.497175499984603 accuracy: 50%
This also happens with more classes and other datasets (like the MNIST handwritten digits).
My implementation
Based on the Brilliant article I wrote a matrix-based implementation of backpropagation, i.e. all calculations are done matrix-wise. I don't see any problems with my implementation. I also implemented RMSProp, which seems to be fine, since the same issue happens when I remove it and train the network with plain SGD.
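To make it concrete, here is a simplified sketch of the matrix-wise update I mean, for a single hidden layer (this is illustrative, not the exact code from the repo; `w1`, `b1`, etc. are made-up names):

```rust
use nalgebra::{DMatrix, DVector};

fn sigmoid(z: &DVector<f64>) -> DVector<f64> {
    z.map(|v| 1.0 / (1.0 + (-v).exp()))
}

fn softmax(z: &DVector<f64>) -> DVector<f64> {
    let max = z.max(); // subtract the max for numerical stability
    let exp = z.map(|v| (v - max).exp());
    let sum = exp.sum();
    exp / sum
}

// One SGD step for input -> hidden (sigmoid) -> output (softmax).
fn train_step(
    w1: &mut DMatrix<f64>, b1: &mut DVector<f64>,
    w2: &mut DMatrix<f64>, b2: &mut DVector<f64>,
    x: &DVector<f64>, y: &DVector<f64>, lr: f64,
) {
    // Forward pass.
    let z1 = &*w1 * x + &*b1;
    let a1 = sigmoid(&z1);
    let z2 = &*w2 * &a1 + &*b2;
    let a2 = softmax(&z2);

    // Backward pass. With softmax + cross-entropy the output delta
    // reduces to (prediction - target); no explicit softmax Jacobian.
    let d2 = &a2 - y;
    let grad_w2 = &d2 * a1.transpose();
    // sigmoid'(z1) expressed through the activation: a1 * (1 - a1).
    let d1 = (w2.transpose() * &d2)
        .component_mul(&a1.component_mul(&a1.map(|v| 1.0 - v)));
    let grad_w1 = &d1 * x.transpose();

    // Descend along the negative gradient.
    *w2 -= lr * grad_w2;
    *b2 -= lr * &d2;
    *w1 -= lr * grad_w1;
    *b1 -= lr * &d1;
}
```

And the RMSProp variant keeps a leaky average of the squared gradients per parameter; roughly (again a sketch, `decay` and `eps` stand for my hyperparameters):

```rust
use nalgebra::DMatrix;

// RMSProp step for one weight matrix. `cache` persists across steps.
fn rmsprop_step(
    w: &mut DMatrix<f64>,
    cache: &mut DMatrix<f64>,
    grad: &DMatrix<f64>,
    lr: f64,
    decay: f64, // e.g. 0.9
    eps: f64,   // e.g. 1e-8, avoids division by zero
) {
    // cache = decay * cache + (1 - decay) * grad^2 (elementwise).
    *cache = decay * &*cache + (1.0 - decay) * grad.component_mul(grad);
    // Per-element adaptive step: grad / (sqrt(cache) + eps).
    *w -= lr * grad.zip_map(&*cache, |g, c| g / (c.sqrt() + eps));
}
```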
If you want to take a look at the whole project, I created a branch (a copy of the current main branch) that I'll never delete, for the sake of this question.
What I tried
- Checked every activation and loss function I have coded (see my functions module code). I'm using sigmoid for the hidden layer and softmax for the output layer, but even if I use ReLU for the hidden layer the issue is the same.
- Applied the math by hand; it seems to check out (though I'm not an expert at math at all). A numerical check would be more reliable; see the finite-difference sketch after this list.
- Removed RMSProp and used plain SGD: same issue, the network doesn't learn.
- Used the ReLU activation function for the hidden layer.
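Since checking the math by hand is error-prone, the next sanity check I have in mind is a numerical gradient check: compare what my backprop produces against a central finite difference of the loss. This is only a sketch; `loss` and `analytic_grad` are placeholders, not names from my repo:

```rust
use nalgebra::DMatrix;

/// Compare an analytic gradient against a central finite difference.
/// Returns the largest absolute discrepancy over all entries of `w`.
fn gradient_check(
    w: &DMatrix<f64>,
    analytic_grad: &DMatrix<f64>,
    loss: impl Fn(&DMatrix<f64>) -> f64,
) -> f64 {
    let h = 1e-5;
    let mut worst: f64 = 0.0;
    for i in 0..w.nrows() {
        for j in 0..w.ncols() {
            let mut plus = w.clone();
            let mut minus = w.clone();
            plus[(i, j)] += h;
            minus[(i, j)] -= h;
            // Central difference: (L(w+h) - L(w-h)) / (2h).
            let numeric = (loss(&plus) - loss(&minus)) / (2.0 * h);
            worst = worst.max((numeric - analytic_grad[(i, j)]).abs());
        }
    }
    worst // should be tiny (around 1e-7 or less) if backprop is right
}
```

A large discrepancy would point at the backprop math itself; a small one would shift suspicion to the update step (e.g. a sign or learning-rate issue).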
Question
- In theory, what could cause this?