
While working through some examples from GitHub I found this network (it's for FashionMNIST, but that doesn't really matter).

PyTorch forward method (my query is in the upper-case comments, regarding applying Softmax on top of ReLU):

def forward(self, x):
    # two conv/relu + pool layers
    x = self.pool(F.relu(self.conv1(x)))
    x = self.pool(F.relu(self.conv2(x)))

    # prep for linear layer
    # flatten the inputs into a vector
    x = x.view(x.size(0), -1)

    # DOES IT MAKE SENSE TO APPLY RELU HERE,
    x = F.relu(self.fc1(x))

    # AND THEN SOFTMAX ON TOP OF IT?
    x = F.log_softmax(x, dim=1)

    # final output
    return x
Jed

1 Answer


Does it make sense?

In general, yes: it is interpretable, backpropagation will work, and the NN can be optimised.

By using ReLU, the network has a minimum logit of $0$ for the softmax input, which means, at least initially, that there is a higher minimum probability associated with every class (compared to allowing negative logits, which would occur randomly with the usual weight initialisation). The network will need to learn to produce higher logit values for correct answers, because it has no ability to produce lower logit values for incorrect answers. This is like training a network to produce the highest regression value on one output whilst clipping all values to be $0$ or above, so it does not have the option of making one output e.g. $-1.0$ and the rest $-100.0$.
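
As a rough illustration, here is a minimal sketch (with made-up logit values, not taken from the question's network) of how clipping negative logits with ReLU raises the probability floor for every class:

import torch
import torch.nn.functional as F

# hypothetical pre-activation values for a 4-class output
logits = torch.tensor([[2.0, -1.0, -3.0, 0.5]])

# without ReLU: negative logits are allowed, so unlikely classes can be
# pushed towards near-zero probability
print(F.softmax(logits, dim=1))          # roughly [[0.781, 0.039, 0.005, 0.174]]

# with ReLU: logits are clipped at 0, so every class keeps a probability of
# at least exp(0) / sum(exp(clipped logits))
print(F.softmax(F.relu(logits), dim=1))  # roughly [[0.669, 0.091, 0.091, 0.149]]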

It can probably be thought of as a type of regularisation, since it constrains the activation values that feed the softmax.

Is it needed?

That is less clear. You can try training with and without the line, and using cross-validation or a test set to see if there is a significant difference.

If the network has been designed well, then I'd expect to see a slight improvement with the added ReLU.

If it is a mistake, then I'd expect to see no difference, or better performance without the ReLU.
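
A minimal sketch of the two variants to compare, assuming the same layers as in the question's network (only the fully connected step differs):

def forward_with_relu(self, x):
    x = self.pool(F.relu(self.conv1(x)))
    x = self.pool(F.relu(self.conv2(x)))
    x = x.view(x.size(0), -1)
    # variant A: logits are clipped at 0 before the softmax
    x = F.relu(self.fc1(x))
    return F.log_softmax(x, dim=1)

def forward_without_relu(self, x):
    x = self.pool(F.relu(self.conv1(x)))
    x = self.pool(F.relu(self.conv2(x)))
    x = x.view(x.size(0), -1)
    # variant B: raw logits, which can be negative
    x = self.fc1(x)
    return F.log_softmax(x, dim=1)

Train each variant (e.g. with NLLLoss, since the output is log-probabilities) and compare accuracy or loss on a held-out set.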

Neil Slater