The SigmoidBinaryCrossEntropyLoss implementation in DJL accepts two kinds of outputs from NNs:

  1. where the sigmoid activation has already been applied.
  2. where the raw NN output is taken as-is.

The choice is determined by the fromSigmoid parameter.
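
For reference, here's a minimal sketch of how the branch is selected when constructing the loss. I'm assuming the no-argument and (name, weight, fromSigmoid) factory overloads on Loss, and that the no-argument overload defaults to fromSigmoid=false; the class name, loss name, and weight values here are arbitrary:

    import ai.djl.training.loss.Loss;

    public class BceBranches {
        public static void main(String[] args) {
            // fromSigmoid = false (the default, as far as I can tell):
            // pred is treated as raw logits -- branch (2) below
            Loss fromLogits = Loss.sigmoidBinaryCrossEntropyLoss();

            // fromSigmoid = true: pred is treated as an already
            // sigmoid-activated probability -- branch (1) below
            Loss fromProbs = Loss.sigmoidBinaryCrossEntropyLoss("SigmoidBCELoss", 1, true);
        }
    }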

Here's what the (1) fromSigmoid=true branch looks like:

loss =
      epsLog(pred)
      .mul(lab)
      .add(epsLog(NDArrays.sub(1., pred)).mul(NDArrays.sub(1., lab)));
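
Transcribed into math (p = pred as a probability, y = lab; epsLog appears to be a log with a small epsilon added for numerical stability), branch (1) computes the binary cross-entropy log-likelihood

    y \cdot \log(p + \epsilon) + (1 - y) \cdot \log(1 - p + \epsilon)

with the negation and mean reduction presumably applied elsewhere in the class.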

And here's what the (2) fromSigmoid=false branch looks like:

loss =
      Activation.relu(pred)
      .sub(pred.mul(lab))
      .add(Activation.softPlus(pred.abs().neg()));
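
Transcribed directly into math (x = pred as a raw logit, y = lab, with relu(x) = max(x, 0) and softPlus(z) = log(1 + e^z)), branch (2) computes

    \max(x, 0) - x \cdot y + \log(1 + e^{-|x|})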

The logic behind the (1) fromSigmoid=true branch looks like a standard cross-entropy loss implementation, but I fail to understand why (2) fromSigmoid=false is implemented the way it is.

For instance, why is there no sigmoid application, and why is there a relu followed later by softPlus? I'd like to understand what (2) does and what the theory behind its implementation is.
