
I am implementing a neural network and training it on handwritten digits.

Here is the cost function that I am implementing.

$$J(\Theta)=-\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}\left[y_{k}^{(i)} \log \left(\left(h_{\Theta}\left(x^{(i)}\right)\right)_{k}\right)+\left(1-y_{k}^{(i)}\right) \log \left(1-\left(h_{\Theta}\left(x^{(i)}\right)\right)_{k}\right)\right]+ \\\frac{\lambda}{2 m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_{l}} \sum_{j=1}^{s_{l+1}}\left(\Theta_{j, i}^{(l)}\right)^{2}$$

In $\log(1-h_{\Theta}(x))$, if $h_{\Theta}(x)$ is $1$, the result is $\log(1-1)=\log(0)$, so I'm getting a math error.

I'm initializing the weights randomly between $10$ and $60$. I'm not sure what I have to change, or where.
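For concreteness, here is a minimal NumPy sketch of the kind of computation I mean (the shapes, seed, and names are illustrative, not my exact code), which reproduces the error:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative shapes: 400 input pixels plus a bias unit, 10 output classes.
rng = np.random.default_rng(0)
Theta = rng.uniform(10, 60, size=(10, 401))    # weights drawn from [10, 60]
x = np.concatenate(([1.0], rng.random(400)))   # bias unit + "pixel" values

h = sigmoid(Theta @ x)      # the weighted sums are in the thousands
print(h)                    # every entry rounds to exactly 1.0 in float64
print(np.log(1 - h))        # log(0) -> -inf, which then poisons the cost
```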


1 Answer


So, firstly, for $h_{\Theta}(x)$ to be exactly $1$, the weighted sum of $x$ (its dot product with $\Theta$) would have to be literally infinite, if you're using the sigmoid function. That doesn't happen in practice, even with the rounding computers do, because we don't use big numbers to initialize our $\Theta$ matrices; initializing the weights between $10$ and $60$, as you describe, is exactly the kind of large initialization that makes the weighted sums big enough for the sigmoid to saturate.

Intuitively, that'd mean you're basically more certain than one can possibly be in this universe that the label of this example should be $1$.
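To put a number on that: in double precision the sigmoid already rounds to exactly $1.0$ once its input is around $37$ or larger, so small initial weights are what keep you away from this regime. A quick check (NumPy, the values are just examples):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(10.0))          # 0.9999546..., still strictly below 1
print(sigmoid(40.0))          # 1.0 -- exp(-40) is too small for float64 to see
print(sigmoid(40.0) == 1.0)   # True, so log(1 - h) would be log(0)
```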

So, if $(1 - h_{\Theta}(x)) = 0$, $y$ is certainly $1$, and so $1-y$ will be zero.

Secondly, the convention is to treat the entire second term, $(1-y_k^{(i)}) \log(1-(h_{\Theta}(x^{(i)}))_k)$, as $0$ whenever $y_k^{(i)}$ is $1$ (i.e. $0 \cdot \log 0$ is taken to be $0$). This will not cause problems when programming, due to the first point I made above.
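In code, a common extra safeguard (a minimal sketch assuming a NumPy implementation with one-hot labels; the names are illustrative) is to clip the predictions away from exactly $0$ and $1$ before taking logs, so the cost stays finite even if an activation saturates numerically:

```python
import numpy as np

def nn_cost(h, Y, Thetas, lam, m):
    """Regularized cross-entropy cost J(Theta).

    h      : (m, K) predictions, h[i, k] = (h_Theta(x^(i)))_k
    Y      : (m, K) one-hot labels, Y[i, k] = y_k^(i)
    Thetas : list of weight matrices Theta^(l); the first column of each
             is assumed to hold the bias weights and is left unregularized
    """
    eps = 1e-12                              # keeps log() away from log(0)
    h = np.clip(h, eps, 1 - eps)
    data = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m
    reg = (lam / (2 * m)) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return data + reg
```

Combined with a small random initialization (drawing the weights from a small interval around zero instead of $[10, 60]$), the clipping should essentially never be triggered.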

Avik Mohan