I'm training a TensorFlow model that takes an image and segments it into foreground and background. That is, if the input image is w x h x 3, the network outputs a w x h x 1 mask of 0's and 1's, where 0 represents background and 1 represents foreground.
I've computed that about 75% of the pixels in the true masks are background, so the network simply learns to output all 0's and gets about 75% accuracy.
To solve this, I'm thinking of implementing a custom loss function that checks whether more than a certain percentage of the predicted pixels are 0 and, if so, adds a very large number to the loss to discourage the all-0's strategy.
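The issue is that this loss function is non-differentiable: counting 0's means thresholding the network's output, so the penalty does not change smoothly with the predictions.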
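To make the problem concrete, here is a minimal sketch of the penalty in plain Python rather than TensorFlow. The function names, the 0.5 cutoff, the 90% limit, and the penalty size are all made-up values for illustration, not what my actual model uses. The finite-difference check shows that nudging any single pixel's probability leaves the penalty unchanged, so there would be no gradient signal to train on.

```python
def zero_fraction(probs, cutoff=0.5):
    """Fraction of pixels whose thresholded prediction is 0 (background)."""
    hard = [1 if p >= cutoff else 0 for p in probs]
    return hard.count(0) / len(hard)


def penalty(probs, max_zero_fraction=0.9, big=1e6):
    """Return a large constant when too many pixels are predicted as background."""
    return big if zero_fraction(probs) > max_zero_fraction else 0.0


def finite_difference(f, probs, index, eps=1e-4):
    """Numerical estimate of the derivative of f with respect to probs[index]."""
    bumped = list(probs)
    bumped[index] += eps
    return (f(bumped) - f(probs)) / eps


# Hypothetical per-pixel foreground probabilities for a 10-pixel "image".
probs = [0.1, 0.2, 0.05, 0.3, 0.1, 0.2, 0.15, 0.1, 0.25, 0.1]

# Every pixel thresholds to 0, so the penalty fires.
print(penalty(probs))

# A small change to any one pixel does not cross the cutoff,
# so the penalty value is identical and each estimate is zero.
print([finite_difference(penalty, probs, i) for i in range(len(probs))])
```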
Where should I go from here?