Should I remove the text overlaying some images in the dataset before training the CNN?

Question

If I am training a CNN to perform image classification, but some of the images have pieces of text overlaying them (added as descriptions for humans), is it better to remove the text before training? If so, how do I remove it? Furthermore, is it a good idea to train on both the images with the overlaid text and the versions with the text removed, since that might act as a form of data augmentation?

score 1 · Accepted Answer · answered Sep 18 '20 at 04:54

Removing the overlaid text might increase accuracy, but you would need to train a separate model to do it, and that is an entirely different task: it is no longer classification but image generation (inpainting). There are easier ways to augment your data that will probably give similar accuracy benefits. However, if you would still like to do this, there are plenty of examples you can find by simply searching for "watermark removal machine learning" on Google. Here's an example I found.
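As an illustration of the "easier ways to augment your data" mentioned above, here is a minimal NumPy sketch that combines a random horizontal flip with random erasing (blanking out a random rectangle, which loosely imitates an occluding overlay). The function names, the 0.5 flip probability, and the `max_frac` parameter are my own choices for illustration, and it assumes images are `uint8` arrays of shape `(H, W)` or `(H, W, C)` and that flipping does not change the label in your dataset.

```python
import numpy as np


def random_erase(image, rng, max_frac=0.3):
    """Return a copy of `image` with one random rectangle filled with a flat value."""
    h, w = image.shape[:2]
    # Rectangle size: at least 1 pixel, at most roughly max_frac of each side.
    eh = int(rng.integers(1, max(2, int(h * max_frac) + 1)))
    ew = int(rng.integers(1, max(2, int(w * max_frac) + 1)))
    top = int(rng.integers(0, h - eh + 1))
    left = int(rng.integers(0, w - ew + 1))
    out = image.copy()
    # Fill with a single random intensity (assumes a 0-255 pixel range).
    out[top:top + eh, left:left + ew] = int(rng.integers(0, 256))
    return out


def augment(image, rng):
    """Randomly flip the image horizontally, then erase one random rectangle."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip (assumed to be label-preserving)
    return random_erase(out, rng)


rng = np.random.default_rng(0)
example = np.full((32, 32, 3), 128, dtype=np.uint8)  # stand-in for a real image
augmented = augment(example, rng)
```

In practice you would apply `augment` on the fly to each training batch, or use the equivalent transforms from whatever training framework you already have.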

Overall, a CNN will usually be able to look past the overlaid text and classify the image as it would without the text. It is possible that it instead learns a relationship between the overlaid text and the expected output, but that depends on the data, and it is likely a harder task than simply identifying visual features.

The only issue you might run into is if the real data the model will be used on differs from the training data, for example if the real-world images do not contain overlaid text describing what the image is.
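If you want to check whether a trained classifier is leaning on the overlay rather than the image content, one rough approach is to hide the text region and see how much accuracy drops. The sketch below assumes you already know where the text is (the `boxes` argument, given as `(top, left, bottom, right)` in pixels, is hypothetical and could come from a fixed caption band or a text detector) and that `predict` is any function mapping one image array to a class label. None of these names come from a particular library.

```python
import numpy as np


def mask_box(image, box):
    """Return a copy of `image` with `box` filled with the image's mean colour."""
    top, left, bottom, right = box
    out = image.copy()
    # Per-channel mean for (H, W, C) images, scalar mean for (H, W) images.
    out[top:bottom, left:right] = image.mean(axis=(0, 1))
    return out


def accuracy(predict, images, labels):
    """Fraction of images for which predict(image) equals the label."""
    preds = np.array([predict(img) for img in images])
    return float((preds == np.asarray(labels)).mean())


def overlay_dependence(predict, images, labels, boxes):
    """Accuracy on the original images minus accuracy with the text boxes masked.

    A value near 0 suggests the model ignores the overlay; a large positive
    value suggests it relies on the text to make its predictions.
    """
    masked = [mask_box(img, box) for img, box in zip(images, boxes)]
    return accuracy(predict, images, labels) - accuracy(predict, masked, labels)
```

A large gap on a held-out set would be a sign that the model may not transfer well to images without the overlaid text.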
