Can training a model on a dataset composed of real images and drawings hurt the training process of a real-world application model?

Question

I'm training a multi-label classifier that will be tested on underwater images. I'm wondering whether feeding the model drawings of a certain class in addition to real images could hurt the results. Has there been a study on this? Or does anyone have past experience they could share?

score 0 · Answer 1 · answered Jan 10 '20 at 00:46

To my knowledge, training with drawings will not necessarily hurt the accuracy of the deployed model (the one you will test on underwater images), provided that you won't use drawings at inference. Drawings may even help the model differentiate some classes. Note, however, that a drawing of one class should not fall within the search domain of another class; that is, it should not look the same as drawings of the other classes.

score 0 · Answer 2 · answered Aug 01 '23 at 08:56

In general, training on any distribution other than the one you are testing on could give worse performance. The model learns to fit the distribution you train it on.

Certainly, more data from a sufficiently close distribution will help, e.g. training on ImageNet and testing on CIFAR. But hand drawings and natural images seem very different, and unless drawings are what you want to test on, I would imagine they will hurt performance. Sufficiently similar data might include augmentations of the original data, such as crops, flips, blurring, etc., as commonly used to effectively enlarge the training set. This is OK because the extra data is effectively "in distribution".
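
As a rough illustration of the in-distribution augmentations mentioned above (crops, flips, blurring), here is a minimal pure-Python sketch. The helper names (`hflip`, `random_crop`, `box_blur`, `augment`) and the toy 4x4 grayscale image are assumptions made for this example only; in a real pipeline you would more likely use a library such as torchvision, which provides equivalent transforms.

```python
import random

def hflip(img):
    """Mirror a 2-D image (a list of rows) left to right."""
    return [row[::-1] for row in img]

def random_crop(img, size, rng):
    """Cut a random size x size window out of the image."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)    # randint is inclusive on both ends
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]

def box_blur(img):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

def augment(img, crop_size, rng):
    """Return the original image plus three augmented views of it."""
    return [img, hflip(img), random_crop(img, crop_size, rng), box_blur(img)]

rng = random.Random(0)  # fixed seed so the crop position is reproducible
# Hypothetical toy image: pixel value = 4 * row + column.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
views = augment(image, crop_size=3, rng=rng)
```

Every view is derived from a real image, so the enlarged training set stays close to the distribution you will test on, which is not the case for hand drawings.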

Think of it like this: you are training a model of finite capacity to learn your intended distribution plus some information about extra data that it will never see again, so part of its modelling capacity is used up on a pointless task.

score 0 · Answer 3 · answered Apr 22 '25 at 13:39

Including both real images and drawings in your training set can negatively impact model performance if the visual styles differ too much, as the model may learn features that don't generalize well to real-world data. Unless the drawings closely resemble the underwater scenes you're targeting, it's safer to use them for pretraining or exclude them entirely during final training. If you do include them, consider techniques like style transfer or domain adaptation to align their distribution more closely with real images.
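
To make the "pretrain on drawings, then train on real images only" option concrete, here is a minimal sketch using a tiny logistic regression in pure Python. Everything in it is an assumption for illustration: the three-number feature vectors stand in for images, and `sgd_epochs`, `accuracy`, and the toy datasets are hypothetical names, not part of any library. The point is only the ordering of the stages and the fact that validation uses real data alone.

```python
import math

def predict_logit(w, x):
    """Linear score for one example."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sgd_epochs(weights, data, epochs, lr=0.5):
    """Run plain SGD for logistic regression, starting from the given weights."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-predict_logit(w, x)))
            # Gradient of the log loss for one example is (p - y) * x.
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    return w

def accuracy(w, data):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum((predict_logit(w, x) > 0) == (y == 1) for x, y in data)
    return correct / len(data)

# Hypothetical toy data: each example is ((bias, feature_1, feature_2), label).
drawings = [((1.0, 2.0, 0.0), 1), ((1.0, -2.0, 0.0), 0)]
real_train = [((1.0, 1.0, 1.0), 1), ((1.0, -1.0, -1.0), 0),
              ((1.0, 0.5, 1.5), 1), ((1.0, -0.5, -1.5), 0)]
real_val = [((1.0, 0.8, 1.2), 1), ((1.0, -0.8, -1.2), 0)]

w_init = [0.0, 0.0, 0.0]
w_pre = sgd_epochs(w_init, drawings, epochs=5)       # stage 1: drawings only
w_final = sgd_epochs(w_pre, real_train, epochs=5)    # stage 2: real images only
w_baseline = sgd_epochs(w_init, real_train, epochs=5)  # no drawings at all

# Compare both variants on real validation data, never on drawings.
acc_pretrained = accuracy(w_final, real_val)
acc_baseline = accuracy(w_baseline, real_val)
```

Comparing `acc_pretrained` with `acc_baseline` on a held-out set of real underwater images is a simple way to check empirically whether the drawings help or hurt in your own setting.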
