There are 8 distinct action classes and more than 50 videos per class. I am wondering whether flipping videos from the training set is a good way to generate additional data. Is it?
Probably flipping a video left/right will be OK and useful for your case.
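Here is a minimal sketch. It assumes each clip is loaded as a NumPy array of shape `(frames, height, width, channels)`. That layout is an assumption, so adjust the axis index if your loader uses a different one. A left/right flip reverses the width axis of every frame and leaves the frame order alone:

```python
import numpy as np

def hflip_video(video: np.ndarray) -> np.ndarray:
    """Mirror every frame of a clip left/right.

    Assumes shape (T, H, W, C), so axis 2 is the width axis.
    """
    # Reverse only the width axis; time order and colours are unchanged.
    return video[:, :, ::-1, :].copy()

# Tiny example: one frame, 1x3 pixels, one channel.
clip = np.arange(3).reshape(1, 1, 3, 1)
flipped = hflip_video(clip)
print(flipped[0, 0, :, 0])  # [2 1 0]
```

The `.copy()` returns a contiguous array rather than a view. Some frameworks reject negative-stride views when converting to tensors.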
When considering data augmentation approaches, you should think about two things that may prevent them from working:
Could the augmentation change the label? E.g. would a human looking at the augmented data still label it the same way?
Does the augmentation create data that is too different from what the model will see when it is actually used?
So in your case, for action recognition, you should only be concerned (and perhaps not add the left/right flipped videos) if:
The activity label would change depending on whether the task was performed left-handed or right-handed.
There is a lot of content in the videos (e.g. writing) that looks unrealistic when flipped. This is why, in most cases, you should only flip left/right, not top/bottom. However, top/bottom flips and even arbitrary rotations might be fine if your videos are a top-down view, so it depends on your specific case.
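You can act on the two concerns above by deciding, class by class, what a flipped copy should be labelled. The class names and the `FLIP_POLICY` table below are hypothetical placeholders, not taken from any dataset. `None` means "don't add a flipped copy". Where the mirrored action is itself one of your classes, relabelling is one option you could use instead of skipping:

```python
from typing import Optional

# Hypothetical class names: replace with your own 8 classes.
# Each class maps to the label its left/right-flipped copy should get,
# or to None when a flipped copy would be mislabelled or unrealistic.
FLIP_POLICY = {
    "wave": "wave",               # symmetric action: label unchanged
    "swipe_left": "swipe_right",  # mirrored action: label swaps
    "swipe_right": "swipe_left",
    "write_on_board": None,       # mirrored writing looks unrealistic
}

def label_after_flip(label: str) -> Optional[str]:
    """Return the label for a left/right-flipped clip, or None to skip it.

    Classes missing from FLIP_POLICY are assumed to be unaffected by flipping.
    """
    return FLIP_POLICY.get(label, label)

print(label_after_flip("swipe_left"))      # swipe_right
print(label_after_flip("write_on_board"))  # None
```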
As an aside, it is also important to keep the original and augmented copy in the same part of the dataset (training, cross-validation or testing), because they are correlated. Not doing so may cause a data leak that will prevent you from measuring performance correctly. To play it safe, you should only augment training data, so that you don't risk measuring your model's generalisation against imaginary production data that could not occur in reality.
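One way to enforce this is to split the *original* videos first and only then augment the training split. The sketch below does that with a per-class (stratified) split, since you have 8 classes. The split fractions and the `augment` callable are illustrative assumptions. `augment` stands in for whatever produces your flipped copies:

```python
import random
from collections import defaultdict

def split_then_augment(videos, augment, val_frac=0.15, test_frac=0.15, seed=0):
    """Split original clips per class, then augment only the training split.

    videos:  list of (video_id, label) pairs for ORIGINAL clips only.
    augment: hypothetical callable (video_id, label) -> list of (id, label).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for vid, label in videos:
        by_class[label].append((vid, label))

    train, val, test = [], [], []
    for label in sorted(by_class):  # fixed class order for reproducibility
        items = by_class[label]
        rng.shuffle(items)
        n_val = int(len(items) * val_frac)
        n_test = int(len(items) * test_frac)
        val += items[:n_val]
        test += items[n_val:n_val + n_test]
        train += items[n_val + n_test:]

    # Copies are derived only from training originals, so every augmented
    # clip sits in the same split as its source and val/test stay untouched.
    augmented = [aug for vid, label in train for aug in augment(vid, label)]
    return train + augmented, val, test


def flip_copy(vid, label):
    # Hypothetical: one flipped copy per clip, same label.
    return [(vid + "_flip", label)]

videos = [(f"c{c}_v{i}", c) for c in range(8) for i in range(50)]
train, val, test = split_then_augment(videos, flip_copy)
print(len(train), len(val), len(test))  # 576 56 56
```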
Other augmentations you might consider for video could be small rotations, random crops, and minor colour, contrast and brightness adjustments.
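Here is a NumPy-only sketch of a random crop plus small brightness, contrast and per-channel colour changes. It assumes float clips in `[0, 1]` with shape `(T, H, W, C)`, and the jitter ranges are arbitrary defaults. One design choice is worth noting: the random parameters are drawn once per clip and applied to every frame, so the augmented video doesn't flicker between frames. Small rotations are left out to avoid an extra dependency:

```python
import numpy as np

def random_crop_and_jitter(video, crop_h, crop_w, max_brightness=0.1,
                           max_contrast=0.1, max_colour=0.05, rng=None):
    """Apply ONE random crop and ONE colour change to every frame of a clip.

    Assumes video is float in [0, 1] with shape (T, H, W, C).
    """
    rng = np.random.default_rng() if rng is None else rng
    _, h, w, c = video.shape

    # Same crop window for all frames (upper bound is exclusive).
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    clip = video[:, top:top + crop_h, left:left + crop_w, :]

    # Parameters sampled once per clip, not per frame.
    brightness = rng.uniform(-max_brightness, max_brightness)
    contrast = 1.0 + rng.uniform(-max_contrast, max_contrast)
    gains = 1.0 + rng.uniform(-max_colour, max_colour, size=c)  # per channel

    mean = clip.mean()  # clip-wide mean, so all frames shift together
    out = ((clip - mean) * contrast + mean + brightness) * gains
    return np.clip(out, 0.0, 1.0).astype(video.dtype)

rng = np.random.default_rng(0)
video = rng.random((16, 128, 171, 3), dtype=np.float32)
aug = random_crop_and_jitter(video, 112, 112, rng=rng)
print(aug.shape)  # (16, 112, 112, 3)
```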