
I need to solve a video classification problem. The only solutions I have found reduce it to a series of simpler image classification tasks. However, this approach has a downside: it ignores the temporal relationships between the frames.

So, how can I perform video classification with a CNN while taking into account the temporal relationships between frames?

nbro

1 Answer


Look at spatio-temporal CNNs, which extend the 2D convolutions of image-based CNNs to 3D so that the extra dimension covers time. These are commonly used to detect or classify actions in a video. People have used them to identify specific actions in various sports, such as kicking a soccer ball, throwing a baseball, or dribbling a basketball. They have also been used to identify fire, smoke, deepfakes, and violence in videos. Below are some articles to help get started.
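To make the 2D-to-3D idea concrete, here is a minimal NumPy sketch of a single 3D convolution applied to a tiny clip. It is only an illustration, not code from any of the articles: the helper `conv3d_valid`, the toy clip, and the hand-picked kernel are all made up for this example. A real spatio-temporal CNN would learn many such kernels using a framework's 3D convolution layer (for example, `torch.nn.Conv3d` in PyTorch).

```python
import numpy as np

def conv3d_valid(video, kernel):
    """Naive 'valid' 3D cross-correlation (what CNN layers compute).

    video:  array of shape (T, H, W), i.e. frames x height x width
    kernel: array of shape (kT, kH, kW)
    """
    T, H, W = video.shape
    kT, kH, kW = kernel.shape
    out = np.zeros((T - kT + 1, H - kH + 1, W - kW + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # The patch spans kT consecutive frames, so the filter
                # sees how pixel values change over time.
                patch = video[t:t + kT, y:y + kH, x:x + kW]
                out[t, y, x] = np.sum(patch * kernel)
    return out

# Hypothetical toy clip: 4 grayscale 5x5 frames in which one bright
# pixel moves one step to the right per frame.
clip = np.zeros((4, 5, 5))
for t in range(4):
    clip[t, 2, t] = 1.0

# Hand-picked kernel (a trained network would learn its weights):
# next frame minus current frame at the same pixel.
kernel = np.zeros((2, 1, 1))
kernel[0, 0, 0] = -1.0
kernel[1, 0, 0] = 1.0

# Response for the moving pixel versus a clip that repeats frame 0.
response = conv3d_valid(clip, kernel)
static = conv3d_valid(np.repeat(clip[:1], 4, axis=0), kernel)

print(response.shape)        # one output "frame" per pair of input frames
print(np.abs(response).sum())
print(np.abs(static).sum())
```

The kernel responds to the moving pixel but gives zero everywhere on the static clip, even though every individual frame of the static clip is a perfectly valid image. That difference is exactly the temporal information a frame-by-frame 2D classifier cannot see.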

Source code:

Brian O'Donnell