4

I'm reading the ImageNet Classification with Deep Convolutional Neural Networks paper by Krizhevsky et al, and came across these lines in the Intro paragraph:

Their (convolutional neural networks') capacity can be controlled by varying their depth and breadth, and they also make strong and mostly correct assumptions about the nature of images (namely, stationarity of statistics and locality of pixel dependencies). Thus, compared to standard feedforward neural networks with similarly-sized layers, CNNs have much fewer connections and parameters and so they are easier to train, while their theoretically-best performance is likely to be only slightly worse.

What's meant by "stationarity of statistics" and "locality of pixel dependencies"? Also, what's the basis of saying that CNN's theoretically best performance is only slightly worse than that of feedforward NN?

nbro
  • 42,615
  • 12
  • 119
  • 217
Shirish
  • 403
  • 3
  • 11

1 Answers1

3

locality of pixel dependencies probably means that neighboring pixels tend to be correlated, while faraway pixels are usually not correlated. This assumption is usually made in several image processing techniques (e.g. filters). Of course, the size and the shape of the neighborhood could vary, depending on the region of the image (or whatever), but, in practice, it is usually chosen to be fixed and rectangular (or squared).

stationarity of statistics might mean that the values of the pixels do not change over time, so this could be related to diffusion techniques in image processing. stationarity of statistics might also mean that the values of the pixels do not change much in a spatial neighborhood, even though, stationarity, e.g. in reinforcement learning, usually means that something does not change over time (so, if that's the case, the terminology stationarity is at least misleading and confusing, in this context), so this might be related to the locality of pixel dependencies property. Possibly, stationarity of statistics could also indirectly mean that you can use the same filter to detect the same feature in different regions of the image.

With while their theoretically-best performance is likely to be only slightly worse the authors probably thought that, theoretically, CNNs are not as powerful as feedforward neural networks. However, both CNNs and FFNNs are universal function approximators (but, at the time, nobody probably had yet investigated seriously the theoretical powerfulness of CNNs).

nbro
  • 42,615
  • 12
  • 119
  • 217