
What happens to the 'channels' dimension (usually 3 for RGB images) after the first convolutional layer in a CNN?

Books and other sources always say that the depth of a convolutional layer's output equals the number of kernels (filters) in that layer.

But if the input image has 3 channels and we convolve each of them with $K$ kernels, shouldn't the depth of the output be $K \times 3$? Are the per-channel results somehow 'averaged' or combined in some other way?

nbro
GKozinski

1 Answer


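The answer to my question is that the values obtained by convolving each channel are summed together. Each filter has the same depth as its input (3 for an RGB image), so convolving a 3-channel input with one filter gives a single output channel, and $K$ filters give an output of depth $K$, not $K \times 3$.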
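To make the summation concrete, here is a minimal NumPy sketch, not taken from any particular framework. The function name `conv2d_multichannel`, the 6x6 image size, and the choice of $K = 4$ filters of size 3x3 are all assumptions made for illustration; the bias term, stride, and padding are left out.

```python
import numpy as np

def conv2d_multichannel(image, filters):
    """'Valid' cross-correlation of an (H, W, C) image with (K, kh, kw, C) filters.

    Hypothetical helper for illustration only: no bias, stride 1, no padding.
    """
    H, W, C = image.shape
    K, kh, kw, Cf = filters.shape
    assert C == Cf, "each filter must have the same depth as the input"
    out = np.zeros((H - kh + 1, W - kw + 1, K))
    for k in range(K):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                patch = image[i:i + kh, j:j + kw, :]       # shape (kh, kw, C)
                # Multiply elementwise and sum over height, width AND channels:
                # this is where the 3 channels collapse into one number.
                out[i, j, k] = np.sum(patch * filters[k])
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6, 3))       # RGB-like input, 3 channels
filters = rng.standard_normal((4, 3, 3, 3))  # K = 4 filters, each 3x3x3
output = conv2d_multichannel(image, filters)
print(output.shape)  # (4, 4, 4): output depth is K = 4, not K * 3 = 12
```

Running the same helper on each channel separately and adding the three results should reproduce `output`, which is another way to see that the channels are summed rather than stacked or averaged.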

The best explanation I found is by Andrew Ng: https://www.coursera.org/lecture/convolutional-neural-networks/convolutions-over-volume-ctQZz

GKozinski