
I have recently discovered asymmetric convolution layers in deep learning architectures, a concept which seems very similar to depthwise separable convolutions.

Are they really the same concept with different names? If not, where is the difference? To make it concrete, what would each one look like if applied to a 128x128 image with 3 input channels (say R,G,B) and 8 output channels?

NB: I cross-posted this from stackoverflow, since this kind of theoretical question is maybe better suited here. Hoping it is OK...

nbro

1 Answer


They are not the same thing.

Asymmetric convolutions factorize the spatial part of a convolution by treating the x and y axes of the image separately: for example, performing a convolution with an $(n \times 1)$ kernel followed by one with a $(1 \times n)$ kernel. This replaces the $n^2$ spatial weights of an $(n \times n)$ kernel with $2n$ weights, and reproduces the full kernel exactly whenever that kernel is rank-1.
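To make the factorization concrete, here is a minimal pure-Python sketch (kernel values and the tiny 5x5 "image" are arbitrary choices for illustration): convolving with a rank-1 $(3 \times 3)$ kernel built as the outer product of a column $u$ and a row $v$ gives the same result as a $(3 \times 1)$ pass followed by a $(1 \times 3)$ pass.

```python
# "Valid" 2D cross-correlation, written out in pure Python for clarity.
def conv2d_valid(img, ker):
    H, W = len(img), len(img[0])
    kh, kw = len(ker), len(ker[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            row.append(sum(img[i + di][j + dj] * ker[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# A 3x3 kernel that is the outer product of a column u and a row v.
u = [1.0, 2.0, 1.0]           # the (3 x 1) factor
v = [1.0, 0.0, -1.0]          # the (1 x 3) factor
full = [[ui * vj for vj in v] for ui in u]  # rank-1 (3 x 3) kernel

# A small toy "image" with values 0..24.
img = [[float(i * 5 + j) for j in range(5)] for i in range(5)]

# One pass with the full 3x3 kernel...
direct = conv2d_valid(img, full)
# ...equals a (3 x 1) pass followed by a (1 x 3) pass.
two_pass = conv2d_valid(conv2d_valid(img, [[ui] for ui in u]), [v])

print(direct == two_pass)  # True
```

In a real network the stacked $(n \times 1)$ and $(1 \times n)$ layers are learned directly (usually with a nonlinearity in between), so they are not restricted to exactly reproducing a rank-1 kernel; the identity above is just the motivation for the factorization.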

Depthwise separable convolutions, on the other hand, separate the spatial and channel components of a 2D convolution. They first perform an $(n \times n)$ convolution on each channel separately (so each kernel has shape $(n \times n \times 1)$ rather than $(n \times n \times k)$, where $k$ is the number of channels in the previous layer), and then apply a $(1 \times 1)$ convolution to learn relationships between the channels (that kernel having shape $(1 \times 1 \times k)$).
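For the question's concrete case, a quick parameter count (biases omitted, and assuming a $3 \times 3$ spatial kernel, which is my choice, not given in the question) shows how differently the two ideas behave. The spatial size of the image (128x128) does not affect the counts, only the kernel and channel shapes do; the intermediate channel width of the asymmetric version is also an assumption here.

```python
# Assumed setup: 128x128 image, c_in = 3 channels (R, G, B),
# c_out = 8 output channels, n = 3 spatial kernel size.
n, c_in, c_out = 3, 3, 8

# Standard 2D convolution: c_out kernels of shape (n x n x c_in).
standard = n * n * c_in * c_out                      # 3*3*3*8 = 216

# Asymmetric: an (n x 1) layer then a (1 x n) layer, each mixing all
# channels (assuming c_out channels after each stage).
asymmetric = (n * c_in * c_out) + (n * c_out * c_out)  # 72 + 192 = 264

# Depthwise separable: c_in kernels of shape (n x n x 1), then c_out
# pointwise kernels of shape (1 x 1 x c_in).
depthwise = (n * n * c_in) + (c_in * c_out)          # 27 + 24 = 51

print(standard, asymmetric, depthwise)  # 216 264 51
```

Note that with such a small $n$ and a channel expansion from 3 to 8, the asymmetric factorization actually costs more parameters than the standard convolution here; its savings show up for larger kernels or when channel counts stay flat. The depthwise separable version is much cheaper in either case.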

mshlis