
In practical applications, we generally talk about three types of convolution layers: 1-dimensional convolution, 2-dimensional convolution, and 3-dimensional convolution. Most popular packages like PyTorch, Keras, etc., provide Conv1d, Conv2d, and Conv3d.

What is the deciding factor for the dimensionality of the convolution layer mentioned in the packages?

nbro
hanugm

2 Answers


The dimensionality of the kernel decides the dimension of the convolution operator: an N-dimensional convolution has an N-dimensional kernel. For example, from the Keras documentation on 2-dimensional convolutions:

kernel_size: An integer or tuple/list of 2 integers, specifying the height and width of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.

If you have more than one filter in the layer, that also adds another dimension to the layer. So we can say that a 2D convolutional layer is in general 3-dimensional, where the 3rd dimension is the number of filters: (k, k, F). In the special case of a single filter, F = 1, and we can treat the layer as 2-dimensional.
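A quick way to see this in practice is to inspect a layer's weight tensor in PyTorch (the shapes below are illustrative choices, not anything from the question). Note that PyTorch also folds the input channels into the weight tensor and orders the axes as (filters, in_channels, k, k), rather than the Keras-style (k, k, F) described above:

```python
import torch

# Hypothetical example: 3 input channels, 8 filters, a 5x5 kernel.
conv = torch.nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)

# The 2D kernel (5, 5) picks up one axis per filter and one per input channel.
print(conv.weight.shape)  # torch.Size([8, 3, 5, 5])
```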

serali

The dimensionality used to discuss convolutional layers in CNNs is based on the dimensionality of the input without considering channels.

  • 1D CNNs might process raw audio (mono or stereo), text sequences, or IR spectrometry from a single sample point
  • 2D CNNs can process photographic images (regardless of colour/depth information), audio spectrograms, or grid-based board games
  • 3D CNNs can process voxels from Minecraft, image sequences from videos, etc.

It is often possible to perform signal processing that changes the dimensionality of a signal source. Whether that adds "channels" or adds a dimension can be a matter of convenience, chosen to fit a particular approach. In terms of defining an n-dimensional array, the addition of channels is just another dimension. In terms of the signal processing performed in CNNs, we care about the distinction between channels and the rest of the space that the signal exists in.

One way to decide whether something is considered a channel or a CNN layer dimension is whether there is an ordering or metric that consistently separates measurements over that dimension. If a metric such as space, time or frequency applies, then that dimension can be considered part of the "core" dimensionality that defines the problem, whilst a more arbitrary set of features (e.g. each entry in the vector embedding of a word) is more channel-like.

Standard CNN design sums over all input channels to create each output feature/channel. Mathematically, this is the same as increasing the convolution dimension by one, with the kernel size in the added dimension equal to the number of input channels. So, in practice, the convolution operation implemented in a CNN layer of a particular dimensionality is one dimension higher: e.g. a layer class labelled "Conv1D" performs a 2D convolution operation whose added dimension size matches the number of input channels exactly. However, because of that required exact match, it makes more sense conceptually to view this as a sum of lower-dimensional convolutions. The extra dimension is a convenience for calculation, not part of the definition.
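This equivalence can be checked numerically. A sketch in PyTorch, assuming no padding, stride 1, and no bias: a Conv1d over C input channels gives the same result as a 2D convolution whose kernel height is exactly C, applied to the input treated as a single-channel (C, L) "image":

```python
import torch

C, k, L = 3, 5, 20  # hypothetical channel count, kernel size, sequence length
x = torch.randn(1, C, L)
conv1d = torch.nn.Conv1d(C, 1, kernel_size=k, bias=False)

# Reinterpret the same weights as a single (C, k) 2D kernel, and the input
# as a 1-channel 2D signal of height C and width L.
x2d = x.unsqueeze(1)                           # (1, 1, C, L)
w2d = conv1d.weight.view(1, 1, C, k)           # (out=1, in=1, C, k)
out2d = torch.nn.functional.conv2d(x2d, w2d)   # (1, 1, 1, L - k + 1)

# The height axis collapses to size 1, recovering the Conv1d output.
print(torch.allclose(conv1d(x), out2d.squeeze(2), atol=1e-6))
```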

Neil Slater