
What are "bottlenecks" in the context of neural networks?

This term is mentioned, for example, in this TensorFlow article, which also uses the term "bottleneck values". How does one calculate bottleneck values? How do these values help image classification?

Please explain in simple words.

Anurag Singh

2 Answers


A bottleneck in a neural network is just a layer with fewer neurons than the layer below or above it. Having such a layer encourages the network to compress its feature representations (keeping the features most salient for the target variable) to best fit in the available space. The compression improves during training because the bottleneck's weights, like all other weights, are updated to reduce the cost function.
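
To make that concrete, here is a minimal sketch of a fully-connected network with a bottleneck layer, written with TensorFlow's Keras API; all the layer sizes are illustrative choices, not taken from any particular model:

    import tensorflow as tf

    # A 32-unit layer squeezed between 256-unit layers acts as the bottleneck:
    # the network must compress a 256-dimensional representation into 32 values.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(784,)),                     # e.g. a flattened 28x28 image
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),     # bottleneck layer
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),  # e.g. 10 classes
    ])
    model.summary()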

In a CNN (such as Google's Inception network), bottleneck layers are added to reduce the number of feature maps (aka channels) in the network, which otherwise tends to increase in each layer. This is achieved by using 1x1 convolutions with fewer output channels than input channels.
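
As a rough sketch of that trick (the channel counts below are made up for illustration, not Inception's actual values), a 1x1 convolution with fewer filters keeps the spatial dimensions but shrinks the channel dimension, which makes any following larger convolution far cheaper:

    import tensorflow as tf

    x = tf.random.normal((1, 28, 28, 256))        # 28x28 feature maps, 256 channels
    bottleneck = tf.keras.layers.Conv2D(64, kernel_size=1, activation="relu")
    y = bottleneck(x)
    print(y.shape)                                # (1, 28, 28, 64): 4x fewer channels

    # Rough weight counts (ignoring biases) for a following 5x5 convolution:
    #   5x5 directly on 256 channels:  5*5*256*256             = 1,638,400
    #   1x1 bottleneck, then 5x5:      1*1*256*64 + 5*5*64*256 =   425,984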

You don't usually calculate weights for bottleneck layers directly; the training process handles that, as it does for all other weights. Choosing a good size for a bottleneck layer is something you have to guess at and then verify by experiment, searching for architectures that work well. The goal is usually a network that generalises well to new images, and bottleneck layers help by reducing the number of parameters while still allowing the network to be deep and represent many feature maps.

Neil Slater

Imagine you want to retrain the last layer of a pre-trained model:

Input->[Frozen-Layers]->[Last-Layer-To-Retrain]->Output

To train [Last-Layer-To-Retrain], you need to evaluate the outputs of [Frozen-Layers] many times for the same input data. To save time, you can compute these outputs only once.

Input#1->[Frozen-Layers]->Bottleneck-Features-Of-Input#1

Then you store all of the Bottleneck-Features-Of-Input#i and use them directly to train [Last-Layer-To-Retrain].
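
Here is a minimal sketch of that caching idea with TensorFlow's Keras API; the MobileNetV2 base, the 5-class head, and the random stand-in data are all assumptions for illustration, not what the TensorFlow example uses:

    import tensorflow as tf

    # Frozen pre-trained base, playing the role of [Frozen-Layers].
    base = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg",
                                             input_shape=(224, 224, 3))
    base.trainable = False

    # Stand-in data; in practice, load and preprocess your real images.
    images = tf.random.uniform((8, 224, 224, 3))
    labels = tf.random.uniform((8,), maxval=5, dtype=tf.int32)   # 5 classes

    # Compute the bottleneck features once, instead of on every training epoch.
    bottleneck_features = base.predict(images)

    # Train only the new last layer on the cached features.
    head = tf.keras.Sequential([tf.keras.layers.Dense(5, activation="softmax")])
    head.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    head.fit(bottleneck_features, labels, epochs=3)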

Explanation from the cache_bottlenecks function of TensorFlow's image_retraining example:

Because we're likely to read the same image multiple times (if there are no distortions applied during training) it can speed things up a lot if we calculate the bottleneck layer values once for each image during preprocessing, and then just read those cached values repeatedly during training.

JC R