
I've searched for an answer to this, and read several scientific articles on the subject, but I can't find a practical explanation of how Dropout actually drops nodes in an algorithm.

I've read that Dropout zeros out the activation function for particular nodes, which makes sense in the forward pass. But how does this work for the backward pass?

Robin van Hoorn
Connor

1 Answer


Backpropagation on a network with dropout works just as it normally does: it calculates the gradients and updates the weights.

Longer explanation

Dropout is a regularization technique that randomly drops (zeroes out) nodes during the forward pass.

In the backward pass, the 'influence' of every weight on the end result is calculated (the gradient). If a node was dropped by the Dropout layer, its activation is 0, so the gradient of its outgoing weights is also 0 (as 0 * weight = 0) and those weights receive no update. In short, backpropagation works just as it always does.
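A minimal NumPy sketch may make this concrete (the class name, `p_drop`, and the explicit `forward`/`backward` methods are illustrative choices here, not any particular library's API): the forward pass samples a random mask and zeroes some activations, and the backward pass reuses that same mask, so the gradient flowing through dropped nodes is zero.

```python
import numpy as np

class Dropout:
    """Sketch of an (inverted) dropout layer."""

    def __init__(self, p_drop=0.5):
        self.p_drop = p_drop
        self.mask = None

    def forward(self, x, training=True):
        if not training:
            return x  # at test time, all nodes are kept
        # Randomly choose which nodes to keep; scale the survivors so the
        # expected activation stays the same as without dropout.
        self.mask = (np.random.rand(*x.shape) >= self.p_drop) / (1.0 - self.p_drop)
        return x * self.mask

    def backward(self, grad_output):
        # Reuse the SAME mask from the forward pass: dropped nodes pass back
        # a gradient of 0, so the weights feeding into them get no update.
        return grad_output * self.mask
```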

So how does the network learn, if those nodes are never updated? The key point is that the dropped nodes are chosen at random on every forward pass, so a different subset is dropped each time. As a result, a different subset of nodes (and their weights) is updated in every backward pass.
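For instance, running the sketch above for a few training steps shows that a fresh mask is sampled on every pass, i.e. a different subset of nodes is kept (and therefore updated) each time:

```python
# Each forward pass samples a new mask, so different nodes are dropped
# (and different weights receive gradient) on every training step.
layer = Dropout(p_drop=0.5)
x = np.ones((1, 6))

for step in range(3):
    out = layer.forward(x, training=True)
    grad_in = layer.backward(np.ones_like(out))
    print(f"step {step}: kept nodes -> {(layer.mask > 0).astype(int)}")
```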

Because nodes are dropped, the network cannot rely solely on specific nodes to make predictions, and hence the network (should) generalize better. But I think you already understood that part.


Robin van Hoorn