
I have built a CNN that uses max-pooling layers. When I remove these layers, the network behaves ideally: the outputs and gradients at every layer have a variance close to 1. When they are included, however, the variance skyrockets.
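
A minimal sketch that reproduces this pattern in a toy setting. Everything here is an assumption, not the original network: dense layers stand in for convolutions, there is no activation function, weights are drawn from N(0, 1/fan_in), and max-pooling over groups of 4 consecutive units stands in for a 2×2 window. The helper name `layer_variances` is hypothetical.

```python
import numpy as np

def layer_variances(n_layers, width, pool, rng):
    """Record the output variance of each layer in a toy linear stack.

    Hypothetical setup: dense layers with weights ~ N(0, 1/fan_in), no activation,
    and optional max-pooling over non-overlapping groups of 4 units (like 2x2).
    """
    x = rng.standard_normal((256, width))  # batch of 256 unit-variance inputs
    variances = []
    for _ in range(n_layers):
        fan_in = x.shape[1]
        w = rng.standard_normal((fan_in, width)) / np.sqrt(fan_in)
        x = x @ w  # output variance is roughly the mean of the input's squares
        variances.append(float(x.var()))
        if pool:
            # Keep the maximum of each group of 4 consecutive units.
            x = x.reshape(x.shape[0], -1, 4).max(axis=2)
    return variances

rng = np.random.default_rng(0)
plain = layer_variances(5, 512, pool=False, rng=rng)
pooled = layer_variances(5, 512, pool=True, rng=rng)
print("without pooling:", [round(v, 2) for v in plain])
print("with pooling:   ", [round(v, 2) for v in pooled])
```

Under these assumptions the per-layer variance should stay close to 1 without pooling and grow by roughly a factor of 1.5 after each pooled layer, which compounds quickly in a deep stack.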

This makes sense, of course: a max-pooling layer takes the maximum over each window, and because the largest value is always the one selected, the pooled outputs are biased upward (their mean is larger than the mean of the inputs).
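
A quick Monte Carlo check of this bias, assuming (this is not stated in the question) a 2×2 window over independent, zero-mean, unit-variance Gaussian activations, followed by a layer whose weights are zero-mean with variance 1/fan_in. It shows that the pooled values shift upward while their spread actually shrinks, and that the next layer's output variance tracks the pooled values' second moment E[x²] = Var(x) + E[x]², so the mean shift alone is enough to inflate it. The sizes `fan_in = 64` and 1024 output units are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed input: one million 2x2 windows of standard normal activations.
windows = rng.standard_normal((1_000_000, 4))
pooled = windows.max(axis=1)

mean = pooled.mean()                 # upward shift, roughly 1.03 instead of 0
var = pooled.var()                   # spread shrinks, roughly 0.49 instead of 1
second_moment = np.mean(pooled**2)   # roughly 1.55, i.e. var + mean**2

# A following layer with zero-mean weights of variance 1/fan_in has output
# variance fan_in * (1/fan_in) * E[x^2] = E[x^2], not Var(x).
fan_in = 64
w = rng.standard_normal((fan_in, 1024)) / np.sqrt(fan_in)
x = pooled[: 4096 * fan_in].reshape(4096, fan_in)
next_var = (x @ w).var()

print(f"mean={mean:.3f} var={var:.3f} E[x^2]={second_moment:.3f} next-layer var={next_var:.3f}")
```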

I would just like to know what methods are typically used to combat this.

Pluviophile
Recessive
