
I have read many articles and the Fast R-CNN paper, but I'm still confused about how the region proposal method works in Fast R-CNN.

As you can see in the image below, they say they used a proposal method, but it is not specified how it works.

What confuses me is this: in VGGNet, for example, the output of the last convolution layer is a set of feature maps of shape 14x14x512. Which algorithm is used to propose the regions, and how does it propose them from the feature maps?

enter image description here

nbro
ozoubia

1 Answer


Yes, it is not specified because the region proposal algorithm did not change from R-CNN, the predecessor of Fast R-CNN. In the next version, Faster R-CNN, this algorithm is replaced by a CNN (the region proposal network).

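The region proposal algorithm you are looking for is called selective search. The R-CNN paper refers to "Selective Search for Object Recognition" for its description; I found a copy here. Note that selective search runs on the input image, not on the feature maps: the proposed boxes are only projected onto the feature maps afterwards, for RoI pooling.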
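To connect this to the 14x14x512 feature maps in the question: the proposals are boxes in input-image coordinates, and Fast R-CNN scales each box down to feature-map coordinates before RoI pooling. Here is a minimal sketch of that projection. The function name `project_box`, the stride of 16 (which corresponds to a 224x224 input giving a 14x14 map) and the round-to-nearest rule are assumptions for illustration, not code from the paper.

```python
import math


def project_box(box, stride=16):
    """Map an (x1, y1, x2, y2) box from image pixels to feature-map cells.

    `stride` is the total downsampling factor of the network (assumed 16 here).
    """
    def scale(v):
        # Round to the nearest cell (assumed rounding rule).
        return int(math.floor(v / stride + 0.5))

    x1, y1, x2, y2 = box
    return (scale(x1), scale(y1), scale(x2), scale(y2))


# A hypothetical proposal in a 224x224 image.
feature_box = project_box((32, 48, 160, 200))
print(feature_box)
```

The region of the feature map inside `feature_box` is what gets pooled to a fixed size and passed to the classifier.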

The algorithm generates the proposed regions through a series of segmentation and aggregation steps applied to the input image. The figure below shows 4 iterations of segmentation and aggregation over the same input image.

enter image description here

The algorithm simply iterates over 4 steps:

  1. Initial regions are obtained by applying the segmentation algorithm described in the paper, which groups pixels by light intensity. For example, a picture of a shepherd with his sheep in the mountains is segmented by light intensity, which gives the image in Figure (a).
  2. Different regions are proposed based on the previous segmentation, as in Figure (e).
  3. The similarity between the proposed regions is calculated using equation 6 in Section 3.2 of the paper. It is simply an aggregate of 4 similarity metrics between two regions: color, texture, size and fill (how well one region fits into the other).
  4. The most similar regions are merged, which gives Figure (b). Then return to step 2.

Repeating these steps is how you get all the images depicted.
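The merging loop in the steps above can be sketched in a few lines of Python. This is a simplified illustration, not the paper's implementation. Regions are plain dicts with hypothetical keys `bbox`, `size` and `hist`. The texture term is omitted. Every pair of regions is compared, whereas the paper only compares neighbouring regions. The color, size and fill formulas are written as I recall them from the paper.

```python
from itertools import combinations


def bbox_union(a, b):
    """Smallest (x1, y1, x2, y2) box containing both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def bbox_area(b):
    return (b[2] - b[0]) * (b[3] - b[1])


def similarity(r1, r2, image_size):
    """Unweighted sum of color, size and fill similarity (texture omitted)."""
    # Color: intersection of the two normalized histograms.
    colour = sum(min(a, b) for a, b in zip(r1["hist"], r2["hist"]))
    # Size: favors merging small regions first.
    size = 1.0 - (r1["size"] + r2["size"]) / image_size
    # Fill: penalizes pairs whose joint bounding box is mostly empty.
    gap = bbox_area(bbox_union(r1["bbox"], r2["bbox"])) - r1["size"] - r2["size"]
    fill = 1.0 - gap / image_size
    return colour + size + fill


def merge(r1, r2):
    """Merge two regions; the histogram is the size-weighted average."""
    total = r1["size"] + r2["size"]
    hist = [(r1["size"] * a + r2["size"] * b) / total
            for a, b in zip(r1["hist"], r2["hist"])]
    return {"bbox": bbox_union(r1["bbox"], r2["bbox"]), "size": total, "hist": hist}


def propose_regions(initial_regions, image_size):
    """Greedily merge the most similar pair until one region remains."""
    regions = list(initial_regions)
    proposals = [r["bbox"] for r in regions]
    while len(regions) > 1:
        i, j = max(combinations(range(len(regions)), 2),
                   key=lambda p: similarity(regions[p[0]], regions[p[1]], image_size))
        merged = merge(regions[i], regions[j])
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged["bbox"])
    return proposals


# Toy 4x4 image split into three hand-made initial regions.
initial = [
    {"bbox": (0, 0, 2, 2), "size": 4, "hist": [1.0, 0.0]},
    {"bbox": (2, 0, 4, 2), "size": 4, "hist": [0.9, 0.1]},
    {"bbox": (0, 2, 4, 4), "size": 8, "hist": [0.0, 1.0]},
]
proposals = propose_regions(initial, image_size=16)
print(proposals)
```

Every bounding box seen along the way, from the initial segments up to the whole image, is kept as a proposal. That is why the method yields candidate regions at many scales.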

JVGD