For questions related to the family of models known as R-CNN (such as the original R-CNN, Fast R-CNN, Faster R-CNN and Mask R-CNN).
Questions tagged [r-cnn]
13 questions
6 votes, 1 answer
How does the region proposal method work in Fast R-CNN?
I have read many articles and the Fast R-CNN paper, but I'm still confused about how the region proposal method works in Fast R-CNN.
As you can see in the image below, the authors say they used a proposal method, but they do not specify how it works.
What…
ozoubia
2 votes, 1 answer
Is intersection of labels acceptable in computer vision?
I have a dataset where objects are very close to each other. What is the best approach to labelling them?
There are two possible options:
mark objects so that they will not intersect (it is difficult, surroundings are not included…
Valery Noname
2 votes, 1 answer
Is it possible to pre-train a CNN in a self-supervised way so that it can later be used to solve an instance segmentation task?
I would like to use self-supervised learning (SSL) to learn features from images (the dataset consists of similar images with small differences), then use the resulting trained model to bootstrap an instance segmentation task.
I am thinking about…
Timco Vanco
2 votes, 0 answers
How is the data labelled in order to train a region proposal network?
I don't get how the training of the RPN works. From the forward propagation, I have $W \times H \times k$ outputs from the RPN.
How is the training data labelled so that I can use the loss function and update the weights through backpropagation? …
Abd El-Rahman Akram
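For the RPN question above, a minimal sketch of the anchor-labelling rule as the Faster R-CNN paper describes it: an anchor is positive if its IoU with some ground-truth box exceeds 0.7 (or if it is the best-matching anchor for some ground-truth box), negative if its IoU is below 0.3 for every ground-truth box, and ignored otherwise. It assumes boxes in (x1, y1, x2, y2) pixel format and at least one ground-truth box; `iou_matrix` and `label_anchors` are hypothetical helper names, and the paper's minibatch sampling (256 anchors per image, up to half of them positive) is left out.

```python
import numpy as np

def iou_matrix(anchors, gt):
    """Pairwise IoU between anchors (N, 4) and ground-truth boxes (M, 4), both (x1, y1, x2, y2)."""
    x1 = np.maximum(anchors[:, None, 0], gt[None, :, 0])
    y1 = np.maximum(anchors[:, None, 1], gt[None, :, 1])
    x2 = np.minimum(anchors[:, None, 2], gt[None, :, 2])
    y2 = np.minimum(anchors[:, None, 3], gt[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    return inter / (area_a[:, None] + area_g[None, :] - inter)

def label_anchors(anchors, gt, pos_thresh=0.7, neg_thresh=0.3):
    """Return a label per anchor (1 = object, 0 = background, -1 = ignored) and its matched gt index.

    Hypothetical helper; assumes gt contains at least one box.
    """
    iou = iou_matrix(anchors, gt)
    best_gt = iou.argmax(axis=1)            # best-matching ground-truth box for each anchor
    best_iou = iou.max(axis=1)
    labels = np.full(len(anchors), -1, dtype=int)  # anchors in between are ignored in the loss
    labels[best_iou < neg_thresh] = 0
    labels[best_iou > pos_thresh] = 1
    # every ground-truth box also claims the anchor(s) that overlap it most
    gt_best = iou.max(axis=0)
    rows, _ = np.nonzero((iou == gt_best[None, :]) & (gt_best[None, :] > 0))
    labels[rows] = 1
    return labels, best_gt
```

The labels play the role of $p_i^*$ in the classification loss, and for positive anchors the matched ground-truth box supplies the regression target $t_i^*$.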
2 votes, 0 answers
Inaccurate masks with Mask-RCNN: Stairs effect and sudden stops
I've been using Matterport's Mask R-CNN to train on a custom dataset. However, there seem to be some parameters that I failed to define correctly, because on practically all of the images the bottom or top of the object's mask is cut off:
As you…
Nawra C
2 votes, 1 answer
Why are RNNs used in some computer vision problems?
I am learning computer vision. While going through implementations of various computer vision projects, I noticed that some OCR projects used GRUs or LSTMs, while others did not. I understand that RNNs are used only in problems where the input data is a sequence,…
Naveen Reddy Marthala
1 vote, 0 answers
Why are the learned offsets of anchor boxes in the RCNN object detection models scale invariant?
In the original R-CNN paper (https://arxiv.org/pdf/1311.2524.pdf), and continuing in later R-CNN papers such as Faster R-CNN (https://arxiv.org/pdf/1506.01497.pdf), the learned offsets of the anchor boxes are scale-invariant. For example, the learned…
phil
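A quick numerical check of what the R-CNN paper means by a scale-invariant translation: the centre offsets are divided by the anchor (or proposal) width and height, and the size offsets are log-ratios, so rescaling the whole image leaves the targets unchanged. Boxes are assumed to be (cx, cy, w, h); `encode` is an illustrative name, not taken from any particular codebase.

```python
import numpy as np

def encode(anchor, box):
    """R-CNN-style regression targets (t_x, t_y, t_w, t_h) of a box relative to an anchor.

    Both arguments are (cx, cy, w, h) arrays.
    """
    xa, ya, wa, ha = anchor
    x, y, w, h = box
    return np.array([
        (x - xa) / wa,   # centre shift measured in anchor widths
        (y - ya) / ha,   # centre shift measured in anchor heights
        np.log(w / wa),  # log-space width ratio
        np.log(h / ha),  # log-space height ratio
    ])

anchor = np.array([50.0, 40.0, 32.0, 16.0])
box = np.array([58.0, 36.0, 40.0, 20.0])

# Rescaling the image multiplies every coordinate and size by s; the targets do not change.
for s in (0.5, 2.5, 10.0):
    assert np.allclose(encode(anchor, box), encode(anchor * s, box * s))
```

A raw pixel offset such as $x - x_a$ would instead grow by the factor $s$, which is why the normalisation by $w_a$ and $h_a$ matters.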
1 vote, 1 answer
What to do when the ROIs are smaller than $227 \times 227$ in R-CNN?
As English is not my native language, I have a hard time understanding the following sentence:
Regardless of the size or aspect ratio of the candidate region, we warp all pixels in a tight bounding box around it to the required size. Prior to…
Valentin
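For the warping question: the quoted sentence means that every candidate region is resized to 227 × 227 whatever its original size, so regions smaller than that are simply upsampled (and the aspect ratio is not preserved). A minimal sketch with NumPy, assuming an H × W × C image and an integer (x1, y1, x2, y2) box; `warp_region` is a hypothetical name, nearest-neighbour sampling is a simplification chosen here, and any context padding around the box is omitted.

```python
import numpy as np

def warp_region(image, box, out_size=227):
    """Crop box = (x1, y1, x2, y2) from an H x W x C image and resize it to out_size x out_size.

    Anisotropic nearest-neighbour resize: small regions are upsampled, large ones downsampled,
    and width and height are scaled independently.
    """
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    h, w = crop.shape[:2]
    # map each output row/column back to a source row/column inside the crop
    rows = (np.arange(out_size) * h / out_size).astype(int)
    cols = (np.arange(out_size) * w / out_size).astype(int)
    return crop[rows[:, None], cols[None, :]]
```

For example, a 50 × 20 region is stretched to 227 × 227 exactly like a 400 × 300 region would be shrunk to it; the CNN never sees the original size.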
1 vote, 0 answers
Does the selective search algorithm in object detection learn?
I am trying to get a better grasp of how object detection works. I (almost) completely understand the concept behind RPNs. However, I am a little confused by the selective search part. This algorithm does not really learn anything,…
Tibo Geysen
1 vote, 1 answer
In Fast R-CNN, how are input RoIs mapped to the respective RoIs in the feature map before RoI pooling?
I've been reading the Fast R-CNN paper.
My understanding is that the input to one forward pass is the whole input image plus a list of RoIs (generated by selective search or another region proposal method). Then I understand that on the last…
Alexander Soare
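For the RoI mapping question, a minimal sketch of one common way to do the projection and pooling: divide the image-space RoI coordinates by the backbone's total stride (16 for the VGG16 conv5 features used in the Fast R-CNN paper) and round, then max-pool the covered feature cells into a fixed grid (7 × 7 for VGG16 in the paper). The rounding and bin-boundary choices below are assumptions for illustration, real implementations differ in these details, and `roi_pool` is a hypothetical name.

```python
import numpy as np

def roi_pool(feature_map, roi, stride=16, out_h=7, out_w=7):
    """Max-pool one image-space RoI from a C x H x W feature map into C x out_h x out_w.

    roi = (x1, y1, x2, y2) in input-image pixels; stride is the backbone's total downsampling.
    """
    C, H, W = feature_map.shape
    # 1) project image coordinates onto the feature-map grid
    x1, y1, x2, y2 = (int(round(c / stride)) for c in roi)
    # keep the projected box inside the feature map and at least one cell wide/high
    x1, y1 = min(max(x1, 0), W - 1), min(max(y1, 0), H - 1)
    x2, y2 = max(min(x2, W - 1), x1), max(min(y2, H - 1), y1)
    region = feature_map[:, y1:y2 + 1, x1:x2 + 1]
    rh, rw = region.shape[1:]
    # 2) split the region into out_h x out_w bins and take the max inside each bin
    out = np.empty((C, out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        hs, he = int(np.floor(i * rh / out_h)), int(np.ceil((i + 1) * rh / out_h))
        for j in range(out_w):
            ws, we = int(np.floor(j * rw / out_w)), int(np.ceil((j + 1) * rw / out_w))
            out[:, i, j] = region[:, hs:he, ws:we].max(axis=(1, 2))
    return out
```

Because every RoI ends up as the same C × 7 × 7 tensor, the fully connected layers after RoI pooling can handle RoIs of any size, while the convolutional features are computed only once per image.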
1 vote, 1 answer
In Faster R-CNN, how can I get the predicted bounding box given the neural network's output?
The RPN loss in the Faster R-CNN paper is
$$
L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_{i} L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)
$$
For the regression term, we have the following parametrization:
$$t_x=\frac{x -…
user31844
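For the Faster R-CNN question: the excerpt's parametrization is cut off, so this sketch assumes the parametrization given in the Faster R-CNN paper, $t_x = (x - x_a)/w_a$, $t_y = (y - y_a)/h_a$, $t_w = \log(w/w_a)$, $t_h = \log(h/h_a)$, and simply inverts it to turn predicted offsets $t_i$ back into boxes. Anchors are assumed to be (x1, y1, x2, y2); `decode` is a hypothetical name, and some implementations add 1 to widths and heights, which this sketch does not.

```python
import numpy as np

def decode(anchors, deltas):
    """Invert the (t_x, t_y, t_w, t_h) parametrization to recover predicted boxes.

    anchors: (N, 4) as (x1, y1, x2, y2); deltas: (N, 4) as (t_x, t_y, t_w, t_h).
    Returns predicted boxes (N, 4) as (x1, y1, x2, y2).
    """
    wa = anchors[:, 2] - anchors[:, 0]
    ha = anchors[:, 3] - anchors[:, 1]
    xa = anchors[:, 0] + 0.5 * wa      # anchor centre
    ya = anchors[:, 1] + 0.5 * ha
    tx, ty, tw, th = deltas.T
    x = tx * wa + xa                   # predicted centre: undo the division by w_a, h_a
    y = ty * ha + ya
    w = wa * np.exp(tw)                # predicted size: undo the log-ratio
    h = ha * np.exp(th)
    return np.stack([x - 0.5 * w, y - 0.5 * h, x + 0.5 * w, y + 0.5 * h], axis=1)
```

In a full detector the decoded boxes would typically be clipped to the image, ranked by the predicted objectness $p_i$ and passed through non-maximum suppression.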
0 votes, 1 answer
How are OCR training datasets constructed?
For concreteness, suppose that the word "OCR" refers to any OCR system built on an R-CNN architecture. For simplicity, also suppose that we are interested in reading numbers between 0 and 100.
Question: How should…
Ramiro Hum-Sah
0 votes, 1 answer
Darknet as a part of YOLOv3
I am pretty new to ML, and my question may look strange, especially the last part of it.
1) As far as I understand, Darknet53 is an integral part of YOLO, just as ResNet50 is a part of R-CNN. Am I right?
2) On the other hand, I understand that the R-CNN…
Igor