Questions tagged [bounding-box]

For questions related to the concept of a bounding box in object detection or other computer vision tasks.

17 questions
7
votes
2 answers

What's the role of bounding boxes in object detection?

I'm quite new to the field of computer vision and was wondering what the purpose of bounding boxes is in object detection. Obviously, a box shows where the detected object is, and a classifier can only classify one object per image,…
5
votes
4 answers

Can bounding boxes further improve the performance of a CNN classifier?

Suppose I have a standard image classification problem (i.e. CNN is shown a single image and predicts a single classification for it). If I were to use bounding boxes to surround the target image (i.e. convert this into an object detection problem),…
3
votes
1 answer

YOLO - are the anchor boxes used only in training?

Another question about YOLO. I've read about how YOLO adjusts anchor boxes by offsets to create the final bounding boxes. What I do not understand is when YOLO does it. Is it done only during the training process, or also during the common use of…
Igor
2
votes
2 answers

How to architect a network to find bounding boxes in simple images?

I have an application where I want to find the locations of objects on a simple, relatively constant background (fixed camera angle, etc). For investigative purposes, I've created a test dataset that displays many characteristics of the actual…
2
votes
1 answer

How are IOUs for ground truth boxes in YOLO calculated?

I know how IOU works during detection. However, while preparing targets from ground-truth for training, how is the IOU between a given object and all anchor boxes calculated? Is the ground truth bounding box aligned with an anchor box such that they…
2
votes
2 answers

How does YOLO detect the object when the object is in multiple grid cells?

I have been reading various articles and watching videos on YouTube, but I can't seem to understand one thing. How does YOLO make a bounding box for an object if it is in multiple grid cells? For example, in the picture given below, how does it…
2
votes
1 answer

How does a bounding box detection network "know" about absolute position?

I've always found bounding box regression a bit weird. There's no positional encoding like in vision transformers, so how does the network "know" the absolute position when producing bounding box coordinates? It gets even weirder when we are dealing…
Alexander Soare
1
vote
1 answer

Can the output of non-max suppression have more bounding boxes than the number of objects the picture actually has?

I don't really understand the non-max suppression (NMS) algorithm. Let's say my model produces 20 bounding boxes (bbs) for my picture, which has 7 cats (7 objects of the same class). Is it possible that, after performing NMS on the 20 bbs, the…
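A minimal sketch of greedy NMS may help make the question concrete. It assumes boxes in (x1, y1, x2, y2) form and an illustrative threshold of 0.5; the names `iou` and `nms` are hypothetical and not taken from any particular library. NMS only discards a box when it overlaps a higher-scoring kept box by more than the threshold, so the number of surviving boxes is not tied to the number of objects in the picture.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Three candidate boxes: the second overlaps the first heavily, while the
# third is a small box inside the first with low IoU.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (0, 0, 4, 4)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

In this toy case the small box survives even though it lies entirely inside the top-scoring box, because their IoU (0.16) is below the threshold.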
1
vote
0 answers

How do transformers compare to CNNs in terms of compute budget (and computing time) during inference?

Transformers are data- and GPU-hungry during training. Is this also true at inference time? How do transformers compare to feedforward CNNs, e.g., during bounding box generation at inference time? I haven't found a good comparison of computing time…
1
vote
1 answer

Is there a state-of-the-art deep learning paper that uses center point regression instead of bounding box regression, for object tracking?

Almost all deep learning based object tracking methods perform bounding box regression. Siamese-based networks which are very popular for object tracking also perform bounding box regression most of the time, although SiamFC type exceptions exist.…
1
vote
1 answer

Why is it called "area of union" when calculating the Intersection over Union?

When calculating the Intersection Over Union the following explanation is widely used. (Source: A Survey on Performance Metrics for Object-Detection Algorithms, by Padilla et al. 2020) The image and name suggest that the denominator (the area of…
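As a small sketch of the quantity being asked about, assuming axis-aligned boxes given as (x1, y1, x2, y2): the denominator is the area covered by either box, which can be computed as the sum of the two areas minus the intersection (otherwise the overlap would be counted twice). The function names here are illustrative only.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection_and_union(a, b):
    """Return (intersection area, union area) of two boxes."""
    inter_box = (max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3]))
    inter = box_area(inter_box)
    # Union = area covered by A or B; subtract the overlap counted twice.
    union = box_area(a) + box_area(b) - inter
    return inter, union


def iou(a, b):
    inter, union = intersection_and_union(a, b)
    return inter / union if union > 0 else 0.0


inter, union = intersection_and_union((0, 0, 2, 2), (1, 1, 3, 3))
print(inter, union)  # 1.0 7.0
```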
1
vote
0 answers

Different equations for YOLOv3 in courses/articles and Darknet GitHub code?

I am confused by the equations for bounding boxes I find online. Some articles say that box_width = anchor_width * exp(residual_value_of_box_width) and that the coordinates have a sigmoid function. E.g.:…
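For reference, here is a sketch of the decoding as commonly stated from the YOLOv3 paper: a sigmoid on the center offsets added to the grid-cell index, and an exponential on the size residuals scaled by the anchor. The function and parameter names are hypothetical, outputs are in grid-cell units, and this is not meant to reproduce the Darknet code, which may scale or arrange these terms differently.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h):
    """Turn raw network outputs into a box (center x, center y, width, height)."""
    bx = cell_x + sigmoid(tx)      # center stays inside the predicting cell
    by = cell_y + sigmoid(ty)
    bw = anchor_w * math.exp(tw)   # width is a positive multiple of the anchor
    bh = anchor_h * math.exp(th)
    return bx, by, bw, bh


# All-zero outputs place the box at the cell center with the anchor's size.
print(decode_box(0.0, 0.0, 0.0, 0.0, 3, 4, 2.0, 5.0))  # (3.5, 4.5, 2.0, 5.0)
```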
1
vote
1 answer

Why do the object detection networks produce multiple anchor boxes per location?

In various neural network detection pipelines, the detection works as follows: (1) one processes the input image through the pretrained backbone; (2) some additional convolutional layers; (3) the detection head, where each pixel on the given feature map…
0
votes
0 answers

How do anchor boxes improve prediction in YOLO?

I've been trying to implement a YOLO-like model for object detection. I came up with the following approach: (1) images (B, 3, 224, 224) are fed to a possibly pretrained ResNet backbone; (2) each cell of the resulting feature map of shape (B, 512, 7, 7) is…
0
votes
0 answers

How to normalize bounding box sizes in perspective transform for objects at different distances from the camera

I'm working on an object detection system and I'm new to this field. Here I'm speaking from the camera's point of view. When an object far from the camera is detected, it appears small and its bounding box is small, and when the small…