Object Detection Basics

How do we do object detection? A naive idea is,

Therefore, algorithms such as R-CNN and YOLO have been developed to find these occurrences, and to find them fast.

R-CNN

Keywords: selective search; region proposals;

Procedures

Region proposals: extract about 2000 regions (the region proposals) from the image using the selective search algorithm.
Feature vectors: warp each region proposal into a square and feed it into a CNN, which produces a 4096-dimensional feature vector.
Classify: feed each feature vector to an SVM, which classifies the presence of an object within the candidate region proposal.
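
The warping step above can be sketched in a few lines of NumPy. This is only an illustration, not the original R-CNN code: the function name warp_region, the nearest-neighbour sampling, the 227 x 227 target size, and the two example boxes are all assumptions made here. The CNN and SVM stages are left out.

```python
import numpy as np

def warp_region(image, box, size=227):
    """Crop box (x1, y1, x2, y2) from an H x W x C image and warp the crop
    to a size x size square with nearest-neighbour sampling (an assumption;
    any image resize would do)."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    h, w = crop.shape[:2]
    # For each output row/column, pick the source row/column it maps to.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return crop[rows[:, None], cols]

# Hypothetical image and region proposals (selective search is not run here).
image = np.zeros((100, 200, 3))
proposals = [(10, 20, 110, 60), (0, 0, 50, 100)]

# Every proposal becomes the same fixed-size input, whatever its aspect ratio.
batch = np.stack([warp_region(image, b) for b in proposals])
print(batch.shape)  # (2, 227, 227, 3)
```

Each row of the batch would then go through the CNN separately, which is why the cost grows with the number of proposals.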

Advantages

Disadvantages

Fast R-CNN

Keywords: feature map; RoI pooling layer;

Procedures

Feature map: instead of feeding each region proposal to the CNN, we feed the whole input image to the CNN once to generate a convolutional feature map.
Region proposals: use selective search to get the region proposals.
RoI pooling layer: on the convolutional feature map, we identify the regions that correspond to the proposals and warp them into squares. A RoI pooling layer then reshapes each one to a fixed size so that it can be fed into a fully connected layer. From the resulting RoI feature vector, a softmax layer predicts the class of the proposed region, along with the offset values for the bounding box.
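
To make the "reshape to a fixed size" idea concrete, here is a minimal single-channel sketch of RoI max pooling. The function name roi_max_pool, the integer bin boundaries, and the toy 4 x 4 feature map are assumptions for illustration; real implementations work on multi-channel tensors and handle coordinate scaling.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool the region roi = (x1, y1, x2, y2), given in feature-map
    coordinates, into a fixed out_size x out_size grid."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape
    # Integer bin edges that split the region into out_size parts per axis.
    ys = [i * h // out_size for i in range(out_size + 1)]
    xs = [j * w // out_size for j in range(out_size + 1)]
    out = np.empty((out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # Guarantee each bin covers at least one cell.
            cell = region[ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[i, j] = cell.max()
    return out

fmap = np.arange(16).reshape(4, 4)           # toy feature map
print(roi_max_pool(fmap, (0, 0, 4, 4)))      # [[ 5  7] [13 15]]
print(roi_max_pool(fmap, (1, 0, 4, 3)).shape)  # (2, 2), despite a 3 x 3 region
```

Regions of different sizes all come out as the same out_size x out_size grid, which is what lets them share one fully connected layer.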

Advantages

Fast R-CNN is faster than R-CNN because the convolution operation is done only once per image, producing a single feature map, rather than once for each of the roughly 2000 region proposals.

Disadvantages

Faster R-CNN

Keywords: region proposal network;

Procedures

Feature map: as in Fast R-CNN, the image is fed to a convolutional network, which produces a convolutional feature map.
Region proposals: instead of using selective search, a separate network is used to predict the region proposals.
RoI pooling layer: the predicted region proposals are reshaped using a RoI pooling layer, and the result is used to classify the content of each proposed region and to predict the offset values for the bounding boxes.
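
The "offset values" mentioned above are commonly expressed as (dx, dy, dw, dh) relative to the proposal box. The sketch below assumes that parameterization (centre shifts scaled by box size, log-scale width and height changes), as introduced with R-CNN bounding-box regression; the notes themselves do not spell out the form, and the function name apply_offsets is made up for this example.

```python
import math

def apply_offsets(box, offsets):
    """Refine a proposal box (x1, y1, x2, y2) with predicted offsets
    (dx, dy, dw, dh). Assumed parameterization: the centre moves by a
    fraction of the box size, width and height scale by exp(dw), exp(dh)."""
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = offsets
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    # Shift the centre, then rescale the size.
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)

# Zero offsets leave the proposal unchanged.
print(apply_offsets((0, 0, 10, 20), (0, 0, 0, 0)))  # (0.0, 0.0, 10.0, 20.0)
```

Predicting offsets rather than absolute coordinates means the network only has to learn a small correction to a box that is already roughly in place.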

Advantages

Disadvantages

Figure: Comparison of test-time speed of object detection algorithms

YOLO

Keywords: Split image; Bbox probability; Spatial constraints;

Procedures

Split the image: we take an image and split it into an SxS grid; within each grid cell we take m bounding boxes.
BBox probability: for each bounding box, the network outputs a class probability and offset values for the bounding box.
Locate objects: the bounding boxes whose class probability is above a threshold value are selected and used to locate the objects within the image.
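
The final selection step can be sketched with a boolean mask. The array layout here (probabilities of shape S x S x m, boxes of shape S x S x m x 4), the function name select_boxes, and the 0.5 threshold are assumptions for illustration, not the actual YOLO output format.

```python
import numpy as np

def select_boxes(boxes, probs, threshold=0.5):
    """Keep only the boxes whose class probability exceeds the threshold.
    boxes: (S, S, m, 4) array, probs: (S, S, m) array."""
    keep = probs > threshold          # boolean mask over all S*S*m boxes
    return boxes[keep], probs[keep]   # shapes (k, 4) and (k,)

# Toy network output: S = 2 grid, m = 2 boxes per cell (made-up numbers).
S, m = 2, 2
boxes = np.arange(S * S * m * 4, dtype=float).reshape(S, S, m, 4)
probs = np.zeros((S, S, m))
probs[0, 1, 0] = 0.9
probs[1, 0, 1] = 0.6

kept_boxes, kept_probs = select_boxes(boxes, probs)
print(kept_boxes.shape)  # (2, 4)
print(kept_probs)        # [0.9 0.6]
```

Because every cell is scored in one forward pass, this step is just a filter over a fixed-size output.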

Advantages

Disadvantages

Note: The following content comes from Towards Data Science.