Object Detection Basics

How can we detect objects in an image? A naive idea is to:

  1. take different regions of interest from the image
  2. use a CNN to classify the presence of the object in each region

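The catch is that objects can appear at any location and at any scale, so this approach needs a huge number of regions. Algorithms like R-CNN and YOLO were developed to find these occurrences, and to find them fast.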
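To get a feel for how many regions the naive idea has to classify, here is a rough sketch that counts sliding windows over an image. The window sizes, aspect ratios and stride are made-up illustration values, not numbers from any of the algorithms below.

```python
def count_windows(img_w, img_h, win_w, win_h, stride):
    """Number of win_w x win_h windows that fit in the image at a given stride."""
    if win_w > img_w or win_h > img_h:
        return 0
    nx = (img_w - win_w) // stride + 1
    ny = (img_h - win_h) // stride + 1
    return nx * ny


def count_all_windows(img_w, img_h, sizes, aspect_ratios, stride):
    """Total windows over every (size, aspect ratio) combination.

    Each window keeps an area of roughly size * size; ratio is width / height.
    """
    total = 0
    for size in sizes:
        for ratio in aspect_ratios:
            win_w = int(round(size * ratio ** 0.5))
            win_h = int(round(size / ratio ** 0.5))
            total += count_windows(img_w, img_h, win_w, win_h, stride)
    return total


# Hypothetical setting: three sizes and three aspect ratios on a 640x480 image.
total = count_all_windows(640, 480, sizes=[64, 128, 256],
                          aspect_ratios=[0.5, 1.0, 2.0], stride=8)
print(total)
```

Every one of these windows would need its own CNN forward pass, which is the cost the region-proposal methods below try to avoid.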

R-CNN

Keywords: selective search; region proposals;

Procedures

  1. Region proposals: use the selective search algorithm to extract only about 2000 regions (the region proposals) from the image.
  2. Feature vectors: warp each region proposal into a square and feed it into a CNN, which produces a 4096-dimensional feature vector.
  3. Classify: feed each feature vector into an SVM to classify the presence of the object within the candidate region proposal.
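The three steps can be sketched as a simple loop. This is only an outline under stated assumptions: `extract_features` and `classify` are hypothetical callables standing in for the CNN and the SVM, the image is a plain 2D list, and the warp uses nearest-neighbour sampling, which may differ from what R-CNN actually does.

```python
def warp_region(image, box, out_size):
    """Crop box = (x1, y1, x2, y2) from a 2D image (list of rows) and
    resize it to out_size x out_size with nearest-neighbour sampling.
    x2 and y2 are exclusive."""
    x1, y1, x2, y2 = box
    h, w = y2 - y1, x2 - x1
    return [
        [image[y1 + r * h // out_size][x1 + c * w // out_size]
         for c in range(out_size)]
        for r in range(out_size)
    ]


def rcnn_detect(image, proposals, extract_features, classify, out_size=4):
    """Run the per-proposal pipeline: warp -> features -> classification.

    extract_features and classify are placeholders for the CNN and the SVM.
    """
    detections = []
    for box in proposals:
        patch = warp_region(image, box, out_size)   # step 2: warp to a square
        features = extract_features(patch)          # step 2: CNN features
        detections.append((box, classify(features)))  # step 3: classifier
    return detections
```

Note that the loop body runs once per proposal, so with about 2000 proposals the feature extractor is called about 2000 times per image.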

RCNN.png

Advantages

  • Bypasses the problem of selecting a huge number of regions by working with only about 2000 proposals

Disadvantages

  • Huge training time: the network has to classify 2000 region proposals per image
  • Not real-time: it takes around 47 seconds per test image
  • No learning in the proposal stage: selective search is a fixed algorithm

Fast R-CNN

Keywords: feature map; RoI pooling layer;

fast-rcnn.png

Procedures

  1. Feature map: instead of feeding the region proposals to the CNN, we feed the input image to the CNN to generate a convolutional feature map.
  2. Region proposals: region proposals are still obtained with selective search.
  3. RoI pooling layer: on the convolutional feature map, we identify the regions that correspond to the proposals and warp them into squares. A RoI pooling layer then reshapes each one to a fixed size so that it can be fed into a fully connected layer. From the resulting RoI feature vector, a softmax layer predicts the class of the proposed region, and the network also predicts the offset values for the bounding box.
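To make the "reshape into a fixed size" part of step 3 concrete, here is a minimal sketch of RoI max pooling on a single-channel feature map stored as a 2D list. The bin-splitting rule is one simple choice made for illustration and is not necessarily the exact rule used in Fast R-CNN.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region roi = (x1, y1, x2, y2) of a 2D feature map
    into a fixed out_h x out_w grid. x2 and y2 are exclusive and the
    region must be non-empty."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # Row range of bin i; force every bin to cover at least one row.
        r0 = y1 + i * h // out_h
        r1 = max(y1 + (i + 1) * h // out_h, r0 + 1)
        row = []
        for j in range(out_w):
            # Column range of bin j, again at least one column wide.
            c0 = x1 + j * w // out_w
            c1 = max(x1 + (j + 1) * w // out_w, c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


fm = [[r * 6 + c for c in range(6)] for r in range(6)]
print(roi_max_pool(fm, (0, 0, 6, 6), 2, 2))  # [[14, 17], [32, 35]]
print(roi_max_pool(fm, (1, 2, 5, 5), 2, 2))  # [[14, 16], [26, 28]]
```

The two regions have different sizes (6x6 and 4x3), yet both come out as 2x2, which is what lets a fully connected layer with a fixed input size consume them.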

Advantages

  • Fast R-CNN is faster than R-CNN because the convolution is computed only once per image, producing one shared feature map, rather than once for each of the 2000 region proposals.

Disadvantages

  • Region proposals become the bottleneck in Fast R-CNN and limit its speed.
  • The proposals still come from selective search, which is a slow process.

Comparison-of-object-detection-algorithms.png

Faster R-CNN

Keywords: region proposal network;

Faster-RCNN.png

Procedures

  1. Feature map: as in Fast R-CNN, the image is fed into a convolutional network, which produces a convolutional feature map.
  2. Region proposals: a separate network is used to predict the region proposals, instead of using selective search.
  3. RoI pooling layer: the predicted region proposals are reshaped by a RoI pooling layer, and the result is used to classify the content of each proposed region and to predict the offset values for the bounding boxes.
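The "offset values" in step 3 refine a proposal into the final box. The text here does not say how they are encoded; the sketch below assumes the parametrization commonly used in the R-CNN family, where (dx, dy) shift the box centre in units of the proposal size and (dw, dh) scale the width and height in log space.

```python
import math


def apply_offsets(proposal, offsets):
    """Refine a proposal (x1, y1, x2, y2) with offsets (dx, dy, dw, dh).

    Assumed encoding: centre shift relative to proposal size,
    width and height scaled by exp(dw) and exp(dh).
    """
    x1, y1, x2, y2 = proposal
    dx, dy, dw, dh = offsets
    pw, ph = x2 - x1, y2 - y1
    pcx, pcy = x1 + 0.5 * pw, y1 + 0.5 * ph

    cx = pcx + dx * pw          # shifted centre
    cy = pcy + dy * ph
    w = pw * math.exp(dw)       # rescaled size
    h = ph * math.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


print(apply_offsets((10, 20, 50, 60), (0, 0, 0, 0)))   # proposal unchanged
print(apply_offsets((0, 0, 10, 10), (0.5, 0, 0, 0)))   # shifted right by half its width
```

With all offsets at zero the proposal is returned as-is, so the network only has to learn small corrections.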

Advantages

  • Faster than Fast R-CNN, since the slow selective search step is replaced by a learned network.

Disadvantages

  • Still a two-stage method that depends on region proposals.

Comparison of test-time speed of object detection algorithms.png

YOLO

Keywords: Split image; Bbox probability; Spatial constraints;

yolo.png

Procedures

  1. Split the image: we split the image into an SxS grid and take m bounding boxes within each grid cell.
  2. BBox probability: for each bounding box, the network outputs a class probability and offset values for the bounding box.
  3. Locate objects: the bounding boxes whose class probability is above a threshold value are selected and used to locate the objects within the image.
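Steps 1 and 3 can be sketched in a few lines. The tuple layout `(box, class_name, class_prob)` and the function names are assumptions made for this example, not YOLO's actual output format.

```python
def grid_cell(cx, cy, img_w, img_h, s):
    """Return (row, col) of the SxS grid cell that contains point (cx, cy)."""
    col = min(int(cx * s / img_w), s - 1)  # clamp points on the right edge
    row = min(int(cy * s / img_h), s - 1)  # clamp points on the bottom edge
    return row, col


def select_boxes(predictions, threshold):
    """Keep predictions whose class probability is above the threshold.

    predictions: list of (box, class_name, class_prob) tuples.
    """
    return [p for p in predictions if p[2] > threshold]


preds = [
    ((30, 40, 120, 200), "dog", 0.82),
    ((35, 45, 110, 190), "cat", 0.10),
    ((300, 50, 400, 150), "bicycle", 0.55),
]
print(grid_cell(100, 300, 448, 448, 7))   # (4, 1)
print(select_boxes(preds, 0.5))           # keeps "dog" and "bicycle"
```

Because every cell and every box is scored in a single forward pass, there is no separate proposal stage to wait for.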

Advantages

  • Orders of magnitude faster (45 frames per second) than the R-CNN family.
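As a rough sanity check of "orders of magnitude", we can compare the 45 frames per second with the 47 seconds per test image quoted for the original R-CNN above. This assumes the two figures are comparable, which this article does not establish, and it says nothing about Fast or Faster R-CNN.

```python
import math


def speedup(seconds_per_image, frames_per_second):
    """How many times more images per second the faster detector handles."""
    slow_fps = 1.0 / seconds_per_image
    return frames_per_second / slow_fps


ratio = speedup(47, 45)
print(round(ratio))                 # 2115
print(int(math.log10(ratio)))       # 3 whole orders of magnitude
```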

Disadvantages

  • It struggles with small objects within the image, such as a flock of birds, because of the spatial constraints of the algorithm.
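One way to read the spatial constraint: each grid cell only gets m bounding boxes, so when many small objects fall into the same cell, some of them cannot get a box of their own. The sketch below counts those leftovers under the simplifying assumption that one box can claim at most one object; the bird coordinates are made up.

```python
from collections import Counter


def unassignable_objects(centers, img_w, img_h, s, m):
    """Count objects that exceed the m-boxes-per-cell budget.

    centers: list of (cx, cy) object centres in pixels.
    """
    cells = Counter(
        (min(int(cy * s / img_h), s - 1), min(int(cx * s / img_w), s - 1))
        for cx, cy in centers
    )
    return sum(max(0, n - m) for n in cells.values())


# Five birds packed into the top-left cell of a 7x7 grid on a 448x448 image.
flock = [(10, 10), (12, 14), (20, 8), (30, 30), (40, 50)]
# Three well-separated objects.
spread = [(10, 10), (100, 100), (200, 200)]

print(unassignable_objects(flock, 448, 448, s=7, m=2))   # 3
print(unassignable_objects(spread, 448, 448, s=7, m=2))  # 0
```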

Note: This content comes from Towards Data Science.