Real-Time Object Detection: Code Analysis

1 Importing the basic libraries

import numpy as np
import argparse  # command-line argument parsing module
import time
import cv2

2 Adding the arguments

ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True, help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True, help="path to Caffe pre-trained model")
ap.add_argument("-c", "--probability", type=float, default=0.2, help="minimum probability to filter weak detections")
# vars() returns the parsed attributes and their values as a dict, so
# args["prototxt"] gives the prototxt path that was passed in. The script can
# only be run with arguments on the command line, in either of two forms:
#   .py --prototxt <file>   or   .py -p <file>
args = vars(ap.parse_args())
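As a minimal sketch of what `vars()` does here, the snippet below builds the same parser and parses a hard-coded argument list instead of the real command line. The helper name `build_parser` and the file names `deploy.prototxt` and `model.caffemodel` are placeholders assumed for illustration; they are not from the original script.

```python
import argparse

def build_parser():
    """Build the same three-option parser used by the script (hypothetical helper)."""
    ap = argparse.ArgumentParser()
    ap.add_argument("-p", "--prototxt", required=True, help="path to Caffe 'deploy' prototxt file")
    ap.add_argument("-m", "--model", required=True, help="path to Caffe pre-trained model")
    ap.add_argument("-c", "--probability", type=float, default=0.2,
                    help="minimum probability to filter weak detections")
    return ap

# Passing a list to parse_args() avoids reading sys.argv; the short form (-p)
# and the long form (--model) are interchangeable. File names are placeholders.
demo_args = vars(build_parser().parse_args(["-p", "deploy.prototxt", "--model", "model.caffemodel"]))

print(demo_args["prototxt"])     # the value given after -p
print(demo_args["probability"])  # falls back to the default when -c is omitted
```

Because the result is a plain dict, the rest of the script can look options up by their long names, as in `args["prototxt"]`.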

3 Preparation

CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
print("[INFO] starting video stream...")
vs = cv2.VideoCapture(0)
# Request a capture size (assumed intent: width 400, height 500);
# the camera may ignore sizes it does not support.
vs.set(cv2.CAP_PROP_FRAME_WIDTH, 400)
vs.set(cv2.CAP_PROP_FRAME_HEIGHT, 500)

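The CLASSES list holds the labels of the object classes the model can detect.

COLORS holds the bounding-box colors: for each class, np.random.uniform draws three channel values in the range 0-255.

cv2.dnn.readNetFromCaffe() loads an already trained Caffe model. Parameters: prototxt is the text file describing the network structure; model is the file holding the trained weights. Return value: a Net object.

VideoCapture reads the video; index 0 selects the default camera.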
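To make the COLORS line concrete, here is a small sketch using a shortened class list (an assumption made only to keep the output readable). It shows that the array has one row per class and three values per row, all drawn from the half-open range [0, 255).

```python
import numpy as np

# Shortened class list, assumed for the demo only.
DEMO_CLASSES = ["background", "aeroplane", "bicycle"]

# One row per class, three channel values per row.
demo_colors = np.random.uniform(0, 255, size=(len(DEMO_CLASSES), 3))
print(demo_colors.shape)

# Look up the color of one class by its index in the list.
bicycle_color = demo_colors[DEMO_CLASSES.index("bicycle")]
print(bicycle_color)
```

The values are floats rather than integers, and they change on every run because no random seed is set.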

4 Main loop

while True:
    suc, frame = vs.read()  # read the next frame; suc: whether the read succeeded, frame: the frame itself
    if not suc:  # stop when no frame could be read
        break
    (h, w) = frame.shape[:2]  # frame height and width, i.e. shape[0] and shape[1]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843, (300, 300), 127.5)

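blobFromImage preprocesses the image so that it can be passed to the network for classification. This function matters a great deal: its parameter settings determine how well detection works.
blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size, mean, swapRB=False, crop=False, ddepth=CV_32F). Parameters (in current OpenCV releases swapRB defaults to False):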

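image: the input image; here resize first sets it to 300x300.

scalefactor: the scale factor, 1 by default, multiplied into the pixel values after mean subtraction. It depends heavily on the model and has a large effect on detection accuracy.

size: the image size after resizing.

mean: the mean to subtract. It can be a triple of R, G, B means, or a single value subtracted from every channel. The means are expected in R, G, B order, so if the input image is in B, G, R order, make sure swapRB = True so the channels are swapped.

swapRB: OpenCV stores image channels in B, G, R order, while the mean values are given in R, G, B order; enabling this flag swaps the first and last channels.

crop: whether to crop the image after resizing.

ddepth: the depth of the output blob.

In short, every parameter here matters and needs fine adjustment for each model.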
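The arithmetic behind the call in the main loop can be sketched in NumPy alone. The helper `manual_blob` below is hypothetical and only mirrors the mean subtraction, scaling, and layout change; it assumes the image is already 300x300 and skips resizing, channel swapping, and cropping. With a mean of 127.5 and a scale factor of 0.007843 (about 1/127.5), pixel values 0-255 land in roughly the range -1 to 1.

```python
import numpy as np

def manual_blob(image, scalefactor=0.007843, mean=127.5):
    """Hypothetical stand-in: (image - mean) * scalefactor, then HWC -> NCHW."""
    scaled = (image.astype(np.float32) - mean) * scalefactor
    chw = scaled.transpose(2, 0, 1)   # height, width, channels -> channels, height, width
    return chw[np.newaxis, ...]       # add the leading batch dimension

# Synthetic 300x300 frame: first channel all 255, the other two all 0.
demo_frame = np.zeros((300, 300, 3), dtype=np.uint8)
demo_frame[..., 0] = 255

demo_blob = manual_blob(demo_frame)
print(demo_blob.shape)                    # batch, channels, height, width
print(demo_blob.min(), demo_blob.max())   # close to -1 and 1
```

This is why the two numbers have to be chosen together: a model trained on inputs in the range -1 to 1 will see badly scaled data if either the mean or the scale factor is changed alone.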

    net.setInput(blob)
    detections = net.forward()

Set the blob as the network input, then run a forward pass.

    for i in np.arange(0, detections.shape[2]):
        probability = detections[0, 0, i, 2]
        if probability > args["probability"]:
            idx = int(detections[0, 0, i, 1])
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            label = "{}: {:.2f}%".format(CLASSES[idx], probability * 100)
            cv2.rectangle(frame, (startX, startY), (endX, endY), COLORS[idx], 2)
            y = startY - 15 if startY - 15 > 15 else startY + 15
            cv2.putText(frame, label, (startX, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)

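detections.shape[2] is the number of detections returned by the network, so i runs from 0 up to that count minus one (0 to 99 when there are 100 detections).

detections[0, 0, i, 2] is the confidence of the i-th detection.

When the confidence is greater than the minimum value (set through the command-line arguments), the detection gets a box and a label drawn on the frame.

detections[0, 0, i, 1] is the index (ID) of the detected object's class.

detections[0, 0, i, 3:7] * np.array([w, h, w, h]) gives the coordinates of the detected object's bounding box, scaled to the frame size; they are then converted to integers.

Finally, the label is drawn on the image, above the box when there is room and just inside its top edge otherwise.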
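The filtering and box scaling described above can be tried without a camera or a model. The sketch below feeds a hand-made array through the same indexing; the function name `parse_detections` and the fake values are assumptions for illustration. It relies on each row holding the class ID at index 1, the confidence at index 2, and normalized box corners at indices 3 to 6, as the loop in the script does.

```python
import numpy as np

def parse_detections(detections, w, h, min_probability=0.2):
    """Hypothetical helper: keep confident detections and scale their boxes to pixels."""
    results = []
    for i in range(detections.shape[2]):
        probability = float(detections[0, 0, i, 2])
        if probability > min_probability:
            idx = int(detections[0, 0, i, 1])
            # Box corners are fractions of the frame size; scale them to pixels.
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            startX, startY, endX, endY = (int(v) for v in box.astype("int"))
            results.append((idx, probability, (startX, startY, endX, endY)))
    return results

# Two fake detections in the (1, 1, N, 7) layout: one confident, one weak.
fake = np.zeros((1, 1, 2, 7), dtype=np.float32)
fake[0, 0, 0] = [0, 15, 0.9, 0.25, 0.5, 0.75, 1.0]
fake[0, 0, 1] = [0, 7, 0.1, 0.0, 0.0, 0.5, 0.5]

found = parse_detections(fake, w=400, h=300)
for idx, probability, box in found:
    print(idx, round(probability, 2), box)
```

Only the first row passes the 0.2 threshold, and its normalized corners become pixel coordinates for a 400x300 frame.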

    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("q"):  # press q to quit
        break
cv2.destroyAllWindows()
vs.release()  # release the camera

Cleanup: close the display window and release the camera.