AI Studio project link: aistudio.baidu.com/aistudio/pr…

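1. Project Background

In recent years, small-bowl dishes (小碗菜) have become increasingly popular, mainly because of three advantages:

A wider variety of food and more balanced nutrition
Fast service at an affordable price
Individual tastes are catered to while hygiene and food safety are still ensured

At present, however, most restaurants still total the bill by hand. This is slow and error-prone, and during peak hours it adds to the cashier's workload. I therefore want to build an automatic checkout system that uses computer vision to recognize which dishes a customer has picked and to calculate the price automatically. Such a system should help a restaurant run more efficiently during peak dining hours, and it is well suited to crowded places such as schools and companies.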

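2. Data

2.1 Preparing the Dataset

Object detection usually depends on a large dataset, so I collect the images with a web crawler. Taking scrambled eggs with tomato (番茄炒蛋) as an example, searching for "番茄炒蛋" on Baidu Images returns pictures that look just like the dish we usually eat. We can therefore crawl the images on the result pages to obtain a dataset for this dish.

The crawler source code is shown below (run the program and follow the prompts step by step):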

# -*- coding: utf-8 -*-
import os
import re
from urllib.parse import quote

import requests

num = 0          # number of images saved so far
numPicture = 0   # number of images the user wants
file = ''        # output folder
List = []        # image URLs found on each result page


def Find(url, A):
    """Count the image URLs on the first few result pages."""
    global List
    print('Counting available images, please wait...')
    t = 0
    s = 0
    while t < 200:  # 200 can be changed; it only needs to exceed the number of images you want
        Url = url + str(t)
        try:
            Result = A.get(Url, timeout=7, allow_redirects=False)
        except requests.RequestException:
            t = t + 60
            continue
        result = Result.text
        # Use a regular expression to find the image URLs
        pic_url = re.findall('"objURL":"(.*?)",', result, re.S)
        if len(pic_url) == 0:
            break
        s += len(pic_url)
        List.append(pic_url)
        t = t + 60
    return s


def downloadPicture(html, keyword):
    """Download every image referenced in one result page."""
    global num
    # Use a regular expression to find the image URLs
    pic_url = re.findall('"objURL":"(.*?)",', html, re.S)
    print('Found images for keyword: ' + keyword + ', starting download...')
    for each in pic_url:
        if num >= numPicture:
            return
        print('Downloading image ' + str(num + 1) + ', URL: ' + str(each))
        try:
            pic = requests.get(each, timeout=7)
        except requests.RequestException:
            print('Error: this image could not be downloaded')
            continue
        if len(pic.content) < 200:  # skip responses too small to be a real image
            continue
        string = os.path.join(file, str(num) + '.jpg')
        with open(string, 'wb') as fp:
            fp.write(pic.content)
        num += 1


if __name__ == '__main__':  # entry point
    headers = {
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
        'Connection': 'keep-alive',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0',
        'Upgrade-Insecure-Requests': '1'
    }

    A = requests.Session()
    A.headers.update(headers)

    word = input("Enter a search keyword (e.g. a person or place name): ")
    # URL-encode the keyword so that non-ASCII characters are sent correctly
    url = 'https://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word=' + quote(word) + '&pn='

    tot = Find(url, A)
    print('Found %d images for "%s"' % (tot, word))
    numPicture = int(input('Enter the number of images to download: '))
    file = input('Enter the name of a new folder to store the images: ')
    while os.path.exists(file):
        print('That folder already exists, please enter another name')
        file = input('Enter the name of a new folder to store the images: ')
    os.mkdir(file)
    t = 0
    tmp = url
    while t < numPicture:
        url = tmp + str(t)
        try:
            result = A.get(url, timeout=10, allow_redirects=False)
        except requests.RequestException:
            print('Network error, please check your connection and try again')
        else:
            downloadPicture(result.text, word)
        t = t + 60
    print('Crawl finished, thanks for using this script')
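
To see what the regular expression in the crawler actually captures, here is a minimal sketch that runs it against a made-up fragment of a result page. The sample text and the example.com URLs are invented for illustration, and extract_image_urls is a hypothetical helper name; only the pattern itself comes from the script above.

```python
import re

# The same pattern the crawler uses: a non-greedy capture of the "objURL" value
OBJ_URL_PATTERN = re.compile(r'"objURL":"(.*?)",', re.S)


def extract_image_urls(html):
    """Return every image URL stored in an "objURL" field of the page text."""
    return OBJ_URL_PATTERN.findall(html)


# Hypothetical page fragment, made up for this example
sample_html = (
    '{"objURL":"http://example.com/a.jpg","width":800},'
    '{"objURL":"http://example.com/b.jpg","width":640},'
)

print(extract_image_urls(sample_html))
# ['http://example.com/a.jpg', 'http://example.com/b.jpg']
```

The non-greedy `(.*?)` matters here: a greedy `(.*)` would run from the first URL to the last `",` on the page and return one long, useless string.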

After running the script, we have a dataset of scrambled eggs with tomato images.
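
The crawler saves any response larger than 200 bytes under a .jpg name, so some files may turn out to be error pages or images in another format. One possible way to spot them, sketched below, is to check for the JPEG signature (JPEG files start with the bytes FF D8 FF). The function names are hypothetical and this check is not part of the original project.

```python
import os

JPEG_MAGIC = b'\xff\xd8\xff'  # JPEG files start with these three bytes


def is_jpeg(path):
    """Return True if the file starts with the JPEG signature."""
    with open(path, 'rb') as f:
        return f.read(3) == JPEG_MAGIC


def find_invalid_images(folder):
    """List the files in `folder` that do not look like JPEG images."""
    invalid = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and not is_jpeg(path):
            invalid.append(name)
    return invalid
```

Running find_invalid_images on the download folder before annotation gives a list of files to inspect or delete by hand.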

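2.2 Annotating the Dataset

I annotated the data manually with labelimg. labelimg installer link: pan.baidu.com/s/1znNYWpyY… extraction code: kvsl

Taking the scrambled eggs with tomato dataset as an example, annotating one class takes four steps:

Open the folder that contains the images
Open the folder where the annotation files will be saved
Draw a box around the region in the image and assign a class label
Switch to the next image

Repeating these steps for every image completes the annotation of one class. Each saved .xml annotation file looks like this: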

<annotation>
	<folder>番茄炒蛋</folder>
	<filename>0.jpg</filename>
	<path>C:\Users\Administrator\Desktop\中国家常美食\番茄炒蛋\0.jpg</path>
	<source>
		<database>Unknown</database>
	</source>
	<size>
		<width>1068</width>
		<height>600</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>fanqiechaodan</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>91</xmin>
			<ymin>108</ymin>
			<xmax>921</xmax>
			<ymax>559</ymax>
		</bndbox>
	</object>
</annotation>
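
To make the structure of the annotation file concrete, the following sketch reads the image size and the bounding boxes with Python's standard xml.etree.ElementTree module. The function name parse_voc_annotation is hypothetical, and the sample string is a trimmed copy of the annotation above.

```python
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_text):
    """Return (width, height, objects), where objects is a list of
    (class_name, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    objects = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        coords = tuple(int(box.find(tag).text)
                       for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        objects.append((obj.find('name').text,) + coords)
    return width, height, objects


# Trimmed copy of the annotation shown above
sample_xml = """
<annotation>
    <filename>0.jpg</filename>
    <size><width>1068</width><height>600</height><depth>3</depth></size>
    <object>
        <name>fanqiechaodan</name>
        <bndbox><xmin>91</xmin><ymin>108</ymin><xmax>921</xmax><ymax>559</ymax></bndbox>
    </object>
</annotation>
"""

print(parse_voc_annotation(sample_xml))
# (1068, 600, [('fanqiechaodan', 91, 108, 921, 559)])
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.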

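The same procedure can be used to build a dataset for any other dish.

2.3 Converting the Data to VOC Format

For this project I prepared 300 images (100 each of white-cut chicken, scrambled eggs with tomato, and Kung Pao chicken) together with their .xml files. Next, the data has to be arranged in VOC format for training. The VOC layout is as follows: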

VOC
├── create_list.py
├── label_list.txt
├── train.txt
├── val.txt
├── Annotations
│   └── n .xml files
├── ImagesSet
│   └── Main
│       ├── get_list.py
│       ├── label_list.txt
│       ├── train.txt
│       └── val.txt
└── JPEGImages
    └── n image files

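get_list.py randomly splits the dataset into a training set and a validation set according to a given ratio and writes them to train.txt and val.txt in the Main folder. Each line of these files holds one image name (a number) without its extension.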

# get_list.py
# Adjust the paths to your own setup
import os
import random

train_percent = 0.8  # fraction of the data used for training
xml = "C:/Users/Administrator/Desktop/dataset/Annotations"
save = "C:/Users/Administrator/Desktop/dataset/ImagesSet/Main"
total_xml = os.listdir(xml)

num = len(total_xml)
tr = int(num * train_percent)
train = set(random.sample(range(num), tr))  # indices that go into the training set

with open(os.path.join(save, "train.txt"), "w") as ftrain, \
        open(os.path.join(save, "val.txt"), "w") as ftest:
    for i in range(num):
        name = total_xml[i][:-4] + "\n"  # strip the ".xml" extension
        if i in train:
            ftrain.write(name)
        else:
            ftest.write(name)
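
The split above changes on every run because it draws from the global random generator. If a repeatable split is wanted, one option is to seed a local generator, as in this minimal sketch. The function name split_names and the seed value are assumptions made for illustration, not part of the original script.

```python
import random


def split_names(names, train_percent=0.8, seed=0):
    """Shuffle `names` reproducibly and split them into (train, val) lists."""
    names = sorted(names)        # start from a stable order
    rng = random.Random(seed)    # local generator, so the global state is untouched
    rng.shuffle(names)
    n_train = int(len(names) * train_percent)
    return names[:n_train], names[n_train:]


# 300 hypothetical file stems standing in for the annotation names
train, val = split_names([str(i) for i in range(300)])
print(len(train), len(val))
# 240 60
```

With a fixed seed, the same files end up in val.txt each time, which makes mAP numbers from different training runs comparable.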

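create_list.py pairs each image path with the path of its .xml file and writes train.txt and val.txt to the dataset root folder. Each line holds an image path followed by its annotation path, separated by a space.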

# create_list.py
import os
import os.path as osp
import re
import random

devkit_dir = './'

# Build the path of a sub-directory
def get_dir(devkit_dir,  type):
    return osp.join(devkit_dir, type)

# Pair each image with its .xml annotation file and collect the pairs in lists
def walk_dir(devkit_dir):
    filelist_dir = get_dir(devkit_dir, 'ImagesSet/Main')
    annotation_dir = get_dir(devkit_dir, 'Annotations')
    img_dir = get_dir(devkit_dir, 'JPEGImages')
    trainval_list = []
    test_list = []
    added = set()

    for _, _, files in os.walk(filelist_dir):
        for fname in files:
            img_ann_list = []
            if re.match(r'train\.txt', fname):
                img_ann_list = trainval_list
            elif re.match(r'val\.txt', fname):
                img_ann_list = test_list
            else:
                continue
            fpath = osp.join(filelist_dir, fname)
            for line in open(fpath):
                name_prefix = line.strip().split()[0]
                if name_prefix in added:
                    continue
                added.add(name_prefix)
                ann_path = osp.join(annotation_dir, name_prefix + '.xml')
                img_path = osp.join(img_dir, name_prefix + '.jpg')
                assert os.path.isfile(ann_path), 'file %s not found.' % ann_path
                assert os.path.isfile(img_path), 'file %s not found.' % img_path
                img_ann_list.append((img_path, ann_path))

    return trainval_list, test_list

# Write the lists to .txt files
def prepare_filelist(devkit_dir, output_dir):
    trainval_list = []
    test_list = []
    trainval, test = walk_dir(devkit_dir)
    trainval_list.extend(trainval)
    test_list.extend(test)
    random.shuffle(trainval_list)
    with open(osp.join(output_dir, 'train.txt'), 'w') as ftrainval:
        for item in trainval_list:
            ftrainval.write(item[0] + ' ' + item[1] + '\n')

    with open(osp.join(output_dir, 'val.txt'), 'w') as ftest:
        for item in test_list:
            ftest.write(item[0] + ' ' + item[1] + '\n')


if __name__ == '__main__':
    prepare_filelist(devkit_dir, '.')

label_list.txt lists the classes in the dataset: baiqieji, fanqiechaodan, gongbaojiding
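
The names in label_list.txt have to match the class names used in the annotations exactly. Rather than typing them by hand, they can be collected from the .xml files, as in the sketch below. The helper names are hypothetical, and the one-name-per-line layout is an assumption about how label_list.txt is written here.

```python
import os
import xml.etree.ElementTree as ET


def collect_labels(annotation_dir):
    """Return the sorted class names found in all .xml files of a folder."""
    labels = set()
    for name in os.listdir(annotation_dir):
        if not name.endswith('.xml'):
            continue
        root = ET.parse(os.path.join(annotation_dir, name)).getroot()
        for obj in root.findall('object'):
            labels.add(obj.find('name').text)
    return sorted(labels)


def write_label_list(labels, path):
    """Write one class name per line (assumed label_list.txt layout)."""
    with open(path, 'w') as f:
        f.write('\n'.join(labels) + '\n')
```

A typical call would be `write_label_list(collect_labels('Annotations'), 'label_list.txt')`. A misspelled label in a single annotation shows up immediately as an unexpected extra class.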

Once the files are arranged in the layout described above, the VOC dataset is ready.

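2.4 Visualizing the Dataset

Upload the prepared VOC-format dataset to AI Studio and unzip it.

# Unzip the dataset
!unzip -oq /home/aistudio/data/data128743/dataset.zip

Next, let's look at the directory structure of the dataset.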

!tree dataset -d

dataset
├── Annotations
├── ImagesSet
│   └── Main
└── JPEGImages

4 directories

Let's display one image from each of the three classes.

# Data visualization
import os
import matplotlib.pyplot as plt
import cv2
%matplotlib inline

image_path = 'dataset/JPEGImages'
image_path_list = sorted(os.listdir(image_path))
image_path_list = [os.path.join(image_path, path) for path in image_path_list]

sample_image_path_list = ['dataset/JPEGImages/0.jpg','dataset/JPEGImages/101.jpg', 'dataset/JPEGImages/201.jpg']
sample_label_list = ['baiqieji','fanqiechaodan','gongbaojiding']

plt.figure(figsize=(8, 8))

for i in range(len(sample_image_path_list)):
    plt.subplot(1,len(sample_image_path_list), i+1)
    plt.title(sample_label_list[i])
    plt.imshow(cv2.imread(sample_image_path_list[i])[:, :, ::-1])

plt.tight_layout()
plt.show()
print("Total number of images: {}".format(len(image_path_list)))

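3. Model

3.1 Installing PaddleDetection

# Clone the PaddleDetection repository
!git clone https://gitee.com/paddlepaddle/PaddleDetection.git

%cd PaddleDetection/

# Install the remaining dependencies
!pip install -r requirements.txt
# Build and install paddledet
!python setup.py install

3.2 Editing the Configuration File

Open PaddleDetection/configs/datasets/voc.yml and change the training, validation, and test set paths so that they point to the dataset prepared above.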

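3.3 Choosing a Model

The PaddleDetection/configs directory shows that PaddleDetection provides a large number of models. For this project I chose PP-YOLOv2 (ppyolov2_r50vd_dcn_voc.yml).

PP-YOLO is PaddleDetection's optimized and improved version of YOLOv3. Compared with PP-YOLO, released in 2020, PP-YOLOv2 raises the accuracy on COCO 2017 test-dev by 3.6 percentage points, from 45.9% to 49.5%, and runs at 68.9 FPS (V100, FP32) with a 640*640 input. According to the PaddleDetection documentation, PP-YOLOv2 is more accurate than YOLOv5 at comparable speed.

PP-YOLO model zoo: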

Model	GPU number	images/GPU	backbone	input shape	Box AP^val	Box AP^test	V100 FP32(FPS)	V100 TensorRT FP16(FPS)	download	config
PP-YOLO	8	24	ResNet50vd	608	44.8	45.2	72.9	155.6	model	config
PP-YOLO	8	24	ResNet50vd	512	43.9	44.4	89.9	188.4	model	config
PP-YOLO	8	24	ResNet50vd	416	42.1	42.5	109.1	215.4	model	config
PP-YOLO	8	24	ResNet50vd	320	38.9	39.3	132.2	242.2	model	config
PP-YOLO_2x	8	24	ResNet50vd	608	45.3	45.9	72.9	155.6	model	config
PP-YOLO_2x	8	24	ResNet50vd	512	44.4	45.0	89.9	188.4	model	config
PP-YOLO_2x	8	24	ResNet50vd	416	42.7	43.2	109.1	215.4	model	config
PP-YOLO_2x	8	24	ResNet50vd	320	39.5	40.1	132.2	242.2	model	config
PP-YOLO	4	32	ResNet18vd	512	29.2	29.5	357.1	657.9	model	config
PP-YOLO	4	32	ResNet18vd	416	28.6	28.9	409.8	719.4	model	config
PP-YOLO	4	32	ResNet18vd	320	26.2	26.4	480.7	763.4	model	config
PP-YOLOv2	8	12	ResNet50vd	640	49.1	49.5	68.9	106.5	model	config
PP-YOLOv2	8	12	ResNet101vd	640	49.7	50.3	49.5	87.0	model	config

For more details on PP-YOLO, see the README (PaddleDetection/configs/ppyolo/README_cn.md).

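4. Model Training

lr_schedule: PiecewiseDecay with warm-up; the learning rate rises gradually from 0 to base_lr and then decays in steps from base_lr.
optimizer: Momentum (gradient descent with momentum); each update carries over part of the previous update direction, which helps the optimizer avoid getting stuck in local optima.
epoch: 583
batch_size: 12
Loss function: IoU loss (intersection over union)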
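
The shape of the learning-rate schedule described above can be sketched in a few lines of plain Python. This is only an illustration of warm-up followed by piecewise decay, not PaddleDetection's implementation: the function name, base_lr, warmup_steps, milestones, and gamma are all hypothetical values, and both the warm-up and the milestones are counted in steps here for simplicity.

```python
def learning_rate(step, base_lr, warmup_steps, milestones, gamma=0.1):
    """Linear warm-up from 0 to base_lr, then multiply by gamma at each milestone."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    passed = sum(1 for m in milestones if step >= m)  # milestones already reached
    return base_lr * gamma ** passed


# Hypothetical values chosen only to make the shape visible
for step in (0, 500, 1000, 5000, 8000, 9500):
    print(step, learning_rate(step, base_lr=0.005, warmup_steps=1000,
                              milestones=[8000, 9000]))
```

The printed values rise linearly during the first 1000 steps, stay at base_lr until step 8000, and then drop by a factor of 10 at each milestone.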

To train the model, set snapshot_epoch in ppyolov2_r50vd_dcn_voc.yml to 20 so that a checkpoint is saved every 20 epochs.

!python tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_voc.yml --use_vdl=true --vdl_log_dir=vdl_dir/scalar

After training, the model is saved under PaddleDetection/output.

5. Model Evaluation

I provide a model trained for 340 epochs for you to test with; you can also use your own trained model.

!unzip -oq /home/aistudio/data/data129584/339.zip

!python -u PaddleDetection/tools/eval.py -c PaddleDetection/configs/ppyolo/ppyolov2_r50vd_dcn_voc.yml  -o weights=339/339

mAP (mean average precision) is the standard evaluation metric for object detection. Here the model reaches an mAP of 97.23%.
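
mAP is built on intersection over union (IoU): under the VOC metric, a detection typically counts as correct when its IoU with a ground-truth box of the same class is at least 0.5. The sketch below computes IoU for two boxes in (xmin, ymin, xmax, ymax) form. The ground-truth box is taken from the sample annotation in section 2.2, while the predicted box is a made-up value.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # zero when the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = (91, 108, 921, 559)   # box from the sample annotation
prediction = (100, 120, 900, 540)    # hypothetical model output
print(round(iou(ground_truth, prediction), 3))
```

Here the predicted box lies entirely inside the ground-truth box, so the IoU is simply the ratio of the two areas (336000 / 374330), well above the 0.5 threshold.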

Now let's feed one image to the model and see how well it predicts.

!python -u PaddleDetection/tools/infer.py -c PaddleDetection/configs/ppyolo/ppyolov2_r50vd_dcn_voc.yml  -o weights=339/339 --infer_img=test.jpg --output_dir=infer_output/test_output


import os
import matplotlib.pyplot as plt
import cv2
%matplotlib inline

sample_image_path_list = ['test.jpg','infer_output/test_output/test.jpg']
sample_label_list = ['before','after']

plt.figure(figsize=(15, 15))

for i in range(len(sample_image_path_list)):
    plt.subplot(len(sample_image_path_list),1, i+1)
    plt.title(sample_label_list[i])
    plt.imshow(cv2.imread(sample_image_path_list[i])[:, :, ::-1])

plt.tight_layout()
plt.show()


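6. Summary and Outlook

This project implements a simple object detection task with PaddleDetection. It can serve as a demo for other object detection tasks; only the dataset needs to be swapped.

I see the following improvements and ideas for this project:

Because manual annotation is laborious, only three classes of food with 100 images each (300 images in total) were annotated, which is still rather small for an object detection task. More food classes and more images can be added later to meet the needs of a cafeteria self-checkout system.
This project only covers the object detection part of a restaurant self-checkout system. Putting it into real use also requires deploying the model, which is the hardest part of the project.
Once deployed, the system also needs a way to add new food classes. The images can be collected by a crawler or photographed and uploaded by the user, but annotating the data remains a major difficulty.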
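
The checkout step that would sit on top of the detector is not implemented in this project, but its core can be sketched as a lookup from detected class names to prices. Everything below is hypothetical: the price table, the function name settle, the score threshold, and the (class_name, score) input format are assumptions for illustration, not the output format of PaddleDetection.

```python
from collections import Counter

# Hypothetical price table (in yuan); the class names match label_list.txt
PRICES = {'baiqieji': 12.0, 'fanqiechaodan': 6.0, 'gongbaojiding': 10.0}


def settle(detections, score_threshold=0.5):
    """Count the detected dishes and sum their prices.

    `detections` is a list of (class_name, score) pairs; detections
    with a score below the threshold are ignored.
    """
    counts = Counter(name for name, score in detections
                     if score >= score_threshold)
    total = sum(PRICES[name] * n for name, n in counts.items())
    return dict(counts), total


# Hypothetical detector output for one tray
items, total = settle([('fanqiechaodan', 0.97), ('baiqieji', 0.91), ('baiqieji', 0.32)])
print(items, total)
# {'fanqiechaodan': 1, 'baiqieji': 1} 18.0
```

Keeping the price table separate from the model means prices can change without retraining; only adding a new dish requires new data and annotation.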

[AI Creator Camp, Phase 2] Restaurant Self-Checkout System: The Object Detection Part
