Yolov5-5.0 and Yolov5-6.0: Setup and Improvements

I. Model source code and pretrained weights:

Yolov5-5.0:

github.com/ultralytics…

Yolov5-6.0:

github.com/ultralytics…

II. Problems

1. Yolov5-6.0 tries to download Arial.ttf automatically. When the download fails, it raises an error.

Solution:

In the file yolov5/utils/plots.py, change the following code:


class Annotator:
    if RANK in (-1, 0):
        check_font()  # download TTF if necessary

    # YOLOv5 Annotator for train/val mosaics and jpgs and detect/hub inference annotations
    def __init__(self, im, line_width=None, font_size=None, font='Arial.ttf', pil=False, example='abc'):

to:

class Annotator:
    # if RANK in (-1, 0):
    #     check_font()  # download TTF if necessary

    # YOLOv5 Annotator for train/val mosaics and jpgs and detect/hub inference annotations
    def __init__(self, im, line_width=None, font_size=None, font='', pil=False, example='abc'):

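2. Running detect.py raises AttributeError: 'Upsample' object has no attribute 'recompute_scale_factor'.

The traceback points to the return statement of Upsample.forward in PyTorch's torch/nn/modules/upsampling.py, which passes recompute_scale_factor=self.recompute_scale_factor. Change that statement to: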

return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners)
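
If editing the installed PyTorch source is not an option, a workaround often used for this error is to set the missing attribute on every Upsample layer right after the model is loaded. The sketch below is a minimal illustration. It assumes the error comes from Upsample modules that were saved before recompute_scale_factor existed. patch_upsample is a hypothetical helper name, and the stand-in Upsample class is there only so the snippet runs without PyTorch.

```python
def patch_upsample(modules):
    """Give Upsample layers a recompute_scale_factor attribute if they lack one.

    `modules` is any iterable of layer objects, e.g. model.modules() for a
    loaded PyTorch model. Returns the number of layers that were patched.
    """
    patched = 0
    for m in modules:
        # Match by class name so this sketch does not need to import torch.
        if type(m).__name__ == "Upsample" and not hasattr(m, "recompute_scale_factor"):
            m.recompute_scale_factor = None  # None is the default in torch.nn.Upsample
            patched += 1
    return patched


# Stand-in for torch.nn.Upsample, used only for this demonstration.
class Upsample:
    pass


layers = [Upsample(), object(), Upsample()]
num_patched = patch_upsample(layers)
```

With a real model this would be called as patch_upsample(model.modules()) after the weights are loaded in detect.py, which leaves the PyTorch installation untouched.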

III. Improvements

1. Add soft-NMS

Add the following code to utils/general.py:

def my_soft_nms(bboxes, scores, iou_thresh=0.5, sigma=0.5, score_threshold=0.25):
    # bboxes: (n, 4) tensor in (x1, y1, x2, y2) format; scores: (n,) tensor, decayed in place
    bboxes = bboxes.contiguous()
    x1 = bboxes[:, 0]
    y1 = bboxes[:, 1]
    x2 = bboxes[:, 2]
    y2 = bboxes[:, 3]
    # Area of each box (no +1 pixel offset, so it is consistent with the intersection below)
    areas = (x2 - x1) * (y2 - y1)
    # Sort all scores in descending order once up front; order holds the sorted indices
    _, order = scores.sort(0, descending=True)
    # Indices of the boxes kept after NMS
    keep = []
    while order.numel() > 0:
        if order.numel() == 1:  # only one box index left
            i = order.item()
            keep.append(i)
            break
        else:
            i = order[0].item()  # keep the highest-scoring box; i is its index in scores
            keep.append(i)
        # Use tensor.clamp() to get the intersection corners between the current box
        # and every other box in order
        xx1 = x1[order[1:]].clamp(min=x1[i])
        yy1 = y1[order[1:]].clamp(min=y1[i])
        xx2 = x2[order[1:]].clamp(max=x2[i])
        yy2 = y2[order[1:]].clamp(max=y2[i])
        # Intersection area of every other box in order with the current box
        inter = (xx2 - xx1).clamp(min=0) * (yy2 - yy1).clamp(min=0)
        # IoU of every other box in order with the current box
        iou = inter / (areas[i] + areas[order[1:]] - inter)  # order.numel() - 1 values
        # Positions in order[1:] whose IoU exceeds the threshold;
        # squeeze(1) keeps the result 1-D even when there is a single match
        idx = (iou > iou_thresh).nonzero().squeeze(1)
        if idx.numel() > 0:
            iou = iou[idx]
            newScores = torch.exp(-torch.pow(iou, 2) / sigma)  # Gaussian score decay
            scores[order[idx + 1]] *= newScores  # decay the scores of the overlapping boxes
        newOrder = (scores[order[1:]] > score_threshold).nonzero().squeeze(1)
        if newOrder.numel() == 0:
            break
        else:
            newScores = scores[order[newOrder + 1]]
            maxScoreIndex = torch.argmax(newScores).item()

            # Move the box with the highest remaining score to the front
            if maxScoreIndex != 0:
                newOrder[[0, maxScoreIndex],] = newOrder[[maxScoreIndex, 0],]
            # Update order
            order = order[newOrder + 1]
    # Return the indices of all kept boxes as a long tensor on the same device as the boxes
    return torch.tensor(keep, dtype=torch.long, device=bboxes.device)

In non_max_suppression, replace the torchvision.ops.nms call with the following:

i = my_soft_nms(boxes, scores, iou_thres)  # SOFT NMS

As shown in the figure below:

图片1.png
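
To make the score decay concrete, here is a small worked example in plain Python. It is only a sketch of the Gaussian penalty that my_soft_nms applies to one neighboring box. box_iou and decayed_score are hypothetical helper names, and the boxes and the score 0.9 are made-up values.

```python
import math


def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def decayed_score(score, iou, iou_thresh=0.5, sigma=0.5):
    """Apply the Gaussian penalty only when the IoU exceeds the threshold."""
    if iou > iou_thresh:
        return score * math.exp(-(iou ** 2) / sigma)
    return score


best = (0.0, 0.0, 10.0, 10.0)      # highest-scoring box, kept as is
neighbor = (0.0, 0.0, 10.0, 8.0)   # overlapping box with score 0.9
iou = box_iou(best, neighbor)      # intersection 80, union 100
new_score = decayed_score(0.9, iou)
print(f"IoU = {iou:.2f}, decayed score = {new_score:.4f}")
```

Unlike hard NMS, the overlapping box is not discarded outright. It only drops out once its decayed score falls to score_threshold or below.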

2. Add the SE attention mechanism

(1) First, paste the following attention module code into common.py:

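class SE(nn.Module):
    # Squeeze-and-Excitation channel attention; c1 = input channels, r = reduction ratio
    def __init__(self, c1, r=16):
        super().__init__()  # the original super(SELayer, self) referenced a class name that does not exist
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.l1 = nn.Linear(c1, c1 // r, bias=False)
        self.relu = nn.ReLU(inplace=True)
        self.l2 = nn.Linear(c1 // r, c1, bias=False)
        self.sig = nn.Sigmoid()

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avgpool(x).view(b, c)  # squeeze: global average pool to (b, c)
        y = self.l1(y)
        y = self.relu(y)
        y = self.l2(y)
        y = self.sig(y)  # excitation: per-channel weights in (0, 1)
        y = y.view(b, c, 1, 1)
        return x * y.expand_as(x)  # rescale each channel of the input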
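
To get a feel for how much the block adds to the model, the weights of its two bias-free Linear layers can be counted directly. This is a back-of-the-envelope sketch. se_param_count is a hypothetical helper, and the channel widths are example values, not figures taken from a specific config.

```python
def se_param_count(c1, r=16):
    """Number of weights in the two bias-free Linear layers of the SE block."""
    hidden = c1 // r  # width of the bottleneck between l1 and l2
    return c1 * hidden + hidden * c1


counts = {c1: se_param_count(c1) for c1 in (256, 512, 1024)}
for c1, n in counts.items():
    print(f"c1={c1}: {n} weights")
```

The count grows with the square of the channel width, so a larger r keeps the overhead down on wide layers.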

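(2) In yolo.py, find the parse_model function and add the class name SE to it.

(3) Modify the config file (yolov5s.yaml is used as the example here) and add the attention layer where you want it. It is usually added as the last layer of the backbone or inside C3. Here it is added as the last layer.

After a new layer is added to the network, the indices of all subsequent layers change. See the figure below: Detect originally took its input from layers [17, 20, 23], so it must be updated after the SE layer is added. The original layer 17 becomes layer 18, layer 20 becomes layer 21, and layer 23 becomes layer 24, so the from field of Detect must be changed to [18, 21, 24].

Likewise, the from fields of the Concat layers must be updated so that the original network structure is preserved. Since the SE layer was added at layer 9, every index after layer 9 increases by 1, so the from fields of the last two Concat layers change from [-1, 14] and [-1, 10] to [-1, 15] and [-1, 11].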
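
The renumbering rule above can be sketched as a small helper. This is an illustration only. shift_from_indices is a hypothetical function, the layer list is an abbreviated stand-in for the head of a yolov5s-style config, and insert_at=10 assumes the SE layer takes index 10 in a zero-based count, so every index after layer 9 moves up by one.

```python
def shift_from_indices(layers, insert_at):
    """Return a copy of `layers` with absolute `from` indices >= insert_at increased by 1.

    Each layer is [from, number, module, args], as in a YOLOv5 yaml file.
    Negative (relative) indices such as -1 are left unchanged.
    """
    def shift(f):
        if isinstance(f, list):
            return [shift(i) for i in f]
        return f + 1 if f >= insert_at else f

    return [[shift(f), n, m, args] for f, n, m, args in layers]


# Head entries that reference earlier layers by absolute index (args abbreviated).
head = [
    [[-1, 6], 1, "Concat", [1]],
    [[-1, 4], 1, "Concat", [1]],
    [[-1, 14], 1, "Concat", [1]],
    [[-1, 10], 1, "Concat", [1]],
    [[17, 20, 23], 1, "Detect", ["nc", "anchors"]],
]
new_head = shift_from_indices(head, insert_at=10)
for layer in new_head:
    print(layer)
```

The first two Concat entries point at backbone layers 6 and 4, which sit before the inserted layer, so they stay as they are. A relative index that reaches back across the inserted layer would need manual adjustment, which this sketch does not attempt.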

3. Add counting

图片2.png

Change the code above in detect.py to:

图片3.png

图片4.png

Finally, add the following code inside the if save_img: block:

图片5.png

The complete added code is as follows:

person_count = 0
tie_count = 0
person_color = (0, 0, 0)
tie_color = (0, 0, 0)

# Write results+Count
# count=0
for *xyxy, conf, cls in reversed(det):
    if save_txt:  # Write to file
        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
        line = (cls, *xywh, conf) if opt.save_conf else (cls, *xywh)  # label format
        with open(txt_path + '.txt', 'a') as f:
            f.write(('%g ' * len(line)).rstrip() % line + '\n')
    if save_img or view_img:  # Add bbox to image
        label = f'{names[int(cls)]} {conf:.2f}'
        plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=3)
        ########################## Per-class counting (coco.yaml) ##########################
        # index of person in names
        if int(cls) == 0:
            person_count += 1
            person_color = colors[int(cls)]
        # index of tie in names
        if int(cls) == 27:
            tie_count += 1
            tie_color = colors[int(cls)]
        # count = count+1
        
text = 'person_num: %d ' % (person_count)
cv2.putText(im0, text, (10, 25), cv2.FONT_HERSHEY_SIMPLEX, 1, person_color, 2)
text = 'tie_num: %d ' % (tie_count)
cv2.putText(im0, text, (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 1, tie_color, 2)
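
If more than two classes need counting, the separate counter variables can be replaced with a dictionary. The sketch below shows one possible way to do that in plain Python. count_classes and overlay_lines are hypothetical helpers, and the detected list stands in for the int(cls) values read from det.

```python
from collections import Counter


def count_classes(class_ids, wanted):
    """Count detections for each class in `wanted`, a {name: class_id} mapping."""
    counts = Counter(int(c) for c in class_ids)
    return {name: counts[cid] for name, cid in wanted.items()}


def overlay_lines(counts, x=10, y0=25, dy=35):
    """Build (text, (x, y)) pairs, one per class, stacked vertically on the image."""
    return [(f"{name}_num: {n}", (x, y0 + k * dy))
            for k, (name, n) in enumerate(counts.items())]


wanted = {"person": 0, "tie": 27}   # COCO indices used in the code above
detected = [0, 0, 27, 0, 16]        # example class ids for one image
counts = count_classes(detected, wanted)
lines = overlay_lines(counts)
for text, org in lines:
    print(text, org)
```

Each (text, org) pair can then be passed to cv2.putText(im0, text, org, ...) in the same way as the two calls above.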