[Object Detection (8)] A Deep Dive into Bounding Box Regression Loss Functions: IoU, GIoU, DIoU, CIoU Principles and Python Code


[Object Detection (1)] RCNN Explained: the Pioneering Work of Deep-Learning Object Detection
[Object Detection (2)] SPP Net: Sharing Convolutional Computation
[Object Detection (3)] Fast RCNN: Making RCNN Trainable End-to-End
[Object Detection (4)] Faster RCNN: Replacing Selective Search with the RPN
[Object Detection (5)] YOLOv1: Opening the Chapter of One-Stage Object Detection
[Object Detection (6)] YOLOv2: Introducing Anchors; Better, Faster, Stronger
[Object Detection (7)] YOLOv3: Across-the-Board Accuracy Improvements
[Object Detection (8)] A Deep Dive into Bounding Box Regression Loss Functions: IoU, GIoU, DIoU, CIoU Principles and Python Code
[Object Detection (9)] FPN Explained: Multi-Scale Feature Fusion via Feature Pyramid Networks
[Object Detection (10)] RetinaNet Explained: Focal Loss Pushes One-Stage Detection to Its Peak
[Object Detection (11)] CenterNet: Anchor Free, NMS Free
[Object Detection (12)] FCOS: Anchor-Free Detection with an Instance-Segmentation Mindset

Object detection involves two tasks: classification, which assigns a category to each detected object, and bounding-box (bbox) regression, i.e. localization, where a regression loss is applied to the predicted boxes. This article walks through the design ideas and code implementations of the bbox regression losses used by mainstream detectors: L2 Loss, Smooth L1 Loss, IoU Loss, GIoU Loss, DIoU Loss, and CIoU Loss.

1. Smooth L1 Loss

In Faster RCNN, regression is performed on the predicted bounding-box offsets, which are defined as follows:

The paper regresses these offsets with Smooth L1 Loss, computed as follows:
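The formula image is not shown here; reconstructed, the standard definition with threshold $\beta$ (the form used by torch.nn.SmoothL1Loss, where $\beta$ defaults to 1) is:

$$\mathrm{smooth}_{L1}(x) = \begin{cases} \dfrac{0.5\,x^2}{\beta} & \text{if } |x| < \beta \\ |x| - 0.5\,\beta & \text{otherwise} \end{cases}$$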

Smooth L1 combines L1 and L2 and takes the best of both: within a small interval around 0 it uses L2 Loss, and outside that interval it uses L1 Loss. The plot below makes this easier to see:

Benefits of Smooth L1:

  • The drawback of L1 Loss is that it is non-differentiable at 0. Moreover, late in training, when the prediction is already very close to the ground truth, the absolute value of the L1 gradient is still 1; with an unchanged learning rate the loss then oscillates around a stable value and struggles to converge to higher precision.
  • The drawback of L2 Loss is that a large x produces a large loss, which easily destabilizes training.
  • Smooth L1 keeps the gradient bounded when the prediction differs greatly from the ground truth, making it robust to outliers and avoiding exploding gradients, while the gradient is suitably small when the prediction is close to the ground truth.

The sigma-parameterized version of Smooth L1 Loss:
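The original image is missing; the commonly used $\sigma$ form (as in Fast RCNN-style implementations, where $\beta$ above corresponds to $1/\sigma^2$) is:

$$\mathrm{smooth}_{L1}(x;\sigma) = \begin{cases} 0.5\,(\sigma x)^2 & \text{if } |x| < 1/\sigma^2 \\ |x| - 0.5/\sigma^2 & \text{otherwise} \end{cases}$$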

Code implementation:

import torch

def smooth_l1_loss(x, target, beta=1.0, reduce=True, normalizer=1.0):
    diff = torch.abs(x - target)
    # quadratic branch for |diff| < beta, linear branch beyond it,
    # matching the semantics of torch.nn.SmoothL1Loss(beta=beta)
    loss = torch.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)

    if reduce:
        return torch.sum(loss) / normalizer
    return torch.sum(loss, dim=1) / normalizer

The implementation above is consistent with calling PyTorch's torch.nn.SmoothL1Loss directly; torch defaults to beta=1.
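As a quick numerical sanity check, the piecewise definition can be evaluated in plain Python (a minimal sketch; the scalar helper smooth_l1 below is ours, not part of any library):

```python
def smooth_l1(x, beta=1.0):
    """Scalar Smooth L1: quadratic inside |x| < beta, linear outside."""
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta

print(smooth_l1(0.5))  # 0.125 -- behaves like 0.5 * x^2 (L2) near zero
print(smooth_l1(2.0))  # 1.5   -- behaves like |x| - 0.5 (L1) far from zero
print(smooth_l1(1.0))  # 0.5   -- the two branches join continuously at |x| = beta
```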

2. L2 Loss

In YOLOv1 through YOLOv3, the original author computes the error as a sum of squared errors. Taking YOLOv3 as an example, the (x, y, w, h) offsets are regressed as shown below:

$$\begin{cases}
\sigma(t_x^p) = b_x - C_x, \quad \sigma(t_y^p) = b_y - C_y \\
t_w^p = \log\left(\frac{w_p}{w_a'}\right), \quad t_h^p = \log\left(\frac{h_p}{h_a'}\right) \\
t_x^g = g_x - \lfloor g_x \rfloor, \quad t_y^g = g_y - \lfloor g_y \rfloor \\
t_w^g = \log\left(\frac{w_g}{w_a'}\right), \quad t_h^g = \log\left(\frac{h_g}{h_a'}\right)
\end{cases}$$

Drawbacks of L1, L2, and Smooth L1 as object-detection regression losses:

  • The x, y, w, h losses are computed separately, treating the coordinates as 4 independent targets. The 4 components of a bbox should be regressed as a whole, but are handled in isolation.
  • They are scale-sensitive: predicted boxes of very different quality relative to the ground truth can produce the same loss.
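The scale sensitivity is easy to demonstrate numerically: the two predictions below are both off by exactly 1 pixel in x, so they incur identical coordinate-wise losses, yet their overlap quality differs greatly (a plain-Python sketch; the iou helper is ours):

```python
def iou(a, b):
    # a, b: [x1, y1, x2, y2]
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

# small box shifted by 1 pixel vs. large box shifted by 1 pixel:
# identical coordinate loss, very different IoU
print(iou([1, 0, 11, 10], [0, 0, 10, 10]))      # ~0.818
print(iou([1, 0, 101, 100], [0, 0, 100, 100]))  # ~0.980
```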

3. IoU Loss

3.1 IoU Loss Principle

IoU Loss is a bounding-box loss proposed by Megvii in UnitBox. L1, L2, and Smooth L1 Loss compute a separate loss over the four box values and sum them, ignoring the correlation between coordinates. In the figure below, the black boxes are ground truth and the green boxes are predictions; the third prediction is clearly the best, yet all three incur the same L2 loss, which is plainly unreasonable.

IoU Loss regresses the bbox formed by the 4 coordinates as a single whole. The design idea is as follows:

The algorithm proceeds as follows:

For predicted boxes that correspond to real objects, compute the intersection area and the union area of the predicted and ground-truth boxes, divide them, and take the negative log; the result is the IoU Loss. The more the prediction overlaps the ground truth, the closer the loss gets to 0; the less it overlaps, the larger the loss. This is a sensible loss design.

3.2 IoU Loss Code Implementation

The implementation:

import torch

def iou_loss(pred, target, reduction='mean', eps=1e-6):
    """
    pred:   [[x1,y1,x2,y2], [x1,y1,x2,y2], ...]
    target: [[x1,y1,x2,y2], [x1,y1,x2,y2], ...]
    reduction: "mean" or "sum"
    return: loss
    """
    # areas of pred and target (the +1 follows the pixel-inclusive coordinate convention)
    pred_widths = (pred[:, 2] - pred[:, 0] + 1.).clamp(0)
    pred_heights = (pred[:, 3] - pred[:, 1] + 1.).clamp(0)
    target_widths = (target[:, 2] - target[:, 0] + 1.).clamp(0)
    target_heights = (target[:, 3] - target[:, 1] + 1.).clamp(0)
    pred_areas = pred_widths * pred_heights
    target_areas = target_widths * target_heights

    # intersection of pred and target
    inter_xmins = torch.maximum(pred[:, 0], target[:, 0])
    inter_ymins = torch.maximum(pred[:, 1], target[:, 1])
    inter_xmaxs = torch.minimum(pred[:, 2], target[:, 2])
    inter_ymaxs = torch.minimum(pred[:, 3], target[:, 3])
    inter_widths = torch.clamp(inter_xmaxs - inter_xmins + 1.0, min=0.)
    inter_heights = torch.clamp(inter_ymaxs - inter_ymins + 1.0, min=0.)
    inter_areas = inter_widths * inter_heights

    # IoU, clamped away from 0 so the log below stays finite
    ious = torch.clamp(inter_areas / (pred_areas + target_areas - inter_areas), min=eps)
    if reduction == 'mean':
        loss = torch.mean(-torch.log(ious))
    elif reduction == 'sum':
        loss = torch.sum(-torch.log(ious))
    else:
        raise NotImplementedError

    return loss

3.3 IoU Loss Pros and Cons

Advantages:

  • IoU Loss directly reflects how well the predicted box fits the ground truth.
  • IoU Loss is scale-invariant, i.e. insensitive to box size.

Disadvantages:

  • It cannot measure the loss between two completely disjoint boxes (the IoU is stuck at 0).
  • Two predicted boxes of different shapes can produce the same loss (the same IoU).
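The first drawback can be seen directly: any two disjoint boxes yield IoU = 0 regardless of how far apart they are, so the loss carries no signal about their distance (a plain-Python sketch; the iou helper is ours):

```python
def iou(a, b):
    # a, b: [x1, y1, x2, y2]
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

gt = [0, 0, 10, 10]
print(iou([11, 0, 21, 10], gt))    # 0.0 -- right next to the ground truth
print(iou([500, 0, 510, 10], gt))  # 0.0 -- far away, yet the same IoU
```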

4. GIoU Loss

4.1 GIoU Loss Principle

GIoU was designed precisely to fix the problem with IoU Loss (IoU is constantly 0 whenever the prediction and ground truth do not overlap), yielding Generalized Intersection over Union. On top of IoU, GIoU finds the smallest enclosing rectangle of the predicted and ground-truth boxes, computes that rectangle's area minus the union of the two boxes (the purple hatched region in the figure below), and defines GIoU as IoU minus the ratio of this leftover area to the enclosing rectangle's area.

GIoU Loss is defined as 1 - GIoU. Since GIoU lies in [-1, 1], GIoU Loss lies in [0, 2]. The full GIoU Loss procedure is shown below:
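For intuition, GIoU for two disjoint configurations can be computed directly; unlike IoU, it keeps decreasing as the boxes move apart (a plain-Python sketch; the giou helper is ours):

```python
def giou(a, b):
    # a, b: [x1, y1, x2, y2]
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    # smallest enclosing rectangle
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c = (cx2 - cx1) * (cy2 - cy1)
    return inter / union - (c - union) / c

gt = [0, 0, 10, 10]
print(giou([11, 0, 21, 10], gt))  # ~-0.048 -- adjacent disjoint box
print(giou([50, 0, 60, 10], gt))  # ~-0.667 -- distant disjoint box, lower GIoU
```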


4.2 GIoU Loss Code Implementation

If the description feels fuzzy, read the code; the procedure is straightforward.

import torch

def giou_loss(pred, target, reduction='mean', eps=1e-6):
    """
    pred:   [[x1,y1,x2,y2], [x1,y1,x2,y2], ...]
    target: [[x1,y1,x2,y2], [x1,y1,x2,y2], ...]
    reduction: "mean" or "sum"
    return: loss
    """
    # areas of pred and target (the +1 follows the pixel-inclusive coordinate convention)
    pred_widths = (pred[:, 2] - pred[:, 0] + 1.).clamp(0)
    pred_heights = (pred[:, 3] - pred[:, 1] + 1.).clamp(0)
    target_widths = (target[:, 2] - target[:, 0] + 1.).clamp(0)
    target_heights = (target[:, 3] - target[:, 1] + 1.).clamp(0)
    pred_areas = pred_widths * pred_heights
    target_areas = target_widths * target_heights

    # intersection of pred and target
    inter_xmins = torch.maximum(pred[:, 0], target[:, 0])
    inter_ymins = torch.maximum(pred[:, 1], target[:, 1])
    inter_xmaxs = torch.minimum(pred[:, 2], target[:, 2])
    inter_ymaxs = torch.minimum(pred[:, 3], target[:, 3])
    inter_widths = torch.clamp(inter_xmaxs - inter_xmins + 1.0, min=0.)
    inter_heights = torch.clamp(inter_ymaxs - inter_ymins + 1.0, min=0.)
    inter_areas = inter_widths * inter_heights

    # IoU
    unions = pred_areas + target_areas - inter_areas
    ious = torch.clamp(inter_areas / unions, min=eps)

    # smallest enclosing rectangle
    outer_xmins = torch.minimum(pred[:, 0], target[:, 0])
    outer_ymins = torch.minimum(pred[:, 1], target[:, 1])
    outer_xmaxs = torch.maximum(pred[:, 2], target[:, 2])
    outer_ymaxs = torch.maximum(pred[:, 3], target[:, 3])
    outer_widths = (outer_xmaxs - outer_xmins + 1).clamp(0.)
    outer_heights = (outer_ymaxs - outer_ymins + 1).clamp(0.)
    outer_areas = outer_heights * outer_widths

    # GIoU = IoU - (C - union) / C, where C is the enclosing-rectangle area
    gious = ious - (outer_areas - unions) / outer_areas
    gious = gious.clamp(min=-1.0, max=1.0)
    if reduction == 'mean':
        loss = torch.mean(1 - gious)
    elif reduction == 'sum':
        loss = torch.sum(1 - gious)
    else:
        raise NotImplementedError
    return loss

4.3 GIoU Loss Pros and Cons

Advantages:

  • GIoU Loss fixes the disjoint-box problem of IoU Loss and achieves higher accuracy in detection tasks.

Disadvantages:

  • It cannot distinguish boxes in a containment relationship: in the figure below, all three predicted boxes have the same GIoU Loss, yet the third clearly fits the ground truth best.
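The containment failure is easy to verify: when the prediction lies fully inside the ground truth, the smallest enclosing rectangle is the ground truth itself, the extra penalty term becomes 0, and GIoU collapses to plain IoU no matter where the inner box sits (a plain-Python sketch; the giou helper is ours):

```python
def giou(a, b):
    # a, b: [x1, y1, x2, y2]
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    # smallest enclosing rectangle
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c = (cx2 - cx1) * (cy2 - cy1)
    return inter / union - (c - union) / c

gt = [0, 0, 10, 10]
# two fully contained 4x4 predictions at different positions
print(giou([0, 0, 4, 4], gt))  # 0.16 -- corner placement
print(giou([3, 3, 7, 7], gt))  # 0.16 -- centered placement, identical GIoU
```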

5. DIoU Loss

5.1 DIoU Loss Principle

To fix GIoU Loss's inability to measure the loss between two fully contained boxes, DIoU Loss brings the distance between the two box centers into the loss: the squared center distance divided by the squared diagonal of the smallest enclosing rectangle (the red line length over the blue line length in the figure) replaces the area ratio used in GIoU Loss.

DIoU and DIoU Loss are computed as follows:
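The original formula images are not shown; reconstructed from the DIoU paper's definitions, with $b$ and $b^{gt}$ the centers of the predicted and ground-truth boxes, $\rho(\cdot)$ the Euclidean distance, and $c$ the diagonal length of the smallest enclosing rectangle:

$$\mathrm{DIoU} = \mathrm{IoU} - \frac{\rho^2(b, b^{gt})}{c^2}, \qquad \mathcal{L}_{DIoU} = 1 - \mathrm{DIoU}$$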

5.2 DIOU Loss代码实现

import torch

def diou_loss(pred, target, reduction='mean', eps=1e-6):
    """
    pred:   [[x1,y1,x2,y2], [x1,y1,x2,y2], ...]
    target: [[x1,y1,x2,y2], [x1,y1,x2,y2], ...]
    reduction: "mean" or "sum"
    return: loss
    """
    # areas of pred and target (the +1 follows the pixel-inclusive coordinate convention)
    pred_widths = (pred[:, 2] - pred[:, 0] + 1.).clamp(0)
    pred_heights = (pred[:, 3] - pred[:, 1] + 1.).clamp(0)
    target_widths = (target[:, 2] - target[:, 0] + 1.).clamp(0)
    target_heights = (target[:, 3] - target[:, 1] + 1.).clamp(0)
    pred_areas = pred_widths * pred_heights
    target_areas = target_widths * target_heights

    # intersection of pred and target
    inter_xmins = torch.maximum(pred[:, 0], target[:, 0])
    inter_ymins = torch.maximum(pred[:, 1], target[:, 1])
    inter_xmaxs = torch.minimum(pred[:, 2], target[:, 2])
    inter_ymaxs = torch.minimum(pred[:, 3], target[:, 3])
    inter_widths = torch.clamp(inter_xmaxs - inter_xmins + 1.0, min=0.)
    inter_heights = torch.clamp(inter_ymaxs - inter_ymins + 1.0, min=0.)
    inter_areas = inter_widths * inter_heights

    # IoU
    unions = pred_areas + target_areas - inter_areas + eps
    ious = torch.clamp(inter_areas / unions, min=eps)

    # squared diagonal of the smallest enclosing rectangle
    outer_xmins = torch.minimum(pred[:, 0], target[:, 0])
    outer_ymins = torch.minimum(pred[:, 1], target[:, 1])
    outer_xmaxs = torch.maximum(pred[:, 2], target[:, 2])
    outer_ymaxs = torch.maximum(pred[:, 3], target[:, 3])
    outer_diag = torch.clamp((outer_xmaxs - outer_xmins + 1.), min=0.) ** 2 + \
        torch.clamp((outer_ymaxs - outer_ymins + 1.), min=0.) ** 2 + eps

    # squared distance between the pred and target centers
    c_pred = ((pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2)
    c_target = ((target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2)
    distance = (c_pred[0] - c_target[0]) ** 2 + (c_pred[1] - c_target[1]) ** 2

    # DIoU loss
    dious = ious - distance / outer_diag
    if reduction == 'mean':
        loss = torch.mean(1 - dious)
    elif reduction == 'sum':
        loss = torch.sum(1 - dious)
    else:
        raise NotImplementedError

    return loss

5.3 DIoU Loss Pros and Cons

Advantages:

  • DIoU Loss fixes GIoU Loss's blindness to full containment and further improves accuracy in detection tasks.

Disadvantages:

  • It cannot distinguish contained boxes whose centers coincide and whose areas are equal but whose shapes differ: in the figure below, the two red predicted boxes share the same center and area but have different aspect ratios; their DIoU Loss is identical, yet the latter clearly fits better.
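This blind spot can be verified numerically: the two concentric predictions below have equal area but different aspect ratios; their IoU with the ground truth is the same and the center-distance term is zero for both, so their DIoU coincides (a plain-Python sketch; the diou helper is ours):

```python
def diou(a, b):
    # a, b: [x1, y1, x2, y2]
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    iou_val = inter / (area(a) + area(b) - inter)
    # squared center distance over squared enclosing-box diagonal
    d2 = (((a[0] + a[2]) - (b[0] + b[2])) / 2) ** 2 + \
         (((a[1] + a[3]) - (b[1] + b[3])) / 2) ** 2
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + \
         (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou_val - d2 / c2

gt = [0, 0, 10, 10]
wide = [-5, 2.5, 15, 7.5]  # 20 x 5, centered at (5, 5) like gt
tall = [2.5, -5, 7.5, 15]  # 5 x 20, centered at (5, 5) like gt
print(diou(wide, gt), diou(tall, gt))  # identical despite different shapes
```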

6. CIoU Loss

6.1 CIoU Loss Principle

CIoU Loss was proposed in the same paper as DIoU Loss. Building on DIoU Loss, CIoU Loss additionally accounts for whether the predicted box's shape (aspect ratio) matches the ground truth's, making it an excellent complement to DIoU Loss.
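The original formula images are not shown; reconstructed from the paper, CIoU adds an aspect-ratio consistency term $v$ with trade-off weight $\alpha$:

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2, \quad \alpha = \frac{v}{(1-\mathrm{IoU})+v}, \quad \mathrm{CIoU} = \mathrm{IoU} - \frac{\rho^2(b,b^{gt})}{c^2} - \alpha v$$

with $\mathcal{L}_{CIoU} = 1 - \mathrm{CIoU}$.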

Note that in the added αv term, a larger IoU makes the denominator of α smaller and hence α larger, giving the aspect-ratio term more weight. In this way, overlap area, center distance, and box shape are all folded into a single loss function.

6.2 CIoU Loss Code Implementation

import math

import torch

def ciou_loss(pred, target, reduction='mean', eps=1e-6):
    """
    pred:   [[x1,y1,x2,y2], [x1,y1,x2,y2], ...]
    target: [[x1,y1,x2,y2], [x1,y1,x2,y2], ...]
    reduction: "mean" or "sum"
    return: loss
    """
    # areas of pred and target (the +1 follows the pixel-inclusive coordinate convention)
    pred_widths = (pred[:, 2] - pred[:, 0] + 1.).clamp(0)
    pred_heights = (pred[:, 3] - pred[:, 1] + 1.).clamp(0)
    target_widths = (target[:, 2] - target[:, 0] + 1.).clamp(0)
    target_heights = (target[:, 3] - target[:, 1] + 1.).clamp(0)
    pred_areas = pred_widths * pred_heights
    target_areas = target_widths * target_heights

    # intersection of pred and target
    inter_xmins = torch.maximum(pred[:, 0], target[:, 0])
    inter_ymins = torch.maximum(pred[:, 1], target[:, 1])
    inter_xmaxs = torch.minimum(pred[:, 2], target[:, 2])
    inter_ymaxs = torch.minimum(pred[:, 3], target[:, 3])
    inter_widths = torch.clamp(inter_xmaxs - inter_xmins + 1.0, min=0.)
    inter_heights = torch.clamp(inter_ymaxs - inter_ymins + 1.0, min=0.)
    inter_areas = inter_widths * inter_heights

    # IoU
    unions = pred_areas + target_areas - inter_areas + eps
    ious = torch.clamp(inter_areas / unions, min=eps)

    # squared diagonal of the smallest enclosing rectangle
    outer_xmins = torch.minimum(pred[:, 0], target[:, 0])
    outer_ymins = torch.minimum(pred[:, 1], target[:, 1])
    outer_xmaxs = torch.maximum(pred[:, 2], target[:, 2])
    outer_ymaxs = torch.maximum(pred[:, 3], target[:, 3])
    outer_diag = torch.clamp((outer_xmaxs - outer_xmins + 1.), min=0.) ** 2 + \
        torch.clamp((outer_ymaxs - outer_ymins + 1.), min=0.) ** 2 + eps

    # squared distance between the pred and target centers
    c_pred = ((pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2)
    c_target = ((target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2)
    distance = (c_pred[0] - c_target[0]) ** 2 + (c_pred[1] - c_target[1]) ** 2

    # aspect-ratio consistency term v and its trade-off weight alpha
    w_pred, h_pred = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1] + eps
    w_target, h_target = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1] + eps
    factor = 4 / (math.pi ** 2)
    v = factor * torch.pow(torch.atan(w_pred / h_pred) - torch.atan(w_target / h_target), 2)
    # eps guards the denominator when IoU -> 1 and v -> 0
    alpha = v / (1 - ious + v + eps)

    # CIoU loss
    cious = ious - distance / outer_diag - alpha * v
    if reduction == 'mean':
        loss = torch.mean(1 - cious)
    elif reduction == 'sum':
        loss = torch.sum(1 - cious)
    else:
        raise NotImplementedError

    return loss

7. Summary of Box-Regression Losses and Their Performance

A good localization loss should account for three factors:

  • overlap area
  • center distance
  • aspect ratio

The figure below shows how these loss functions perform on YOLOv3: IoU Loss, GIoU Loss, DIoU Loss, and CIoU Loss each bring a further accuracy improvement, in that order.

References:

  1. arxiv.org/pdf/1608.01…
  2. giou.stanford.edu/GIoU.pdf
  3. arxiv.org/pdf/1911.08…