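Plug-and-Play Accuracy Boosters (Part 6): AAAI 2025, PConv & SD Loss Explained! A new accuracy-boosting recipe that combines pinwheel-shaped convolution with a dynamic loss.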

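🔥 AI Plug-and-Play | Your open-source "arsenal" of accuracy-boosting CV modules! 🔥

Hi everyone! To help you improve results efficiently in CV research and projects, I created and maintain a GitHub repository of plug-and-play modules.

The repository contains:

Plug-and-play code for core modules
Close-reading summaries of papers
In-depth walkthroughs of architecture diagrams
Sentence-by-sentence translations and application examples

It also collects a large number of novel modules from SOTA models, aiming to be an "AI plug-and-play" toolbox for quick experiments and for combining ideas.

🚀 GitHub repository: github.com/AITricks/AI…

If you find it helpful, please Star, Fork, and open PRs to help maintain it!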


Paper: arxiv.org/pdf/2412.16…
Official code: github.com/JN-Yang/PCo…

Paper close reading: PConv-SDLoss

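1. Core idea

For infrared small target detection and segmentation (IRSTDS), the paper proposes two core innovations: **PConv (Pinwheel-shaped Convolution)** and SD Loss (Scale-based Dynamic Loss).
PConv is a new plug-and-play convolution module inspired by the Gaussian spatial distribution that infrared small targets (IRST) show in a 3D gray-value view. It uses asymmetric padding and crossed rectangular kernels ( $1 \times 3$ and $3 \times 1$ ) to mimic this "bright center, dark surroundings" pinwheel-shaped pattern, gaining a much larger receptive field and stronger feature extraction at a very small parameter cost.
SD Loss is a new loss function that dynamically adjusts the weights of the scale loss (Sloss) and the location loss (Lloss). It shifts the penalty according to target size (area): for small targets, whose IoU changes abruptly, it lowers the Sloss weight and focuses more on location (Lloss); for large targets it does the opposite. This addresses the scale insensitivity of conventional IoU-based losses and their vulnerability to label fluctuation on small targets.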

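2. Background and motivation

[Summary from the text] CNN-based infrared small target detection and segmentation (IRSTDS) has made great progress, but still faces two bottlenecks:
1. A design flaw in the convolution kernel: existing CNN methods generally use standard convolutions (e.g., 3x3 square kernels). This one-size-fits-all design ignores the physical characteristics of infrared small targets. The authors observe (Figure 1) that an IRST follows a Gaussian distribution in the 3D gray-value view (sharp at the center, spreading outward). A standard square kernel does not match this center-concentrated Gaussian shape well, which limits feature extraction.
2. A scale flaw in the loss function: existing losses (CIoU for BBoxes, SLS Loss for masks) combine scale (IoU/Scale) and location losses, but they treat targets of all scales **identically**. Because labels are subjective and targets are dim (Figure 2), the IoU of a small target fluctuates sharply (for example, a 1-pixel offset may drop IoU from 0.5 to 0). Existing losses do not account for this **scale sensitivity**, which limits regression performance on small targets.
Motivation of the paper: 1) design a new convolution kernel (PConv) whose structure better fits the Gaussian spatial characteristics of IRST; 2) design a new loss function (SD Loss) that adjusts its attention to scale and location according to target scale, improving detection robustness on small targets.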
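To make the scale-sensitivity argument concrete, here is a minimal sketch that measures how much a one-pixel horizontal shift lowers IoU for square boxes of different sizes. The helpers `box_iou` and `iou_after_shift` and the chosen box sizes are illustrative assumptions, not taken from the paper.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def iou_after_shift(size, shift=1.0):
    """IoU between a size x size box and the same box shifted right by `shift` pixels."""
    gt = (0.0, 0.0, float(size), float(size))
    pred = (shift, 0.0, size + shift, float(size))
    return box_iou(gt, pred)


# Hypothetical box sizes, from a tiny target to a comparatively large one
for size in (2, 4, 9, 32):
    print(f"{size}x{size} box, 1 px shift -> IoU = {iou_after_shift(size):.3f}")
```

The same one-pixel error costs a 2x2 box most of its IoU while barely affecting a 32x32 box, which is the imbalance SD Loss is meant to compensate for.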
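Figure-based analysis of the motivation (Figures 1, 2, 3):
- Chart A (Figure 1): reveals the physical property of a Gaussian distribution
  - What the figure shows: 2D images of two infrared small targets (top) and the corresponding 3D gray values (bottom).
  - Analysis: whether the background is relatively clean (left) or cluttered (right), the small target appears in the 3D view as a sharp peak that decays quickly in all directions, which is typical of a Gaussian distribution.
  - Conclusion (the "semantic gap"): this exposes the mismatch of a standard 3x3 square convolution. Matching a sharp Gaussian peak with a uniform square kernel is inefficient and physically unintuitive. This directly motivates PConv (Pinwheel-shaped Convolution).
- Chart B (Figure 2): reveals the data defect of label fluctuation
  - What the figure shows: the subjectivity and fluctuation in manually annotated BBoxes and masks.
  - Analysis: even for the same target, the annotated masks (three small panels at the bottom) and BBoxes (green and red boxes at the top) differ noticeably (e.g., 5x4 vs 7x4).
  - Conclusion (the "efficiency bottleneck"): this label noise makes IoU (the scale loss, Sloss) fluctuate sharply (by up to 86%). A loss that ignores this and penalizes IoU blindly makes training unstable. This motivates SD Loss: the weight of the IoU loss must be reduced for small targets.
- Chart C (Figure 3): the pinwheel-shaped design of PConv
  - What the figure shows: the core structure of PConv.
  - Analysis: the key is four parallel convolution branches. Each combines asymmetric padding (e.g., branch 1 uses Padding(1,0,0,3)) with a rectangular kernel (Conv(c', (1,3)) or Conv(c', (3,1))), so that features are extracted from four directions (up, down, left, right) converging on the center.
  - Conclusion (the pinwheel shape): the four branch outputs are concatenated (Cat) and fused by a $2 \times 2$ convolution (note: $k=2, s=1$ ). In the receptive field (top right of the figure), this cross-then-fuse structure gives the center the highest weight (4 operations) and decreasing weights outward (3, 2, 1). This closely mimics the Gaussian distribution in Figure 1, which is why it suits IRST features better than a standard convolution.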
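The 4/3/2/1 pattern can be checked by counting paths. The sketch below follows the padding tuples and kernel shapes of the `PConv` class in section 6 and counts, for one output pixel, how many (branch, kernel tap, fusion tap) paths reach each input offset. It assumes stride 1 and treats every weight as 1, so the numbers are path counts rather than learned weights; the function name `pconv_footprint` is hypothetical.

```python
from collections import Counter


def pconv_footprint(k=3):
    """Path counts per input offset (dy, dx) for one PConv output pixel, stride 1."""
    # (pad_left, pad_top, kernel_h, kernel_w) per branch, from the ZeroPad2d tuples
    # (left, right, top, bottom) used in the section 6 code.
    branches = [
        (k, 1, 1, k),  # pad (k, 0, 1, 0) with the 1 x k kernel
        (0, 0, 1, k),  # pad (0, k, 0, 1) with the 1 x k kernel
        (0, k, k, 1),  # pad (0, 1, k, 0) with the k x 1 kernel
        (1, 0, k, 1),  # pad (1, 0, 0, k) with the k x 1 kernel
    ]
    counts = Counter()
    for pad_left, pad_top, kh, kw in branches:
        for fy in range(2):              # taps of the 2 x 2 fusion kernel
            for fx in range(2):
                for ty in range(kh):     # taps of the branch kernel
                    for tx in range(kw):
                        counts[(fy + ty - pad_top, fx + tx - pad_left)] += 1
    return counts


counts = pconv_footprint(k=3)
radius = 3
for dy in range(-radius, radius + 1):
    print(" ".join(str(counts.get((dy, dx), 0)) for dx in range(-radius, radius + 1)))
print("input pixels covered:", len(counts))
```

For k = 3 the sketch covers 25 input pixels arranged as a pinwheel, with the highest count (4) at the center offset (0, 0), a count of 3 at its four direct neighbors, and counts of 2 or 1 further out.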

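3. Main contributions

PConv (Pinwheel-shaped Convolution): a plug-and-play module designed around the Gaussian spatial distribution of infrared small targets (IRST). It uses parallel, asymmetrically padded rectangular convolutions ( $1 \times 3$ and $3 \times 1$ ) to build a pinwheel-shaped receptive field, giving a Gaussian-like weighting that is high at the center and low at the edges.
Efficiency of PConv: compared with a standard 3x3 convolution, PConv (k=3) uses 22.2% fewer parameters while enlarging the receptive field by 177% (from 9 to 25 pixels).
SD Loss (Scale-based Dynamic Loss):
- Proposed to address the sharp IoU fluctuation of BBox and mask labels on small targets.
- Core mechanism (Figure 5): the loss contains a dynamic coefficient $\beta$ based on target area.
- SDB Loss (for BBoxes): the smaller the target, the lower the weight of the scale loss ( $\mathcal{L}_{BS}$ ) and the higher the weight of the location loss ( $\mathcal{L}_{BL}$ ).
- SDM Loss (for masks): the weights are $1 + \beta_M$ for the scale loss ( $\mathcal{L}_{MS}$ ) and $1 - \beta_M$ for the location loss ( $\mathcal{L}_{ML}$ ). Both equal 1 for the smallest targets; as the target grows, the scale weight rises and the location weight falls (the mask location loss is considered unstable).
SIRST-UAVB dataset: to address the small size and simple scenes of existing datasets, the paper builds and releases SIRST-UAVB, described as the largest and most challenging real-scene infrared small target dataset, with complex backgrounds and dim UAV/bird targets.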
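The efficiency figures quoted for PConv can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: it counts convolution weights only (no BatchNorm), sets $c_2 = c_1$, and the helper names are hypothetical. The `shared_branches` flag is an extra illustration: the `PConv` class in section 6 reuses one `cw` and one `ch` convolution for two branches each, which lowers the count further than four independent branch kernels would.

```python
def standard_conv_params(c1, c2, k=3):
    """Weights of a standard k x k convolution."""
    return c1 * c2 * k * k


def pconv_params(c1, c2, k=3, shared_branches=False):
    """Weights of PConv: branch convolutions plus the 2 x 2 fusion convolution."""
    one_branch = c1 * (c2 // 4) * k          # a single 1 x k or k x 1 convolution
    n_branch_convs = 2 if shared_branches else 4
    fusion = c2 * c2 * 2 * 2                 # 2 x 2 convolution over the concatenated c2 channels
    return n_branch_convs * one_branch + fusion


c = 64
std = standard_conv_params(c, c)
pin = pconv_params(c, c)
print(f"standard 3x3: {std}, PConv (4 branch kernels): {pin}")
print(f"parameter reduction: {1 - pin / std:.1%}")
print(f"receptive field growth (9 -> 25 pixels): {25 / 9 - 1:.1%}")
print(f"PConv with shared cw/ch as in the code: {pconv_params(c, c, shared_branches=True)}")
```

With four independent branch kernels the count is $7c^2$ against $9c^2$, which gives the 22.2% reduction; 25 / 9 gives the roughly 177% growth in covered pixels.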

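4. Method details

Overall network architecture:
- The paper does not propose a new overall network architecture.
- PConv and SD Loss are **plug-and-play components** that are applied to existing SOTA networks (e.g., YOLOv8n-p2, MSHNet, DNANet, ISNet) to improve their performance.
- Deploying PConv: PConv replaces standard convolution (Conv) layers in the **lower layers** of the backbone (for example, the first two convolution layers of YOLOv8n-p2).
- Deploying SD Loss: SD Loss (SDB or SDM) replaces the network's original loss function (e.g., CIoU or SLS Loss).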
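When swapping PConv in for an existing convolution layer, it helps to check that the output resolution still matches what later layers expect. The sketch below does the shape arithmetic for the `PConv` class in section 6 and compares it with a standard padded $k \times k$ convolution. The helper names are hypothetical, and the arithmetic assumes the usual floor-based convolution output formula.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output length along one axis; `pad` is the total padding on that axis."""
    return (size + pad - kernel) // stride + 1


def pconv_out_hw(h, w, k=3, s=1):
    """Output (height, width) of PConv, following the padding in the section 6 code."""
    # (total horizontal pad, total vertical pad, kernel height, kernel width) per branch
    branches = [(k, 1, 1, k), (k, 1, 1, k), (1, k, k, 1), (1, k, k, 1)]
    sizes = {
        (conv_out(h, kh, s, pad_v), conv_out(w, kw, s, pad_h))
        for pad_h, pad_v, kh, kw in branches
    }
    assert len(sizes) == 1, "all four branches must agree before concatenation"
    bh, bw = sizes.pop()
    # 2 x 2 fusion convolution with stride 1 and no padding
    return conv_out(bh, 2), conv_out(bw, 2)


def standard_out_hw(h, w, k=3, s=1):
    """Output (height, width) of a k x k convolution with k // 2 padding per side."""
    pad = 2 * (k // 2)
    return conv_out(h, k, s, pad), conv_out(w, k, s, pad)


for size, stride in [(32, 1), (32, 2), (31, 2)]:
    print(size, stride, pconv_out_hw(size, size, s=stride), standard_out_hw(size, size, s=stride))
```

Under these assumptions the two layers agree for even input sizes, but with stride 2 and an odd input size (31 here) the PConv arithmetic yields 15 where the padded 3x3 convolution yields 16, so odd resolutions are worth checking before replacing a strided layer.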
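Core modules in detail (Figures 3 & 5):
- Module A: PConv (Pinwheel-shaped Convolution)
  - Idea: mimic the Gaussian spatial distribution of IRST (high weight at the center, low at the edges) and enlarge the receptive field efficiently.
  - Internal structure:
    1. Input: feature map $X$ ( $h_1 \times w_1 \times c_1$ ).
    2. Parallel branches (core): $X$ is fed into four parallel branches, each with its own asymmetric padding and a rectangular kernel. The padding tuples below are read as (left, right, top, bottom), the order nn.ZeroPad2d uses in the section 6 code, where each branch pads $k$ pixels along the axis of its $k$-tap kernel and 1 pixel along the other axis so that all four outputs have the same size:
      - Branch 1 (up): Padding(1,0,0,3) (left 1, bottom 3) + Conv(c', (1,3)) ( $1 \times 3$ kernel)
      - Branch 2 (right): Padding(0,3,0,1) (right 3, bottom 1) + Conv(c', (3,1)) ( $3 \times 1$ kernel)
      - Branch 3 (down): Padding(0,1,3,0) (right 1, top 3) + Conv(c', (1,3)) ( $1 \times 3$ kernel)
      - Branch 4 (left): Padding(3,0,1,0) (left 3, top 1) + Conv(c', (3,1)) ( $3 \times 1$ kernel)
    3. Concatenation (Cat): the four branch outputs ( $X_1$ to $X_4$ ) are concatenated along the channel dimension, giving an $h' \times w' \times 4c'$ feature map.
    4. Fusion: a **Conv(c_2, (2,2), 1, 0)** (a $2 \times 2$ convolution with stride 1 and no padding) is applied to the concatenated feature map.
    5. Output: the final output $Y$ ( $h_2 \times w_2 \times c_2$ ).
  - Design goals:
    - Pinwheel shape: the asymmetric padding and rectangular kernels of the four branches form "pinwheel blades" that spread outward in space.
    - Gaussian-like weighting: as the "Receptive field" panel at the top right of Figure 3 shows, when the $2 \times 2$ fusion kernel slides over the map, the center pixel (4) is covered by the $2 \times 2$ regions of all four branches and is therefore counted 4 times, while the outer pixels (3, 2, 1) are covered progressively fewer times. This produces a center-weighted, Gaussian-like effect.
    - High efficiency: with $k=3$ the receptive field covers 25 pixels (as many as a $5 \times 5$ window), yet the parameter count ( $7c_1^2$ , taking $c_2 = c_1$ ) is lower than that of a standard $3 \times 3$ convolution ( $9c_1^2$ ).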
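- Module B: SD Loss (Scale-based Dynamic Loss)
  - Idea: handle the large IoU fluctuation of small targets (label noise) and the mismatch between scale and location sensitivity.
  - Mechanism (SDB Loss for BBoxes):
    1. Compute $\beta_B$ (Eq. 12): a base weight $\beta_B$ is computed from the area of the target $B_{gt}$ . It is proportional to the area and capped at $\delta$ (e.g., $\delta=0.5$ ).
    2. Compute the Sloss/Lloss weights (Eq. 14):
      - $\beta_{\mathcal{L}_{BS}} = 1 - \delta + \beta_B$ (scale loss weight)
      - $\beta_{\mathcal{L}_{BL}} = 1 + \delta - \beta_B$ (location loss weight)
    3. Analysis (Figure 5a): as the area of $B_{gt} \rightarrow 0$ , $\beta_B \rightarrow 0$ . Then $\beta_{\mathcal{L}_{BS}} \rightarrow 1-\delta$ (smaller weight) and $\beta_{\mathcal{L}_{BL}} \rightarrow 1+\delta$ (larger weight).
    4. Conclusion: SDB Loss automatically down-weights the scale loss (Sloss) and up-weights the location loss (Lloss) for small targets, to cope with their large IoU fluctuation.
  - Mechanism (SDM Loss for masks):
    1. Compute $\beta_M$ (Eq. 13): same logic as above.
    2. Compute the Sloss/Lloss weights (Eq. 16):
      - $\beta_{\mathcal{L}_{MS}} = 1 + \beta_M$ (scale loss weight)
      - $\beta_{\mathcal{L}_{ML}} = 1 - \beta_M$ (location loss weight)
    3. Analysis (Figure 5b): as the area of $M_{gt} \rightarrow 0$ , $\beta_M \rightarrow 0$ , so both weights tend to 1. As the target grows, $\beta_M$ rises to its cap $\delta$ , giving $\beta_{\mathcal{L}_{MS}} \rightarrow 1+\delta$ (larger weight) and $\beta_{\mathcal{L}_{ML}} \rightarrow 1-\delta$ (smaller weight).
    4. Conclusion: SDM Loss differs from SDB in its baseline: the two weights start equal for the smallest targets instead of at $1-\delta$ and $1+\delta$ . The author attributes this to the mask location loss $\mathcal{L}_{ML}$ being unstable in itself (it is computed from the average position of all pixels), so its weight never exceeds 1 and shrinks as the target grows, while the weight of the scale loss (Sloss) increases.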
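The weight schedules above can be tabulated directly. This is a minimal sketch, assuming the area normalizer of 81 and $R_{oc} = 1$ that appear in the `bbox_iou` code of section 6; the function names and the sample areas are hypothetical.

```python
def beta(area, delta=0.5, area_sup=81.0, r_oc=1.0):
    """Area-proportional coefficient, capped at delta (area_sup = 81 as in the section 6 code)."""
    return min(area * r_oc / area_sup * delta, delta)


def sdb_weights(area, delta=0.5):
    """(scale weight, location weight) for the BBox variant."""
    b = beta(area, delta)
    return 1 - delta + b, 1 + delta - b


def sdm_weights(area, delta=0.5):
    """(scale weight, location weight) for the mask variant."""
    b = beta(area, delta)
    return 1 + b, 1 - b


print("area  SDB(scale, loc)   SDM(scale, loc)")
for area in (1, 9, 36, 81, 400):
    s_b, l_b = sdb_weights(area)
    s_m, l_m = sdm_weights(area)
    print(f"{area:4d}  ({s_b:.3f}, {l_b:.3f})    ({s_m:.3f}, {l_m:.3f})")
```

In both variants the scale weight grows and the location weight shrinks as the area increases; what differs is the range, $[1-\delta, 1]$ and $[1, 1+\delta]$ for SDB versus $[1, 1+\delta]$ and $[1-\delta, 1]$ for SDM.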
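Figure summary:
- Figure 1 reveals problem 1: IRST follow a Gaussian distribution, which standard square convolutions do not match.
- Figure 2 reveals problem 2: IRST labels (especially for small targets) show sharp IoU fluctuation, which standard losses handle poorly.
- Figure 3 gives solution 1: the PConv module. Its pinwheel-shaped asymmetric rectangular convolutions and center-focused fusion approximate a Gaussian receptive field (high weight at the center, low at the edges) with fewer parameters than a standard convolution.
- Figure 5 gives solution 2: SD Loss. It introduces a dynamic weight $\beta$ that adapts the loss to target scale and mitigates the large IoU fluctuation of small targets.
- Figures 4, 6, 7 give the validation: applying the two plug-and-play modules, PConv and SD Loss, to SOTA networks (e.g., YOLOv8, MSHNet) clearly reduces missed detections (purple circles) and false alarms (yellow circles) and improves detection and segmentation performance.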
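Putting the weights and the base losses together, the combined losses can be written as below. This is a sketch read off the weight definitions above and the `bbox_iou` and `SLSIoULoss` code in section 6, not a quotation of the paper's equations. The symbols $w_{gt}$ , $h_{gt}$ and $|M_{gt}|$ (box width, box height, number of mask pixels) are notation assumed here, and $\mathcal{L}_{BS} = 1 - \mathrm{IoU} + \alpha v$ , $\mathcal{L}_{BL} = \rho^2 / c^2$ follow the CIoU terms used in that code.

$$
\beta_B = \min\left(\frac{w_{gt}\, h_{gt}}{81}\,\delta,\ \delta\right), \qquad
\beta_M = \min\left(\frac{|M_{gt}|\, R_{oc}}{81}\,\delta,\ \delta\right)
$$

$$
\mathcal{L}_{SDB} = (1 - \delta + \beta_B)\,\mathcal{L}_{BS} + (1 + \delta - \beta_B)\,\mathcal{L}_{BL}, \qquad
\mathcal{L}_{SDM} = (1 + \beta_M)\,\mathcal{L}_{MS} + (1 - \beta_M)\,\mathcal{L}_{ML}
$$

In the section 6 code, $\beta_M$ is additionally averaged over the batch before it is applied.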

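5. What the plug-and-play modules do

Both core innovations of the paper, PConv and SD Loss, are **plug-and-play** components.
PConv (Pinwheel-shaped Convolution):
- Role: a convolution layer module that can replace a standard convolution (nn.Conv2d), especially in the shallow (lower) layers of the backbone.
- Use cases:
  1. Infrared small target detection and segmentation (IRSTDS): the original application. PConv's Gaussian-like receptive field makes it well suited to extracting the bright-center features of IRST from complex backgrounds (Figure 4 shows PConv enhancing targets and suppressing background).
  2. Extracting any Gaussian-like feature: potentially applicable to other tasks with "sharp center, blurry surroundings" features, such as star detection in astronomical images or microcalcification detection in medical images, and to general backbones that need a large receptive field at low cost.
- Advantage: with fewer parameters (-22.2%), it provides a much larger receptive field than a standard 3x3 convolution (+177%).
SD Loss (Scale-based Dynamic Loss):
- Role: a loss function that can replace standard BBox losses (e.g., CIoU, GIoU) or mask losses (e.g., Dice, SLS Loss).
- Use cases:
  1. Small target detection/segmentation: its core use case. When a dataset contains many small targets, the scale-based dynamic mechanism of SD Loss can improve regression stability and detection accuracy on them.
  2. Tasks with label noise (large IoU fluctuation): applicable to datasets whose labels (especially BBoxes) are subjective or fluctuate. By relying less on Sloss for small targets, SDB Loss makes the model more robust to this label noise.
- Advantage: no architecture change is needed; replacing the loss function alone adjusts the training emphasis according to target scale and improves the balance of detection/segmentation across scales.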

6. Plug-and-play module code

"""
Plug-and-play modules: APConv (Asymmetric Padding Convolution) and loss functions.
Contains the complete convolution modules and loss functions; can be imported and used directly.
"""

import torch
import torch.nn as nn
import torch.nn.functional as F
import math


# ==================== APConv modules ====================

def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation only (for use after BatchNorm has been fused into the conv)."""
        return self.act(self.conv(x))


class Bottleneck(nn.Module):
    """Standard bottleneck module."""
    
    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
        """Initializes a bottleneck module with given input/output channels, shortcut option, group, kernels, and expansion."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        """Forward pass through bottleneck."""
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


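class PConv(nn.Module):
    """Pinwheel-shaped Convolution using the Asymmetric Padding method."""

    def __init__(self, c1, c2, k, s):
        super().__init__()
        # c2 should be divisible by 4: each of the four branches outputs c2 // 4 channels.
        # ZeroPad2d takes (left, right, top, bottom). Each tuple pads k pixels on one side
        # along the kernel's long axis and 1 pixel on one side of the other axis.
        p = [(k, 0, 1, 0), (0, k, 0, 1), (0, 1, k, 0), (1, 0, 0, k)]
        self.pad = [nn.ZeroPad2d(padding=(p[g])) for g in range(4)]
        self.cw = Conv(c1, c2 // 4, (1, k), s=s, p=0)  # 1 x k kernel, shared by the two horizontal branches
        self.ch = Conv(c1, c2 // 4, (k, 1), s=s, p=0)  # k x 1 kernel, shared by the two vertical branches
        self.cat = Conv(c2, c2, 2, s=1, p=0)  # 2 x 2 fusion conv, stride 1, no padding

    def forward(self, x):
        # Every branch output has size (H // s + 1) x (W // s + 1)
        yw0 = self.cw(self.pad[0](x))
        yw1 = self.cw(self.pad[1](x))
        yh0 = self.ch(self.pad[2](x))
        yh1 = self.ch(self.pad[3](x))
        # Concatenate on channels, then fuse back to (H // s) x (W // s)
        return self.cat(torch.cat([yw0, yw1, yh0, yh1], dim=1))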


class APBottleneck(nn.Module):
    """Asymmetric Padding bottleneck."""

    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
        """Initializes a bottleneck module with given input/output channels, shortcut option, group, kernels, and expansion."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        p = [(2,0,2,0),(0,2,0,2),(0,2,2,0),(2,0,0,2)]
        self.pad = [nn.ZeroPad2d(padding=(p[g])) for g in range(4)]
        self.cv1 = Conv(c1, c_ // 4, k[0], 1, p=0)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        """Forward pass through APBottleneck."""
        # Run the shared conv on the four asymmetrically padded views, concatenate, then fuse
        y = self.cv2(torch.cat([self.cv1(self.pad[g](x)) for g in range(4)], 1))
        return x + y if self.add else y


class APC2f(nn.Module):
    """Faster Implementation of APCSP Bottleneck with Asymmetric Padding convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=False, P=True, g=1, e=0.5):
        """Initialize CSP bottleneck layer with two convolutions with arguments ch_in, ch_out, number, shortcut, groups, expansion."""
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        if P:
            self.m = nn.ModuleList(APBottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))
        else:
            self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

    def forward(self, x):
        """Forward pass through APC2f layer."""
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))


# ==================== Loss modules ====================

def bbox2dist(anchor_points, target_bboxes, reg_max):
    """
    Convert bbox to distance distribution.
    
    Args:
        anchor_points: Anchor points tensor
        target_bboxes: Target bounding boxes tensor
        reg_max: Maximum regression value
        
    Returns:
        Distance distribution tensor
    """
    x1y1 = anchor_points - target_bboxes[..., :2]
    x2y2 = target_bboxes[..., 2:] - anchor_points
    return torch.cat([x1y1, x2y2], -1).clamp_(0, reg_max - 0.01)


def bbox_iou(box1, box2, xywh=True, GIoU=False, DIoU=False, CIoU=False, SDIoU=True, eps=1e-7, delta=0.5):
    """
    Calculate Intersection over Union (IoU) of box1(1, 4) to box2(n, 4).

    Args:
        box1 (torch.Tensor): A tensor representing a single bounding box with shape (1, 4).
        box2 (torch.Tensor): A tensor representing n bounding boxes with shape (n, 4).
        xywh (bool, optional): If True, input boxes are in (x, y, w, h) format. If False, input boxes are in
                               (x1, y1, x2, y2) format. Defaults to True.
        GIoU (bool, optional): If True, calculate Generalized IoU. Defaults to False.
        DIoU (bool, optional): If True, calculate Distance IoU. Defaults to False.
        CIoU (bool, optional): If True, calculate Complete IoU. Defaults to False.
        SDIoU (bool, optional): If True, calculate Scale-based Dynamic IoU. Defaults to True.
        eps (float, optional): A small value to avoid division by zero. Defaults to 1e-7.
        delta (float, optional): Upper bound of the dynamic coefficient beta used by SDIoU. Defaults to 0.5.

    Returns:
        (torch.Tensor): IoU, GIoU, DIoU, CIoU, or SDIoU values depending on the specified flags.
    """

    # Get the coordinates of bounding boxes
    if xywh:  # transform from xywh to xyxy
        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, -1), box2.chunk(4, -1)
        w1_, h1_, w2_, h2_ = w1 / 2, h1 / 2, w2 / 2, h2 / 2
        b1_x1, b1_x2, b1_y1, b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_
        b2_x1, b2_x2, b2_y1, b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_
    else:  # x1, y1, x2, y2 = box1
        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)
        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)
        w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
        w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps

    # Intersection area    ∩
    inter = (b1_x2.minimum(b2_x2) - b1_x1.maximum(b2_x1)).clamp_(0) * (
        b1_y2.minimum(b2_y2) - b1_y1.maximum(b2_y1)
    ).clamp_(0)

    # Union Area      U
    union = w1 * h1 + w2 * h2 - inter + eps

    # IoU
    iou = inter / union

    # R_oc = 1     # The YOLO bounding box is normalized, so R_oc is equal to 1.
    if CIoU or DIoU or GIoU or SDIoU:
        cw = b1_x2.maximum(b2_x2) - b1_x1.minimum(b2_x1)  # convex (smallest enclosing box) width
        ch = b1_y2.maximum(b2_y2) - b1_y1.minimum(b2_y1)  # convex height
        if CIoU or DIoU or SDIoU:  # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
            c2 = cw**2 + ch**2 + eps  # convex diagonal squared
            rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4  # center dist ** 2
            if CIoU or SDIoU:  # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
                v = (4 / math.pi**2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)).pow(2)
                with torch.no_grad():
                    alpha = v / (v - iou + (1 + eps))
                if SDIoU:
                    # beta grows with the target area and saturates at delta once w2 * h2 >= 81 (= 9 x 9)
                    beta = ((w2 * h2 * delta) / 81).clamp(max=delta)
                    return delta-beta + (1-delta+beta)*(iou-v*alpha) - (1+delta-beta)*(rho2/c2)  # SDIoU
                return iou - (rho2 / c2 + v * alpha)  # CIoU
            return iou - rho2 / c2  # DIoU
        c_area = cw * ch + eps  # convex area
        return iou - (c_area - union) / c_area  # GIoU https://arxiv.org/pdf/1902.09630.pdf
    return iou  # IoU


class BboxLoss(nn.Module):
    """Criterion class for computing training losses during training."""

    def __init__(self, reg_max, use_dfl=False):
        """Initialize the BboxLoss module with regularization maximum and DFL settings."""
        super().__init__()
        self.reg_max = reg_max
        self.use_dfl = use_dfl

    def forward(self, pred_dist, pred_bboxes, anchor_points, target_bboxes, target_scores, target_scores_sum, fg_mask):
        """IoU loss."""
        weight = target_scores.sum(-1)[fg_mask].unsqueeze(-1)
        iou = bbox_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, SDIoU=True, delta=0.5)  
        loss = ((1.0 - iou) * weight).sum() / target_scores_sum    #  SDB loss

        # DFL loss
        if self.use_dfl:
            target_ltrb = bbox2dist(anchor_points, target_bboxes, self.reg_max)
            loss_dfl = self._df_loss(pred_dist[fg_mask].view(-1, self.reg_max + 1), target_ltrb[fg_mask]) * weight
            loss_dfl = loss_dfl.sum() / target_scores_sum
        else:
            loss_dfl = torch.tensor(0.0).to(pred_dist.device)

        return loss, loss_dfl

    @staticmethod
    def _df_loss(pred_dist, target):
        """
        Return sum of left and right DFL losses.

        Distribution Focal Loss (DFL) proposed in Generalized Focal Loss
        https://ieeexplore.ieee.org/document/9792391
        """
        tl = target.long()  # target left
        tr = tl + 1  # target right
        wl = tr - target  # weight left
        wr = 1 - wl  # weight right
        return (
            F.cross_entropy(pred_dist, tl.view(-1), reduction="none").view(tl.shape) * wl
            + F.cross_entropy(pred_dist, tr.view(-1), reduction="none").view(tl.shape) * wr
        ).mean(-1, keepdim=True)


def SoftIoULoss(pred, target):
    pred = torch.sigmoid(pred)

    smooth = 1

    intersection = pred * target
    intersection_sum = torch.sum(intersection, dim=(1, 2, 3))
    pred_sum = torch.sum(pred, dim=(1, 2, 3))
    target_sum = torch.sum(target, dim=(1, 2, 3))

    loss = (intersection_sum + smooth) / \
           (pred_sum + target_sum - intersection_sum + smooth)

    loss = 1 - loss.mean()

    return loss


def Dice(pred, target, warm_epoch=1, epoch=1, layer=0):
    pred = torch.sigmoid(pred)

    smooth = 1

    intersection = pred * target
    intersection_sum = torch.sum(intersection, dim=(1, 2, 3))
    pred_sum = torch.sum(pred, dim=(1, 2, 3))
    target_sum = torch.sum(target, dim=(1, 2, 3))

    # Dice coefficient: 2|A∩B| / (|A| + |B|); the intersection is not added to the denominator
    loss = (2 * intersection_sum + smooth) / \
           (pred_sum + target_sum + smooth)

    loss = 1 - loss.mean()

    return loss


def LLoss(pred, target):
    loss = torch.tensor(0.0, requires_grad=True).to(pred)
    patch_size = pred.shape[0]
    h = pred.shape[2]
    w = pred.shape[3]
    x_index = torch.arange(0, w, 1).view(1, 1, w).repeat((1, h, 1)).to(pred) / w
    y_index = torch.arange(0, h, 1).view(1, h, 1).repeat((1, 1, w)).to(pred) / h
    smooth = 1e-8
    for i in range(patch_size):
        pred_centerx = (x_index * pred[i]).mean()
        pred_centery = (y_index * pred[i]).mean()

        target_centerx = (x_index * target[i]).mean()
        target_centery = (y_index * target[i]).mean()

        angle_loss = (4 / (math.pi ** 2)) * (torch.square(torch.arctan(pred_centery / (pred_centerx + smooth))
                                                           - torch.arctan(
            target_centery / (target_centerx + smooth))))

        pred_length = torch.sqrt(pred_centerx * pred_centerx + pred_centery * pred_centery + smooth)
        target_length = torch.sqrt(target_centerx * target_centerx + target_centery * target_centery + smooth)

        length_loss = (torch.min(pred_length, target_length)) / (torch.max(pred_length, target_length) + smooth)

        loss = loss + (1 - length_loss + angle_loss) / patch_size

    return loss


class SLSIoULoss(nn.Module):   # https://github.com/Lliu666/MSHNet
    """SLSIoULoss and our SDM Loss"""
    
    def __init__(self):
        super(SLSIoULoss, self).__init__()

    def forward(self, pred_log, target, warm_epoch, epoch, with_distance=True, dynamic=True, delta=0.5):
        pred = torch.sigmoid(pred_log)
        h = pred.shape[2]
        w = pred.shape[3]
        smooth = 0.0
        
        R_oc = 512 * 512 / ( w * h )
        intersection = pred * target

        intersection_sum = torch.sum(intersection, dim=(1, 2, 3))
        pred_sum = torch.sum(pred, dim=(1, 2, 3))
        target_sum = torch.sum(target, dim=(1, 2, 3))

        dis = torch.pow((pred_sum - target_sum) / 2, 2)

        alpha = (torch.min(pred_sum, target_sum) + dis + smooth) / (torch.max(pred_sum, target_sum) + dis + smooth)

        loss = (intersection_sum + smooth) / \
               (pred_sum + target_sum - intersection_sum + smooth)

        if epoch > warm_epoch:
            siou_loss = alpha * loss
            if dynamic:
                lloss = LLoss(pred, target)
                beta = (target_sum * delta * R_oc) / 81
                beta = beta.clamp(max=delta)  # cap at delta without creating a CPU tensor that may not match beta's device
                beta = beta.mean()
                if with_distance:
                    loss = (1 + beta) * (1 - siou_loss.mean()) + (1 - beta) * lloss     #  SDM loss
                else:
                    loss = 1 - siou_loss.mean()
            else:
                if with_distance:
                    lloss = LLoss(pred, target)
                    loss = 1 - siou_loss.mean() + lloss
                else:
                    loss = 1 - siou_loss.mean()
        else:
            loss = 1 - loss.mean()
        return loss


class AverageMeter(object):
    """Computes and stores the average and current value"""

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


# ==================== Test code ====================

if __name__ == "__main__":
    print("=" * 60)
    print("APConv and Loss module tests")
    print("=" * 60)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Device: {device}\n")

    # Test parameters
    batch_size = 2
    in_channels = 64
    out_channels = 128
    height, width = 32, 32
    kernel_size = 3

    # ========== Test PConv ==========
    print("1. Testing PConv (pinwheel-shaped convolution)")
    print("-" * 60)
    x = torch.randn(batch_size, in_channels, height, width).to(device)
    print(f"Input shape: {x.shape}")

    pconv = PConv(in_channels, out_channels, kernel_size, s=1).to(device)
    out_pconv = pconv(x)
    print(f"PConv output shape: {out_pconv.shape}")
    print("[PASS] PConv test passed\n")

    # ========== Test APBottleneck ==========
    print("2. Testing APBottleneck (asymmetric padding bottleneck)")
    print("-" * 60)
    ap_bottleneck = APBottleneck(in_channels, out_channels, shortcut=True).to(device)
    out_apb = ap_bottleneck(x)
    print(f"APBottleneck output shape: {out_apb.shape}")
    print("[PASS] APBottleneck test passed\n")

    # ========== Test APC2f ==========
    print("3. Testing APC2f (CSP bottleneck)")
    print("-" * 60)
    apc2f = APC2f(in_channels, out_channels, n=2, shortcut=True, P=True).to(device)
    out_apc2f = apc2f(x)
    print(f"APC2f output shape: {out_apc2f.shape}")
    print("[PASS] APC2f test passed\n")
    
    # ========== Test the loss modules ==========
    print("4. Testing SoftIoULoss")
    print("-" * 60)
    pred_seg = torch.randn(batch_size, 1, height, width).to(device)
    target_seg = torch.randint(0, 2, (batch_size, 1, height, width)).float().to(device)
    soft_iou_loss = SoftIoULoss(pred_seg, target_seg)
    print(f"SoftIoU Loss: {soft_iou_loss.item():.4f}")
    print("[PASS] SoftIoULoss test passed\n")

    # ========== Test Dice Loss ==========
    print("5. Testing Dice Loss")
    print("-" * 60)
    dice_loss = Dice(pred_seg, target_seg)
    print(f"Dice Loss: {dice_loss.item():.4f}")
    print("[PASS] Dice Loss test passed\n")

    # ========== Test SLSIoULoss ==========
    print("6. Testing SLSIoULoss")
    print("-" * 60)
    sls_iou_loss = SLSIoULoss().to(device)
    loss_value = sls_iou_loss(pred_seg, target_seg, warm_epoch=0, epoch=5, with_distance=True, dynamic=True)
    print(f"SLSIoU Loss: {loss_value.item():.4f}")
    print("[PASS] SLSIoULoss test passed\n")
    
    # ========== Test BboxLoss ==========
    print("7. Testing BboxLoss")
    print("-" * 60)
    num_anchors = 100
    reg_max = 16

    # Create mock data: valid xyxy boxes (x2 > x1, y2 > y1) and non-negative scores
    pred_dist = torch.randn(batch_size, num_anchors, 4 * (reg_max + 1)).to(device)
    pred_xy = torch.rand(batch_size, num_anchors, 2).to(device) * 8
    pred_bboxes = torch.cat([pred_xy, pred_xy + 1 + torch.rand_like(pred_xy) * 4], dim=-1)
    target_xy = torch.rand(batch_size, num_anchors, 2).to(device) * 8
    target_bboxes = torch.cat([target_xy, target_xy + 1 + torch.rand_like(target_xy) * 4], dim=-1)
    anchor_points = (target_bboxes[..., :2] + target_bboxes[..., 2:]) / 2  # anchors at the target centers
    target_scores = torch.rand(batch_size, num_anchors, 1).to(device)
    target_scores_sum = target_scores.sum().clamp(min=1)
    fg_mask = torch.ones(batch_size, num_anchors, dtype=torch.bool).to(device)

    bbox_loss_fn = BboxLoss(reg_max=reg_max, use_dfl=True).to(device)
    iou_loss, dfl_loss = bbox_loss_fn(
        pred_dist, pred_bboxes, anchor_points, target_bboxes,
        target_scores, target_scores_sum, fg_mask
    )
    print(f"IoU Loss: {iou_loss.item():.4f}")
    print(f"DFL Loss: {dfl_loss.item():.4f}")
    print("[PASS] BboxLoss test passed\n")

    # ========== Test bbox_iou ==========
    print("8. Testing the bbox_iou function")
    print("-" * 60)
    box1 = torch.tensor([[0.5, 0.5, 0.3, 0.3]]).to(device)  # xywh format
    box2 = torch.tensor([[0.5, 0.5, 0.3, 0.3], [0.6, 0.6, 0.2, 0.2]]).to(device)
    iou = bbox_iou(box1, box2, xywh=True, SDIoU=True)
    print(f"IoU values: {iou.squeeze().cpu().numpy()}")
    print("[PASS] bbox_iou test passed\n")

    # ========== Test AverageMeter ==========
    print("9. Testing AverageMeter")
    print("-" * 60)
    meter = AverageMeter()
    for i in range(10):
        meter.update(torch.tensor(i * 0.1).item())
    print(f"Average: {meter.avg:.4f}")
    print(f"Current value: {meter.val:.4f}")
    print("[PASS] AverageMeter test passed\n")

    print("=" * 60)
    print("All tests passed! The modules are ready to use.")
    print("=" * 60)