Feature Distillation and Logits Distillation for YOLOv8


09.14        Knowledge distillation under YOLOv8. Current experimental progress: the feature-map-based methods CWD and MGD have been tested, and both improve accuracy on a self-built dataset. With YOLOv8n as the student model and YOLOv8s as the teacher, CWD gives a 1.01% improvement and MGD a 0.34% improvement. Distilling your own modified models is also supported.

09.16        Major framework rework: Logits distillation added. Logits distillation and feature distillation can run simultaneously or separately.

Currently supported methods:

Logits distillation:

- BCKD (Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection) arxiv.org/pdf/2308.14…

Feature distillation:

- CWD (Channel-wise Knowledge Distillation for Dense Prediction) arxiv.org/pdf/2011.13…
- MGD (Masked Generative Distillation) arxiv.org/abs/2205.01…
- FGD (Focal and Global Knowledge Distillation for Detectors) arxiv.org/abs/2111.11…
- FSP (A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning) openaccess.thecvf.com/content_cvp…

BCKD experimental results: a 1.63% improvement on the self-built dataset, better than CWD, and the two can be trained together.

09.18        Added more distillation methods that have been debugged successfully.

Currently supported methods:

Logits distillation:

- BCKD (Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection) arxiv.org/pdf/2308.14…
- CrossKD (Cross-Head Knowledge Distillation for Dense Object Detection) arxiv.org/abs/2306.11…
- NKD (From Knowledge Distillation to Self-Knowledge Distillation: A Unified Approach with Normalized Loss and Customized Soft Labels) arxiv.org/abs/2303.13…
- DKD (Decoupled Knowledge Distillation) arxiv.org/pdf/2203.08…
- LD (Localization Distillation for Dense Object Detection) arxiv.org/abs/2102.12…
- WSLD (Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective) arxiv.org/pdf/2102.00…
- KD (Distilling the Knowledge in a Neural Network) arxiv.org/pdf/1503.02…

Feature distillation:

- CWD (Channel-wise Knowledge Distillation for Dense Prediction) arxiv.org/pdf/2011.13…
- MGD (Masked Generative Distillation) arxiv.org/abs/2205.01…
- FGD (Focal and Global Knowledge Distillation for Detectors) arxiv.org/abs/2111.11…
- FSP (A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning) openaccess.thecvf.com/content_cvp…
- PKD (General Distillation Framework for Object Detectors via Pearson Correlation Coefficient) arxiv.org/abs/2207.02…

Using LD alone on the regression branch currently performs best, with a 1.69% improvement, better than BCKD with the classification branch added. A possible reason: KD on the classification branch may interfere with the regression branch.

Contact me if you need it; reproducing the code was not easy.
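To make the feature-distillation side concrete, here is a minimal plain-Python sketch of the CWD loss, under my reading of the CWD paper: each channel's spatial activations are softened into a probability distribution with a temperature, and the student is matched to the teacher with a KL divergence. This is only an illustration; the function names and the default `tau=4.0` are my own choices, and the framework's real implementation would operate on batched PyTorch tensors, not Python lists.

```python
import math

def _spatial_softmax(channel, tau):
    """Temperature-scaled softmax over one channel's spatial activations."""
    m = max(x / tau for x in channel)
    exps = [math.exp(x / tau - m) for x in channel]
    total = sum(exps)
    return [e / total for e in exps]

def cwd_loss(teacher_feat, student_feat, tau=4.0):
    """Channel-wise distillation loss sketch.

    Each feature map is a list of channels; each channel is a flat list of
    spatial activations (H*W values). Per channel, both maps are softened
    into spatial distributions with temperature `tau`, and
    KL(teacher || student) * tau^2 is averaged over channels.
    """
    assert len(teacher_feat) == len(student_feat)
    loss = 0.0
    for t_ch, s_ch in zip(teacher_feat, student_feat):
        p = _spatial_softmax(t_ch, tau)  # teacher's spatial distribution
        q = _spatial_softmax(s_ch, tau)  # student's spatial distribution
        loss += tau * tau * sum(
            pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0
        )
    return loss / len(teacher_feat)

# Identical features give zero loss; mismatched features give a positive loss.
t = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
print(cwd_loss(t, t))                                   # prints 0.0
print(cwd_loss(t, [[4.0, 3.0, 2.0, 1.0], t[1]]) > 0.0)  # prints True
```

Because the softmax is taken over spatial positions within each channel, the loss pushes the student to reproduce where each teacher channel activates, which is why it suits dense prediction heads.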