[Share][Daily Update][2024.02.18][CV_arxiv_papers]


[UPDATED!] 2024-02-18 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-18 | SDiT: Spiking Diffusion Model with Transformer | SDiT:带变压器的尖峰扩散模型 | Shu Yang, Hanzhi Ma, Chengting Yu, Aili Wang, Er-Ping Li | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | GenAD: Generative End-to-End Autonomous Driving | GenAD:生成式端到端自动驾驶 | Wenzhao Zheng, Ruiqi Song, Xianda Guo, Long Chen | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | IRFundusSet: An Integrated Retinal Rundus Dataset with a Harmonized Healthy Label | IRFundusSet:具有统一健康标签的综合视网膜 Rundus 数据集 | P. Bilha Githinji, Keming Zhao, Jiantao Wang, Peiwu Qin | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | Visual Concept-driven Image Generation with Text-to-Image Diffusion Model | 使用文本到图像扩散模型的视觉概念驱动的图像生成 | Tanzila Rahman, Shweta Mahajan, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Leonid Sigal | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | Data Distribution Distilled Generative Model for Generalized Zero-Shot Recognition | 用于广义零样本识别的数据分布蒸馏生成模型 | Yijie Wang, Mingjian Hong, Luwen Huangfu, Sheng Huang | arxiv.org/pdf/2402.11… | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-18 | MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection | MultiCorrupt:用于 3D 物体检测的多模态鲁棒性数据集和 LiDAR-相机融合的基准 | Till Beemelmanns, Quan Zhang, Lutz Eckstein | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | Efficient Multimodal Learning from Data-centric Perspective | 以数据为中心的高效多模态学习 | Muyang He, Yexin Liu, Boya Wu, Jianhao Yuan, Yueze Wang, Tiejun Huang, Bo Zhao | arxiv.org/pdf/2402.11… | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-18 | MAL: Motion-Aware Loss with Temporal and Distillation Hints for Self-Supervised Depth Estimation | MAL:具有时间和蒸馏提示的运动感知损失,用于自监督深度估计 | Yup-Jiang Dong, Fang-Lue Zhang, Song-Hai Zhang | arxiv.org/pdf/2402.11… | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-18 | LiRaFusion: Deep Adaptive LiDAR-Radar Fusion for 3D Object Detection | LiRaFusion:用于 3D 物体检测的深度自适应 LiDAR-雷达融合 | Jingyu Song, Lingjun Zhao, Katherine A. Skinner | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | Challenging the Black Box: A Comprehensive Evaluation of Attribution Maps of CNN Applications in Agriculture and Forestry | 挑战黑匣子:CNN农林应用归因图综合评价 | Lars Nieradzik, Henrike Stephani, Jördis Sieburg-Rockel, Stephanie Helmling, Andrea Olbrich, Janis Keuper | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models | 逻辑闭环:揭示大型视觉语言模型中的物体幻觉 | Junfei Wu, Qiang Liu, Ding Wang, Jinghao Zhang, Shu Wu, Liang Wang, Tieniu Tan | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | PolypNextLSTM: A lightweight and fast polyp video segmentation network using ConvNext and ConvLSTM | PolypNextLSTM:使用 ConvNext 和 ConvLSTM 的轻量级快速息肉视频分割网络 | Debayan Bhattacharya, Konrad Reuter, Finn Behrendnt, Lennart Maack, Sarah Grube, Alexander Schlaefer | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | A novel Fourier neural operator framework for classification of multi-sized images: Application to 3D digital porous media | 用于多尺寸图像分类的新型傅立叶神经算子框架:在 3D 数字多孔介质中的应用 | Ali Kashefi, Tapan Mukerji | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | CPN: Complementary Proposal Network for Unconstrained Text Detection | CPN:用于无约束文本检测的补充提案网络 | Longhuang Wu, Shangxuan Tian, Youxin Wang, Pengfei Xiong | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | Cross-Attention Fusion of Visual and Geometric Features for Large Vocabulary Arabic Lipreading | 大词汇量阿拉伯语唇读的视觉和几何特征的交叉注意融合 | Samar Daou, Ahmed Rekik, Achraf Ben-Hamadou, Abdelaziz Kallel | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | Underestimation of lung regions on chest X-ray segmentation masks assessed by comparison with total lung volume evaluated on computed tomography | 通过与计算机断层扫描评估的总肺体积进行比较来评估胸部 X 射线分割掩模上的肺部区域低估 | Przemysław Bombiński, Patryk Szatkowski, Bartłomiej Sobieski, Tymoteusz Kwieciński, Szymon Płotka, Mariusz Adamek, Marcin Banasiuk, Mariusz I. Furmanek, Przemysław Biecek | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | Thyroid ultrasound diagnosis improvement via multi-view self-supervised learning and two-stage pre-training | 通过多视角自监督学习和两阶段预训练提高甲状腺超声诊断 | Jian Wang, Xin Yang, Xiaohong Jia, Wufeng Xue, Rusi Chen, Yanlin Chen, Xiliang Zhu, Lian Liu, Yan Cao, Jianqiao Zhou, et.al. | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | EndoOOD: Uncertainty-aware Out-of-distribution Detection in Capsule Endoscopy Diagnosis | EndoOOD:胶囊内窥镜诊断中的不确定性分布外检测 | Qiaozhi Tan, Long Bai, Guankun Wang, Mobarakol Islam, Hongliang Ren | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | Poisoned Forgery Face: Towards Backdoor Attacks on Face Forgery Detection | 有毒的伪造人脸:针对人脸伪造检测的后门攻击 | Jiawei Liang, Siyuan Liang, Aishan Liu, Xiaojun Jia, Junhao Kuang, Xiaochun Cao | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | Key Patch Proposer: Key Patches Contain Rich Information | 关键补丁提议者:关键补丁包含丰富信息 | Jing Xu, Beiwen Tian, Hao Zhao | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning | Momentor:利用细粒度时序推理推进视频大语言模型 | Long Qian, Juncheng Li, Yu Wu, Yaobo Ye, Hao Fei, Tat-Seng Chua, Yueting Zhuang, Siliang Tang | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | A Multispectral Automated Transfer Technique (MATT) for machine-driven image labeling utilizing the Segment Anything Model (SAM) | 利用分段任意模型 (SAM) 进行机器驱动图像标记的多光谱自动传输技术 (MATT) | James E. Gallagher, Aryav Gogia, Edward J. Oughton | arxiv.org/pdf/2402.11… | null |

LLM

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-18 | Aligning Modalities in Vision Large Language Models via Preference Fine-tuning | 通过偏好微调来调整视觉大语言模型中的模态 | Yiyang Zhou, Chenhang Cui, Rafael Rafailov, Chelsea Finn, Huaxiu Yao | arxiv.org/pdf/2402.11… | null |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-18 | 3D Point Cloud Compression with Recurrent Neural Network and Image Compression Methods | 使用递归神经网络和图像压缩方法进行 3D 点云压缩 | Till Beemelmanns, Yuchen Tao, Bastian Lampe, Lennart Reiher, Raphael van Kempen, Timo Woopen, Lutz Eckstein | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | Neuromorphic Face Analysis: a Survey | 神经形态面部分析:一项调查 | Federico Becattini, Lorenzo Berlincioni, Luca Cultrera, Alberto Del Bimbo | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | A Robust Error-Resistant View Selection Method for 3D Reconstruction | 一种鲁棒、抗错的 3D 重建视图选择方法 | Shaojie Zhang, Yinghui Wang, Bin Nan, Jinlong Yang, Tao Yan, Liangyi Huang, Mingfeng Wang | arxiv.org/pdf/2402.11… | null |

Learning Paradigms

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-18 | Boosting Semi-Supervised 2D Human Pose Estimation by Revisiting Data Augmentation and Consistency Training | 通过重新审视数据增强和一致性训练来促进半监督二维人体姿势估计 | Huayi Zhou, Mukun Luo, Fei Jiang, Yue Ding, Hongtao Lu | arxiv.org/pdf/2402.11… | null |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-18 | The Effectiveness of Random Forgetting for Robust Generalization | 随机遗忘对鲁棒泛化的有效性 | Vijaya Raghavan T Ramkumar, Bahram Zonooz, Elahe Arani | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | Learning Conditional Invariances through Non-Commutativity | 通过非交换性学习条件不变性 | Abhra Chaudhuri, Serban Georgescu, Anjan Dutta | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | Interactive Garment Recommendation with User in the Loop | 与用户互动的服装推荐 | Federico Becattini, Xiaolin Chen, Andrea Puccia, Haokun Wen, Xuemeng Song, Liqiang Nie, Alberto Del Bimbo | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | Visual In-Context Learning for Large Vision-Language Models | 大型视觉语言模型的视觉上下文学习 | Yucheng Zhou, Xiang Li, Qianning Wang, Jianbing Shen | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | Evaluating Adversarial Robustness of Low dose CT Recovery | 评估低剂量 CT 恢复的对抗鲁棒性 | Kanchana Vaishnavi Gandikota, Paramanand Chandramouli, Hannah Droege, Michael Moeller | arxiv.org/pdf/2402.11… | null |
| 2024-02-18 | To use or not to use proprietary street view images in (health and place) research? That is the question | 在(健康和场所)研究中使用或不使用专有街景图像?就是那个问题 | Marco Helbich, Matthew Danish, SM Labib, Britta Ricker | arxiv.org/pdf/2402.11… | null |