[Share][Daily Update][2024.03.01][CV_arxiv_papers]

[UPDATED!] 2024-03-01 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-01 | Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks | Diff-Plugin:重振基于扩散的低级任务的细节 | Yuhao Liu, Fang Liu, Zhanghan Ke, Nanxuan Zhao, Rynson W. H. Lau | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Graph Theory and GNNs to Unravel the Topographical Organization of Brain Lesions in Variants of Alzheimer's Disease Progression | 图论和 GNN 揭示阿尔茨海默病进展变体中脑损伤的拓扑结构 | Leopold Hebert-Stevens, Gabriel Jimenez, Benoit Delatour, Lev Stimmer, Daniel Racoceanu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Rethinking Few-shot 3D Point Cloud Semantic Segmentation | 重新思考少样本 3D 点云语义分割 | Zhaochong An, Guolei Sun, Yun Liu, Fayao Liu, Zongwei Wu, Dan Wang, Luc Van Gool, Serge Belongie | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Improving Explicit Spatial Relationships in Text-to-Image Generation through an Automatically Derived Dataset | 通过自动导出的数据集改善文本到图像生成中的显式空间关系 | Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre, Frank Keller | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Rethinking cluster-conditioned diffusion models | 重新思考集群条件扩散模型 | Nikolas Adaloglou, Tim Kaiser, Felix Michels, Markus Kollmann | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Deformable One-shot Face Stylization via DINO Semantic Guidance | 通过 DINO 语义指导进行可变形的一次性面部风格化 | Yang Zhou, Zichong Chen, Hui Huang | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | An Ordinal Diffusion Model for Generating Medical Images with Different Severity Levels | 生成不同严重程度的医学图像的序数扩散模型 | Shumpei Takezaki, Seiichi Uchida | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | LoMOE: Localized Multi-Object Editing via Multi-Diffusion | LoMOE:通过多重扩散进行本地化多对象编辑 | Goirik Chakrabarty, Aditya Chandrasekar, Ramya Hebbalaguppe, Prathosh AP | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Abductive Ego-View Accident Video Understanding for Safe Driving Perception | 溯因式自我观看事故视频理解以实现安全驾驶感知 | Jianwu Fang, Lei-lei Li, Junfei Zhou, Junbin Xiao, Hongkai Yu, Chen Lv, Jianru Xue, Tat-Seng Chua | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation | HyperSDFusion:桥接语言和几何中的层次结构以增强 3D Text2Shape 生成 | Zhiying Leng, Tolga Birdal, Xiaohui Liang, Federico Tombari | arxiv.org/pdf/2403.00… | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-01 | Few-Shot Relation Extraction with Hybrid Visual Evidence | 具有混合视觉证据的少样本关系提取 | Jiaying Gong, Hoda Eldardiry | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding | HALC:通过自适应焦点对比度解码减少物体幻觉 | Zhaorun Chen, Zhuokai Zhao, Hongyin Luo, Huaxiu Yao, Bo Li, Jiawei Zhou | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Exploring the dynamic interplay of cognitive load and emotional arousal by using multimodal measurements: Correlation of pupil diameter and emotional arousal in emotionally engaging tasks | 通过使用多模态测量探索认知负荷和情绪唤醒的动态相互作用:情绪参与任务中瞳孔直径和情绪唤醒的相关性 | C. Kosel, S. Michel, T. Seidel, M. Foerster | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | MS-Net: A Multi-Path Sparse Model for Motion Prediction in Multi-Scenes | MS-Net:多场景运动预测的多路径稀疏模型 | Xiaqiang Tang, Weigao Sun, Siyuan Hu, Yiyang Sun, Yafeng Guo | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models | 多模态 ArXiv:用于提高大型视觉语言模型科学理解的数据集 | Lei Li, Yuqi Wang, Runxin Xu, Peiyi Wang, Xiachong Feng, Lingpeng Kong, Qi Liu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Multi-modal Attribute Prompting for Vision-Language Models | 视觉语言模型的多模态属性提示 | Xin Liu, Jiamin Wu, Tianzhu Zhang | arxiv.org/pdf/2403.00… | null |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-01 | DISORF: A Distributed Online NeRF Training and Rendering Framework for Mobile Robots | DISORF:用于移动机器人的分布式在线 NeRF 训练和渲染框架 | Chunlin Li, Ruofan Liang, Hanrui Fan, Zhengen Zhang, Sankeerth Durvasula, Nandita Vijaykumar | arxiv.org/pdf/2403.00… | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-01 | Learning Causal Features for Incremental Object Detection | 学习增量对象检测的因果特征 | Zhenwei He, Lei Zhang | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Data-efficient Event Camera Pre-training via Disentangled Masked Modeling | 通过解缠屏蔽建模进行数据高效的事件相机预训练 | Zhenpeng Huang, Chao Li, Hao Chen, Yongjian Deng, Yifeng Geng, Limin Wang | arxiv.org/pdf/2403.00… | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-01 | Joint Spatial-Temporal Calibration for Camera and Global Pose Sensor | 相机和全局位姿传感器的联合时空校准 | Junlin Song, Antoine Richard, Miguel Olivares-Mendez | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Can Transformers Capture Spatial Relations between Objects? | Transformer 可以捕捉物体之间的空间关系吗? | Chuan Wen, Dinesh Jayaraman, Yang Gao | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | COLON: The largest COlonoscopy LONg sequence public database | COLON:最大的结肠镜检查长序列公共数据库 | Lina Ruiz, Franklin Sierra-Jerez, Jair Ruiz, Fabio Martinez | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Region-Adaptive Transform with Segmentation Prior for Image Compression | 用于图像压缩的具有分割先验的区域自适应变换 | Yuxi Liu, Wenhan Yang, Huihui Bai, Yunchao Wei, Yao Zhao | arxiv.org/pdf/2403.00… | link |
| 2024-03-01 | IDTrust: Deep Identity Document Quality Detection with Bandpass Filtering | IDTrust:使用带通滤波进行深度身份文档质量检测 | Musab Al-Ghadi, Joris Voerman, Souhail Bakkali, Mickaël Coustaty, Nicolas Sidere, Xavier St-Georges | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Lincoln's Annotated Spatio-Temporal Strawberry Dataset (LAST-Straw) | 林肯带注释的时空草莓数据集 (LAST-Straw) | Katherine Margaret Frances James, Karoline Heiwolt, Daniel James Sargent, Grzegorz Cielniak | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | SURE: SUrvey REcipes for building reliable and robust deep networks | SURE:构建可靠且强大的深度网络的调查秘诀 | Yuting Li, Yingyi Chen, Xuanlong Yu, Dexiong Chen, Xi Shen | arxiv.org/pdf/2403.00… | link |
| 2024-03-01 | Large Language Models for Simultaneous Named Entity Extraction and Spelling Correction | 用于同时命名实体提取和拼写纠正的大型语言模型 | Edward Whittaker, Ikuo Kitagishi | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | GLFNET: Global-Local (frequency) Filter Networks for efficient medical image segmentation | GLFNET:用于高效医学图像分割的全局-局部(频率)滤波器网络 | Athanasios Tragakis, Qianying Liu, Chaitanya Kaul, Swalpa Kumar Roy, Hang Dai, Fani Deligianni, Roderick Murray-Smith, Daniele Faccio | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Invariant Test-Time Adaptation for Vision-Language Model Generalization | 视觉语言模型泛化的不变测试时间适应 | Huan Ma, Yan Zhu, Changqing Zhang, Peilin Zhao, Baoyuan Wu, Long-Kai Huang, Qinghua Hu, Bingzhe Wu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | DAMS-DETR: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion | DAMS-DETR:具有竞争性查询选择和自适应特征融合的动态自适应多光谱检测 Transformer | Guo Junjie, Gao Chenqiang, Liu Fangcen, Meng Deyu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Small, Versatile and Mighty: A Range-View Perception Framework | 小巧、多功能、强大:范围-视图感知框架 | Qiang Meng, Xiao Wang, JiaBao Wang, Liujiang Yan, Ke Wang | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Embedded Multi-label Feature Selection via Orthogonal Regression | 通过正交回归进行嵌入式多标签特征选择 | Xueyuan Xu, Fulin Wei, Tianyuan Jia, Li Zhuo, Feiping Nie, Xia Wu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting | ODM:一种用于场景文本检测和定位的文本-图像进一步对齐预训练方法 | Chen Duan, Pei Fu, Shan Guo, Qianyi Jiang, Xiaoming Wei | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation | CustomListener:文本引导的响应式交互,用于用户友好的聆听头生成 | Xi Liu, Ying Guo, Cheng Zhen, Tong Li, Yingying Ao, Pengfei Yan | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval | 双姿势不变嵌入:用于识别和检索的学习类别和特定于对象的判别表示 | Rohan Sarkar, Avinash Kak | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Robust deep labeling of radiological emphysema subtypes using squeeze and excitation convolutional neural networks: The MESA Lung and SPIROMICS Studies | 使用挤压和激励卷积神经网络对放射性肺气肿亚型进行稳健深度标记:MESA Lung 和 SPIROMICS 研究 | Artur Wysoczanski, Nabil Ettehadi, Soroush Arabshahi, Yifei Sun, Karen Hinkley Stukovsky, Karol E. Watson, MeiLan K. Han, Erin D Michos, Alejandro P. Comellas, Eric A. Hoffman, et al. | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Cloud-based Federated Learning Framework for MRI Segmentation | 基于云的 MRI 分割联邦学习框架 | Rukesh Prajapati, Amr S. El-Wakeel | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach | 重新思考长尾识别中的分类器重新训练:一种简单的 Logits 重定向方法 | Han Lu, Siyu Sun, Yichen Xie, Liqing Zhang, Xiaokang Yang, Junchi Yan | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | YOLO-MED : Multi-Task Interaction Network for Biomedical Images | YOLO-MED:生物医学图像的多任务交互网络 | Suizhi Huang, Shalayiding Sirejiding, Yuxiang Lu, Yue Ding, Leheng Liu, Hui Zhou, Hongtao Lu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Trustworthy Self-Attention: Enabling the Network to Focus Only on the Most Relevant References | 值得信赖的自注意力:使网络仅关注最相关的参考 | Yu Jing, Tan Yujuan, Ren Ao, Liu Duo | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | MaskLRF: Self-supervised Pretraining via Masked Autoencoding of Local Reference Frames for Rotation-invariant 3D Point Set Analysis | MaskLRF:通过局部参考系的屏蔽自动编码进行自监督预训练,用于旋转不变的 3D 点集分析 | Takahiko Furuya | arxiv.org/pdf/2403.00… | null |

Image Understanding

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-01 | Rethinking Inductive Biases for Surface Normal Estimation | 重新思考表面法线估计的归纳偏差 | Gwangbin Bae, Andrew J. Davison | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Rethinking The Uniformity Metric in Self-Supervised Learning | 重新思考自我监督学习中的均匀性指标 | Xianghong Fang, Jian Li, Qiang Sun, Benyou Wang | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Improving Acne Image Grading with Label Distribution Smoothing | 通过标签分布平滑改进痤疮图像分级 | Kirill Prokhorov, Alexandr A. Kalinin | arxiv.org/pdf/2403.00… | null |

LLM

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-01 | TempCompass: Do Video LLMs Really Understand Videos? | TempCompass:视频 LLM 真的理解视频吗? | Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou | arxiv.org/pdf/2403.00… | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-01 | Tri-Modal Motion Retrieval by Learning a Joint Embedding Space | 通过学习联合嵌入空间进行三模态运动检索 | Kangning Yin, Shihao Zou, Yuxuan Ge, Zheng Tian | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | VisionLLaMA: A Unified LLaMA Interface for Vision Tasks | VisionLLaMA:用于视觉任务的统一 LLaMA 接口 | Xiangxiang Chu, Jianlin Su, Bo Zhang, Chunhua Shen | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching | Selective-Stereo:用于立体匹配的自适应频率信息选择 | Xianqi Wang, Gangwei Xu, Hao Jia, Xin Yang | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization | RealCustom:缩小真实文本字的范围,实现实时开放域文本到图像的定制 | Mengqi Huang, Zhendong Mao, Mingcong Liu, Qian He, Yongdong Zhang | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Task Indicating Transformer for Task-conditional Dense Predictions | 用于任务条件密集预测的任务指示 Transformer | Yuxiang Lu, Shalayiding Sirejiding, Bayram Bayramli, Suizhi Huang, Yue Ding, Hongtao Lu | arxiv.org/pdf/2403.00… | null |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-01 | G3DR: Generative 3D Reconstruction in ImageNet | G3DR:ImageNet 中的生成 3D 重建 | Pradyumna Reddy, Ismail Elezi, Jiankang Deng | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Point Could Mamba: Point Cloud Learning via State Space Model | Point Could Mamba:通过状态空间模型进行点云学习 | Tao Zhang, Xiangtai Li, Haobo Yuan, Shunping Ji, Shuicheng Yan | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Learning and Leveraging World Models in Visual Representation Learning | 在视觉表示学习中学习和利用世界模型 | Quentin Garrido, Mahmoud Assran, Nicolas Ballas, Adrien Bardes, Laurent Najman, Yann LeCun | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training | 用于视觉语言预训练的语义增强跨模态掩模图像建模 | Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, et al. | arxiv.org/pdf/2403.00… | null |

Learning Paradigms

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-01 | VisRec: A Semi-Supervised Approach to Radio Interferometric Data Reconstruction | VisRec:无线电干涉数据重建的半监督方法 | Ruoqi Wang, Haitao Wang, Qiong Luo, Feng Wang, Hejun Wu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning | 展平远程损失景观以实现跨域小样本学习 | Yixiong Zou, Yicong Liu, Yiman Hu, Yuhua Li, Ruixuan Li | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Spatial Cascaded Clustering and Weighted Memory for Unsupervised Person Re-identification | 用于无监督人员重新识别的空间级联聚类和加权记忆 | Jiahao Hong, Jialong Zuo, Chuchu Han, Ruochen Zheng, Ming Tian, Changxin Gao, Nong Sang | arxiv.org/pdf/2403.00… | null |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-01 | SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation | SELFI:通过强化学习进行社交导航的自主自我提升 | Noriaki Hirose, Dhruv Shah, Kyle Stachowicz, Ajay Sridhar, Sergey Levine | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Fine-tuning with Very Large Dropout | 使用非常大的 Dropout 进行微调 | Jianyu Zhang, Léon Bottou | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Fast and Efficient Local Search for Genetic Programming Based Loss Function Learning | 基于遗传编程的损失函数学习的快速高效的局部搜索 | Christian Raymond, Qi Chen, Bing Xue, Mengjie Zhang | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Hydra: Computer Vision for Data Quality Monitoring | Hydra:用于数据质量监控的计算机视觉 | Thomas Britton, Torri Jeske, David Lawrence, Kishansingh Rajput | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Advancing dermatological diagnosis: Development of a hyperspectral dermatoscope for enhanced skin imaging | 推进皮肤病诊断:开发用于增强皮肤成像的高光谱皮肤镜 | Martin J. Hetz, Carina Nogueira Garcia, Sarah Haggenmüller, Titus J. Brinker | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Flattening Singular Values of Factorized Convolution for Medical Images | 医学图像因式分解卷积的奇异值展平 | Zexin Feng, Na Zeng, Jiansheng Fang, Xingyue Wang, Xiaoxi Lu, Heng Meng, Jiang Liu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Multi-Task Learning Using Uncertainty to Weigh Losses for Heterogeneous Face Attribute Estimation | 多任务学习利用不确定性来权衡异质人脸属性估计的损失 | Huaqing Yuan, Yi He, Peng Du, Lu Song | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Relaxometry Guided Quantitative Cardiac Magnetic Resonance Image Reconstruction | 松弛测量引导定量心脏磁共振图像重建 | Yidong Zhao, Yi Zhang, Qian Tao | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | When ControlNet Meets Inexplicit Masks: A Case Study of ControlNet on its Contour-following Ability | 当 ControlNet 遇到不明确的掩模时:ControlNet 轮廓跟踪能力案例研究 | Wenjie Xuan, Yufei Xu, Shanshan Zhao, Chaoyue Wang, Juhua Liu, Bo Du, Dacheng Tao | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Deep Learning Computed Tomography based on the Defrise and Clack Algorithm | 基于 Defrise 和 Clack 算法的深度学习计算机断层扫描 | Chengze Ye, Linda-Sophie Schneider, Yipeng Sun, Andreas Maier | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Spatio-temporal reconstruction of substance dynamics using compressed sensing in multi-spectral magnetic resonance spectroscopic imaging | 多光谱磁共振波谱成像中利用压缩感知对物质动力学进行时空重建 | Utako Yamamoto, Hirohiko Imai, Kei Sano, Masayuki Ohzeki, Tetsuya Matsuda, Toshiyuki Tanaka | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | List-Mode PET Image Reconstruction Using Dykstra-Like Splitting | 使用 Dykstra 类分割的列表模式 PET 图像重建 | Kibo Ote, Fumio Hashimoto, Yuya Onishi, Yasuomi Ouchi | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning | 重新审视下游任务中的解缠结:对其抽象视觉推理必要性的研究 | Ruiqian Nai, Zixin Wen, Ji Li, Yuanzhi Li, Yang Gao | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Event-Driven Learning for Spiking Neural Networks | 脉冲神经网络的事件驱动学习 | Wenjie Wei, Malu Zhang, Jilin Zhang, Ammar Belatreche, Jibin Wu, Zijing Xu, Xuerui Qiu, Hong Chen, Yang Yang, Haizhou Li | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Parameter-Efficient Tuning of Large Convolutional Models | 大型卷积模型的参数高效调整 | Wei Chen, Zichen Miao, Qiang Qiu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Transcription and translation of videos using fine-tuned XLSR Wav2Vec2 on custom dataset and mBART | 在自定义数据集和 mBART 上使用微调的 XLSR Wav2Vec2 进行视频转录和翻译 | Aniket Tathe, Anand Kamble, Suyash Kumbharkar, Atharva Bhandare, Anirban C. Mitra | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | ChartReformer: Natural Language-Driven Chart Image Editing | ChartReformer:自然语言驱动的图表图像编辑 | Pengyu Yan, Mahesh Bhosale, Jay Lal, Bikhyat Adhikari, David Doermann | arxiv.org/pdf/2403.00… | null |