[Share][Daily Update][2024.03.07][CV_arxiv_papers]

[UPDATED!] 2024-03-07 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-07 | SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM | SnapNTell:通过检索增强多模态大语言模型增强以实体为中心的视觉问答 | Jielin Qiu, Andrea Madotto, Zhaojiang Lin, Paul A. Crook, Yifan Ethan Xu, Xin Luna Dong, Christos Faloutsos, Lei Li, Babak Damavandi, Seungwhan Moon | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes | ObjectCompose:评估基于视觉的模型对对象到背景成分变化的弹性 | Hashmat Shadab Malik, Muhammad Huzaifa, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Delving into the Trajectory Long-tail Distribution for Muti-object Tracking | 深入研究多目标跟踪的轨迹长尾分布 | Sijia Chen, En Yu, Jinyang Li, Wenbing Tao | arxiv.org/pdf/2403.04… | link |
| 2024-03-07 | PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | PixArt-Σ:用于 4K 文本到图像生成的扩散 Transformer 的弱到强训练 | Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Pix2Gif: Motion-Guided Diffusion for GIF Generation | Pix2Gif:用于 GIF 生成的运动引导扩散 | Hitesh Kandala, Jianfeng Gao, Jianwei Yang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | A Domain Translation Framework with an Adversarial Denoising Diffusion Model to Generate Synthetic Datasets of Echocardiography Images | 具有对抗性去噪扩散模型的域翻译框架,用于生成超声心动图图像的合成数据集 | Cristiana Tiago, Sten Roar Snare, Jurica Sprem, Kristin McLeod | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser | 具有分层空间和时间降噪器的基于解缠扩散的 3D 人体姿势估计 | Qingyuan Cai, Xuecai Hu, Saihui Hou, Li Yao, Yongzhen Huang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | StableDrag: Stable Dragging for Point-based Image Editing | StableDrag:基于点的图像编辑的稳定拖动 | Yutao Cui, Xiaotong Zhao, Guozhen Zhang, Shengming Cao, Kai Ma, Limin Wang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Video-Driven Animation of Neural Head Avatars | 神经头头像的视频驱动动画 | Wolfgang Paier, Paul Hinzer, Anna Hilsmann, Peter Eisert | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Discriminative Probing and Tuning for Text-to-Image Generation | 文本到图像生成的判别性探测和调整 | Leigang Qu, Wenjie Wang, Yongqi Li, Hanwang Zhang, Liqiang Nie, Tat-Seng Chua | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant | MedM2G:通过交叉引导扩散与视觉不变性统一医学多模态生成 | Chenlu Zhan, Yu Lin, Gaoang Wang, Hongwei Wang, Jian Wu | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Controllable Generation with Text-to-Image Diffusion Models: A Survey | 使用文本到图像扩散模型的可控生成:综述 | Pu Cao, Feng Zhou, Qing Song, Lu Yang | arxiv.org/pdf/2403.04… | link |
| 2024-03-07 | 3DTextureTransformer: Geometry Aware Texture Generation for Arbitrary Mesh Topology | 3DTextureTransformer:任意网格拓扑的几何感知纹理生成 | Dharma KC, Clayton T. Morrison | arxiv.org/pdf/2403.04… | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-07 | CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios | CAT:增强多模态大语言模型以回答动态视听场景中的问题 | Qilang Ye, Zitong Yu, Rui Shao, Xinyu Xie, Philip Torr, Xiaochun Cao | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder | MedFLIP:使用屏蔽自动编码器进行医学视觉和语言自监督快速预训练 | Lei Li, Tianfang Zhang, Xinglin Zhang, Jiaqi Liu, Bingqi Ma, Yan Luo, Tao Chen | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document | TextMonkey:用于理解文档的无 OCR 大型多模态模型 | Yuliang Liu, Biao Yang, Qiang Liu, Zhang Li, Zhiyin Ma, Shuo Zhang, Xiang Bai | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Effectiveness Assessment of Recent Large Vision-Language Models | 最新大型视觉语言模型的有效性评估 | Yao Jiang, Xinyu Yan, Ge-Peng Ji, Keren Fu, Meijun Sun, Huan Xiong, Deng-Ping Fan, Fahad Shahbaz Khan | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition | 视听语音识别中丢失视频帧鲁棒性的 Dropout 引起的模态偏差研究 | Yusheng Dai, Hang Chen, Jun Du, Ruoyu Wang, Shihao Chen, Jiefeng Ma, Haotian Wang, Chin-Hui Lee | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | DNAct: Diffusion Guided Multi-Task 3D Policy Learning | DNAct:扩散引导的多任务 3D 策略学习 | Ge Yan, Yueh-Hua Wu, Xiaolong Wang | arxiv.org/pdf/2403.04… | null |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-07 | Finding Waldo: Towards Efficient Exploration of NeRF Scene Space | 寻找 Waldo:迈向 NeRF 场景空间的高效探索 | Evangelos Skartados, Mehmet Kerim Yucel, Bruno Manganelli, Anastasios Drosou, Albert Saà-Garriga | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis | 用于高效 X 射线新颖视图合成的辐射高斯溅射 | Yuanhao Cai, Yixun Liang, Jiahao Wang, Angtian Wang, Yulun Zhang, Xiaokang Yang, Zongwei Zhou, Alan Yuille | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Closing the Visual Sim-to-Real Gap with Object-Composable NeRFs | 使用对象可组合 NeRF 缩小视觉模拟与真实的差距 | Nikhil Mishra, Maximilian Sieb, Pieter Abbeel, Xi Chen | arxiv.org/pdf/2403.04… | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-07 | SWAP-NAS: Sample-Wise Activation Patterns For Ultra-Fast NAS | SWAP-NAS:超高速 NAS 的采样激活模式 | Yameng Peng, Andy Song, Haytham M. Fayek, Vic Ciesielski, Xiaojun Chang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection | MAP:无源模型知识产权保护的掩模修剪 | Boyang Peng, Sanqing Qu, Yong Wu, Tianpei Zou, Lianghua He, Alois Knoll, Guang Chen, Changjun Jiang | arxiv.org/pdf/2403.04… | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-07 | That's My Point: Compact Object-centric LiDAR Pose Estimation for Large-scale Outdoor Localisation | 这就是我的观点:用于大规模室外定位的紧凑型以对象为中心的 LiDAR 姿态估计 | Georgi Pramatarov, Matthew Gadd, Paul Newman, Daniele De Martini | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors | AUFormer:视觉 Transformer 是参数高效的面部动作单元检测器 | Kaishen Yuan, Zitong Yu, Xin Liu, Weicheng Xie, Huanjing Yue, Jingyu Yang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Out of the Room: Generalizing Event-Based Dynamic Motion Segmentation for Complex Scenes | 走出房间:针对复杂场景推广基于事件的动态运动分割 | Stamatios Georgoulis, Weining Ren, Alfredo Bochicchio, Daniel Eckert, Yuanyou Li, Abel Gawel | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Reducing self-supervised learning complexity improves weakly-supervised classification performance in computational pathology | 降低自监督学习复杂性可提高计算病理学中的弱监督分类性能 | Tim Lenz, Omar S. M. El Nahhas, Marta Ligero, Jakob Nikolas Kather | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Explainable Face Verification via Feature-Guided Gradient Backpropagation | 通过特征引导梯度反向传播进行可解释的人脸验证 | Yuhang Lu, Zewei Xu, Touradj Ebrahimi | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers | T-TAME:用于解释卷积网络和视觉 Transformer 的可训练注意力机制 | Mariano V. Ntrougkas, Nikolaos Gkalelis, Vasileios Mezaris | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning | 用于跨域少样本学习的判别性样本引导和参数高效的特征空间适应 | Rashindrie Perera, Saman Halgamuge | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Source Matters: Source Dataset Impact on Model Robustness in Medical Imaging | 源很重要:源数据集对医学成像模型稳健性的影响 | Dovile Juodelyte, Yucheng Lu, Amelia Jiménez-Sánchez, Sabrina Bottazzi, Enzo Ferrante, Veronika Cheplygina | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Improved Focus on Hard Samples for Lung Nodule Detection | 提高对肺结节检测中困难样本的关注 | Yujiang Chen, Mei Xie | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | FriendNet: Detection-Friendly Dehazing Network | FriendNet:检测友好的去雾网络 | Yihua Fan, Yongzhen Wang, Mingqiang Wei, Fu Lee Wang, Haoran Xie | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Impacts of Color and Texture Distortions on Earth Observation Data in Deep Learning | 深度学习中颜色和纹理失真对地球观测数据的影响 | Martin Willbo, Aleksis Pirinen, John Martinsson, Edvin Listo Zec, Olof Mogren, Mikael Nilsson | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Learning to Remove Wrinkled Transparent Film with Polarized Prior | 学习用偏光先验去除起皱的透明薄膜 | Jiaqi Tang, Ruizheng Wu, Xiaogang Xu, Sixing Hu, Ying-Cong Chen | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Spatiotemporal Pooling on Appropriate Topological Maps Represented as Two-Dimensional Images for EEG Classification | 以二维图像表示的适当拓扑图上的时空池用于脑电图分类 | Takuto Fukushima, Ryusuke Miyamoto | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | AO-DETR: Anti-Overlapping DETR for X-Ray Prohibited Items Detection | AO-DETR:用于 X 射线违禁物品检测的防重叠 DETR | Mingyuan Li, Tong Jia, Hao Wang, Bowen Ma, Shuyang Lin, Da Cai, Dongyue Chen | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | A$^{3}$lign-DFER: Pioneering Comprehensive Dynamic Affective Alignment for Dynamic Facial Expression Recognition with CLIP | A$^{3}$lign-DFER:开创性的全面动态情感对齐,用于使用 CLIP 进行动态面部表情识别 | Zeng Tao, Yan Wang, Junxiong Lin, Haoran Wang, Xinji Mai, Jiawen Yu, Xuan Tong, Ziheng Zhou, Shaoqi Yan, Qing Zhao, et al. | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Active Generalized Category Discovery | 主动广义类别发现 | Shijie Ma, Fei Zhu, Zhun Zhong, Xu-Yao Zhang, Cheng-Lin Liu | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Depth-aware Test-Time Training for Zero-shot Video Object Segmentation | 零样本视频对象分割的深度感知测试时间训练 | Weihuang Liu, Xi Shen, Haolun Li, Xiuli Bi, Bo Liu, Chi-Man Pun, Xiaodong Cun | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | ACC-ViT : Atrous Convolution's Comeback in Vision Transformers | ACC-ViT:Atrous Convolution 在 Vision Transformers 中回归 | Nabil Ibtehaz, Ning Yan, Masood Mortazavi, Daisuke Kihara | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images | CN-RMA:结合网络与光线行进聚合进行多视图图像的 3D 室内物体检测 | Guanlin Shen, Jingwei Huang, Zhihua Hu, Bin Wang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | SAM-PD: How Far Can SAM Take Us in Tracking and Segmenting Anything in Videos by Prompt Denoising | SAM-PD:SAM 在通过提示去噪跟踪和分割视频中的任何内容方面可以帮助我们走多远 | Tao Zhou, Wenhan Luo, Qi Ye, Zhiguo Shi, Jiming Chen | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Image Coding for Machines with Edge Information Learning Using Segment Anything | 使用 Segment Anything 进行边缘信息学习的机器图像编码 | Takahiro Shindo, Kein Yamada, Taiju Watanabe, Hiroshi Watanabe | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | SDPL: Shifting-Dense Partition Learning for UAV-View Geo-Localization | SDPL:无人机视图地理定位的平移密集分区学习 | Quan Chen, Tingyu Wang, Zihao Yang, Haoran Li, Rongfeng Lu, Yaoqi Sun, Bolun Zheng, Chenggang Yan | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | ProMISe: Promptable Medical Image Segmentation using SAM | ProMISe:使用 SAM 进行可提示医学图像分割 | Jinfeng Wang, Sifan Song, Xinkun Wang, Yiyi Wang, Yiyi Miao, Jionglong Su, S. Kevin Zhou | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Dual-path Frequency Discriminators for Few-shot Anomaly Detection | 用于小样本异常检测的双路径频率判别器 | Yuhu Bai, Jiangning Zhang, Yuhang Dong, Guanzhong Tian, Yunkang Cao, Yabiao Wang, Chengjie Wang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | An Explainable AI Framework for Artificial Intelligence of Medical Things | 医疗人工智能的可解释人工智能框架 | Al Amin, Kamrul Hasan, Saleh Zein-Sabatto, Deo Chimba, Imtiaz Ahmed, Tariqul Islam | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Scalable and Robust Transformer Decoders for Interpretable Image Classification with Foundation Models | 可扩展且鲁棒的 Transformer 解码器,用于使用基础模型进行可解释的图像分类 | Evelyn Mannix, Howard Bondell | arxiv.org/pdf/2403.04… | null |

LLM

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-07 | How Far Are We from Intelligent Visual Deductive Reasoning? | 我们离智能视觉演绎推理还有多远? | Yizhe Zhang, He Bai, Ruixiang Zhang, Jiatao Gu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly | arxiv.org/pdf/2403.04… | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-07 | Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed | 高效的 LoFTR:具有稀疏速度的半密集局部特征匹配 | Yifan Wang, Xingyi He, Sida Peng, Dongli Tan, Xiaowei Zhou | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Masked Capsule Autoencoders | 屏蔽胶囊自动编码器 | Miles Everett, Mingjun Zhong, Georgios Leontidis | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level | 更快的邻域注意力:在线程块级别降低自注意力的 O(n^2) 成本 | Ali Hassani, Wen-Mei Hwu, Humphrey Shi | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Dynamic Cross Attention for Audio-Visual Person Verification | 用于视听人员验证的动态交叉注意力 | R. Gnana Praveen, Jahangir Alam | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention | 基于联合交叉注意力递归融合的视听行人验证 | R. Gnana Praveen, Jahangir Alam | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Multi-step Temporal Modeling for UAV Tracking | 无人机跟踪的多步时间建模 | Xiaoying Yuan, Tingfa Xu, Xincong Liu, Ying Wang, Haolin Qin, Yuqiang Fang, Jianan Li | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking | LORS:用于参数高效网络堆叠的低秩残差结构 | Jialin Li, Qiang Nie, Weifu Fu, Yuhuan Lin, Guangpin Tao, Yong Liu, Chengjie Wang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | A data-centric approach to class-specific bias in image data augmentation | 一种以数据为中心的方法来解决图像数据增强中特定类别的偏差 | Athanasios Angelakis, Andrey Rass | arxiv.org/pdf/2403.04… | null |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-07 | Unbiased Estimator for Distorted Conics in Camera Calibration | 相机校准中畸变二次曲线的无偏估计器 | Chaehyeon Song, Jaeho Shin, Myung-Hwan Jeon, Jongwoo Lim, Ayoung Kim | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation | 用于自我中心 3D 手势估计的单视图到双视图自适应 | Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang, Yoichi Sato | arxiv.org/pdf/2403.04… | null |

Others

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-07 | I Can't Believe It's Not Scene Flow! | 我不敢相信这不是场景流! | Ishan Khatri, Kyle Vedder, Neehar Peri, Deva Ramanan, James Hays | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Embodied Understanding of Driving Scenarios | 对驾驶场景的具身理解 | Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Hyperspectral unmixing for Raman spectroscopy via physics-constrained autoencoders | 通过物理约束自动编码器对拉曼光谱进行高光谱分解 | Dimitar Georgiev, Álvaro Fernández-Galiana, Simon Vilms Pedersen, Georgios Papadopoulos, Ruoxiao Xie, Molly M. Stevens, Mauricio Barahona | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment | MAGR:用于持续行动质量评估的流形对齐图正则化 | Kanglei Zhou, Liyuan Wang, Xingxing Zhang, Hubert P. H. Shum, Frederick W. B. Li, Jianguo Li, Xiaohui Liang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Single-Image HDR Reconstruction Assisted Ghost Suppression and Detail Preservation Network for Multi-Exposure HDR Imaging | 用于多重曝光 HDR 成像的单图像 HDR 重建辅助重影抑制和细节保留网络 | Huafeng Li, Zhenmei Yang, Yafei Zhang, Dapeng Tao, Zhengtao Yu | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | YYDS: Visible-Infrared Person Re-Identification with Coarse Descriptions | YYDS:具有粗略描述的可见红外人员重新识别 | Yunhao Du, Zhicheng Zhao, Fei Su | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Towards learning-based planning: The nuPlan benchmark for real-world autonomous driving | 迈向基于学习的规划:现实世界自动驾驶的 nuPlan 基准 | Napat Karnchanachari, Dimitris Geromichalos, Kok Seang Tan, Nanxiang Li, Christopher Eriksen, Shakiba Yaghoubi, Noushin Mehdipour, Gianmarco Bernasconi, Whye Kit Fong, Yiluan Guo, et al. | arxiv.org/pdf/2403.04… | null |