[Share][Daily Update][2024.01.22][CV_arxiv_papers]

[UPDATED!] 2024-01-22 (Publish Time)

Classification / Detection / Recognition / Segmentation

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-22 | Exploring Simple Open-Vocabulary Semantic Segmentation | 探索简单的开放词汇语义分割 | Zihang Lai | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | Connecting the Dots: Leveraging Spatio-Temporal Graph Neural Networks for Accurate Bangla Sign Language Recognition | 连接点:利用时空图神经网络进行准确的孟加拉手语识别 | Haz Sameen Shahgir, Khondker Salman Sayeed, Md Toki Tahmid, Tanjeem Azwad Zaman, Md. Zarif Ul Alam | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics | OK-Robot:集成机器人开放知识模型真正重要的是什么 | Peiqi Liu, Yaswanth Orru, Chris Paxton, Nur Muhammad Mahi Shafiullah, Lerrel Pinto | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | Broiler-Net: A Deep Convolutional Framework for Broiler Behavior Analysis in Poultry Houses | Broiler-Net:用于家禽舍中肉鸡行为分析的深度卷积框架 | Tahereh Zarrat Ehsan, Seyed Mehdi Mohtavipour | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | Semi-supervised segmentation of land cover images using nonlinear canonical correlation analysis with multiple features and t-SNE | 使用多特征非线性典型相关分析和 t-SNE 对土地覆盖图像进行半监督分割 | Hong Wei, James Xiao, Yichao Zhang, Xia Hong | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | Automated facial recognition system using deep learning for pain assessment in adults with cerebral palsy | 使用深度学习的自动面部识别系统对脑瘫成人患者进行疼痛评估 | Álvaro Sabater-Gárriz, F. Xavier Gaya-Morey, José María Buades-Rubio, Cristina Manresa Yee, Pedro Montoya, Inmaculada Riquelme | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | VRMN-bD: A Multi-modal Natural Behavior Dataset of Immersive Human Fear Responses in VR Stand-up Interactive Games | VRMN-bD:VR 站立式互动游戏中沉浸式人类恐惧反应的多模态自然行为数据集 | He Zhang, Xinyang Li, Yuanxi Sun, Xinyi Fu, Christine Qiu, John M. Carroll | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | Out-of-Distribution Detection & Applications With Ablated Learned Temperature Energy | 具有消融学习温度能量的分布外检测和应用 | Will LeVine, Benjamin Pikus, Jacob Phillips, Berk Norman, Fernando Amat Gil, Sean Hendryx | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | DeepCERES: A Deep learning method for cerebellar lobule segmentation using ultra-high resolution multimodal MRI | DeepCERES:使用超高分辨率多模态 MRI 进行小脑小叶分割的深度学习方法 | Sergio Morell-Ortega, Marina Ruiz-Perez, Marien Gadea, Roberto Vivo-Hernando, Gregorio Rubio, Fernando Aparici, Mariam de la Iglesia-Vaya, Gwenaelle Catheline, Pierrick Coupé, José V. Manjón | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | CloSe: A 3D Clothing Segmentation Dataset and Model | CloSe:3D 服装分割数据集和模型 | Dimitrije Antić, Garvita Tiwari, Batuhan Ozcomlekci, Riccardo Marin, Gerard Pons-Moll | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | HomeRobot Open Vocabulary Mobile Manipulation Challenge 2023 Participant Report (Team KuzHum) | HomeRobot 开放词汇移动操作挑战赛 2023 参赛者报告(KuzHum 团队) | Volodymyr Kuzma, Vladyslav Humennyy, Ruslan Partsey | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | Look, Listen and Recognise: Character-Aware Audio-Visual Subtitling | 看、听、认:角色感知视听字幕 | Bruno Korbar, Jaesung Huh, Andrew Zisserman | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | A Saliency Enhanced Feature Fusion based multiscale RGB-D Salient Object Detection Network | 基于显著性增强特征融合的多尺度 RGB-D 显著目标检测网络 | Rui Huang, Qingyi Zhao, Yan Xing, Sihua Gao, Weifeng Xu, Yuxiang Zhang, Wei Fan | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Large receptive field strategy and important feature extraction strategy in 3D object detection | 3D 物体检测中的大感受野策略和重要特征提取策略 | Leichao Cui, Xiuxian Li, Min Meng | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Evaluating the Feasibility of Standard Facial Expression Recognition in Individuals with Moderate to Severe Intellectual Disabilities | 评估标准面部表情识别对中度至重度智力障碍个体的可行性 | F. Xavier Gaya-Morey, Silvia Ramis, Jose M. Buades-Rubio, Cristina Manresa-Yee | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis | 检测-顺序-构造:一种基于树构造的分层文档结构分析方法 | Jiawei Wang, Kai Hu, Zhuoyao Zhong, Lei Sun, Qiang Huo | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | MOSformer: Momentum encoder-based inter-slice fusion transformer for medical image segmentation | MOSformer:用于医学图像分割的基于动量编码器的层间融合 Transformer | De-Xing Huang, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Zhen-Qiu Feng, Mei-Jiang Gui, Hao Li, Tian-Yu Xiang, Xiu-Ling Liu, Zeng-Guang Hou | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning | SignVTCL:通过视觉文本对比学习增强多模态连续手语识别 | Hao Chen, Jiaze Wang, Ziyu Guo, Jinpeng Li, Donghao Zhou, Bian Wu, Chenyong Guan, Guangyong Chen, Pheng-Ann Heng | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Unveiling the Human-like Similarities of Automatic Facial Expression Recognition: An Empirical Exploration through Explainable AI | 揭示自动面部表情识别的类人相似性:通过可解释的人工智能进行实证探索 | F. Xavier Gaya-Morey, Silvia Ramis-Guarinos, Cristina Manresa-Yee, Jose M. Buades-Rubio | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Rethinking Centered Kernel Alignment in Knowledge Distillation | 重新思考知识蒸馏中的中心内核对齐 | Zikai Zhou, Yunhang Shen, Shitong Shao, Huanran Chen, Linrui Gong, Shaohui Lin | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Symbrain: A large-scale dataset of MRI images for neonatal brain symmetry analysis | Symbrain:用于新生儿大脑对称性分析的大规模 MRI 图像数据集 | Arnaud Gucciardi, Safouane El Ghazouali, Francesca Venturini, Vida Groznik, Umberto Michelucci | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | SemPLeS: Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation | SemPLeS:弱监督语义分割的语义提示学习 | Ci-Siang Lin, Chien-Yi Wang, Yu-Chiang Frank Wang, Min-Hung Chen | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Deep Learning for Computer Vision based Activity Recognition and Fall Detection of the Elderly: a Systematic Review | 基于计算机视觉的深度学习老年人活动识别和跌倒检测:系统综述 | F. Xavier Gaya-Morey, Cristina Manresa-Yee, Jose M. Buades-Rubio | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Collaborative Position Reasoning Network for Referring Image Segmentation | 用于参考图像分割的协作位置推理网络 | Jianjian Cao, Beiya Dai, Yulin Li, Xiameng Qin, Jingdong Wang | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Concealed Object Segmentation with Hierarchical Coherence Modeling | 使用分层一致性建模的隐藏对象分割 | Fengyang Xiao, Pan Zhang, Chunming He, Runze Hu, Yutao Liu | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models | EmerDiff:扩散模型中新兴的像素级语义知识 | Koichi Namekata, Amirmojtaba Sabour, Sanja Fidler, Seung Wook Kim | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | MetaSeg: Content-Aware Meta-Net for Omni-Supervised Semantic Segmentation | MetaSeg:用于全监督语义分割的内容感知元网络 | Shenwang Jiang, Jianan Li, Ying Wang, Wenxuan Wu, Jizhou Zhang, Bo Huang, Tingfa Xu | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Colorectal Polyp Segmentation in the Deep Learning Era: A Comprehensive Survey | 深度学习时代的结直肠息肉分割:综合调查 | Zhenyu Wu, Fengmao Lv, Chenglizhao Chen, Aimin Hao, Shuo Li | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Detecting Out-of-Distribution Samples via Conditional Distribution Entropy with Optimal Transport | 通过具有最优传输的条件分布熵检测分布外样本 | Chuanwen Feng, Wenlong Chen, Ao Ke, Yilong Ren, Xike Xie, S. Kevin Zhou | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Augmenting Prototype Network with TransMix for Few-shot Hyperspectral Image Classification | 使用 TransMix 增强原型网络以实现少样本高光谱图像分类 | Chun Liu, Longwei Yang, Dongmei Dong, Zheng Li, Wei Yang, Zhigang Han, Jiayao Wang | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | SFC: Shared Feature Calibration in Weakly Supervised Semantic Segmentation | SFC:弱监督语义分割中的共享特征校准 | Xinqiao Zhao, Feilong Tang, Xiaoyang Wang, Jimin Xiao | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | MsSVT++: Mixed-scale Sparse Voxel Transformer with Center Voting for 3D Object Detection | MsSVT++:用于 3D 对象检测的具有中心投票功能的混合尺度稀疏体素 Transformer | Jianan Li, Shaocong Dong, Lihe Ding, Tingfa Xu | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Medical Image Debiasing by Learning Adaptive Agreement from a Biased Council | 通过从有偏见的委员会学习自适应协议来消除医学图像偏见 | Luyang Luo, Xin Huang, Minghao Wang, Zhuoyue Wan, Hao Chen | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | EK-Net: Real-time Scene Text Detection with Expand Kernel Distance | EK-Net:扩展核距离的实时场景文本检测 | Boyuan Zhu, Fagui Liu, Xi Chen, Quan Tang | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Memory-Efficient Prompt Tuning for Incremental Histopathology Classification | 用于增量组织病理学分类的内存高效提示调整 | Yu Zhu, Kang Li, Lequan Yu, Pheng-Ann Heng | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | RTA-Former: Reverse Transformer Attention for Polyp Segmentation | RTA-Former:用于息肉分割的反向 Transformer 注意力 | Zhikai Li, Murong Yi, Ali Uneri, Sihan Niu, Craig Jones | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition | ActionHub:用于零样本动作识别的大规模动作视频描述数据集 | Jiaming Zhou, Junwei Liang, Kun-Yu Lin, Jinrui Yang, Wei-Shi Zheng | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition | M2-CLIP:视频动作识别的多模态、多任务适应框架 | Mengmeng Wang, Jiazheng Xing, Boyuan Jiang, Jun Chen, Jianbiao Mei, Xingxing Zuo, Guang Dai, Jingdong Wang, Yong Liu | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Friends Across Time: Multi-Scale Action Segmentation Transformer for Surgical Phase Recognition | 跨越时间的朋友:用于手术阶段识别的多尺度动作分段 Transformer | Bokai Zhang, Jiayuan Meng, Bin Cheng, Dean Biskup, Svetlana Petculescu, Angela Chapman | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Zoom-shot: Fast and Efficient Unsupervised Zero-Shot Transfer of CLIP to Vision Encoders with Multimodal Loss | Zoom-shot:快速高效的无监督零样本将 CLIP 传输到具有多模态损失的视觉编码器 | Jordan Shipard, Arnold Wiliem, Kien Nguyen Thanh, Wei Xiang, Clinton Fookes | arxiv.org/pdf/2401.11… | null |

Model Compression / Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-22 | LONEStar: The Lunar Flashlight Optical Navigation Experiment | LONEStar:月球手电筒光学导航实验 | Michael Krause, Ava Thrasher, Priyal Soni, Liam Smego, Reuben Isaac, Jennifer Nolan, Micah Pledger, E. Glenn Lightsey, W. Jud Ready, John Christian | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | Stereo-Matching Knowledge Distilled Monocular Depth Estimation Filtered by Multiple Disparity Consistency | 通过多重视差一致性过滤的立体匹配知识蒸馏单目深度估计 | Woonghyun Ka, Jae Young Lee, Jaehyun Choi, Junmo Kim | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | Robustness to distribution shifts of compressed networks for edge devices | 边缘设备压缩网络分布变化的鲁棒性 | Lulan Shen, Ali Edalati, Brett Meyer, Warren Gross, James J. Clark | arxiv.org/pdf/2401.12… | null |

OCR

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-22 | CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark | CMMMU:中文大规模多学科多模态理解基准 | Ge Zhang, Xinrun Du, Bei Chen, Yiming Liang, Tongxu Luo, Tianyu Zheng, Kang Zhu, Yuyang Cheng, Chunpu Xu, Shuyue Guo, et al. | arxiv.org/pdf/2401.11… | null |

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-22 | Single-View 3D Human Digitalization with Large Reconstruction Models | 具有大型重建模型的单视图 3D 人体数字化 | Zhenzhen Weng, Jingyuan Liu, Hao Tan, Zhan Xu, Yang Zhou, Serena Yeung-Levy, Jimei Yang | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | Feature Denoising Diffusion Model for Blind Image Quality Assessment | 用于盲图像质量评估的特征去噪扩散模型 | Xudong Li, Jingyuan Zheng, Runze Hu, Yan Zhang, Ke Li, Yunhang Shen, Xiawu Zheng, Yutao Liu, ShengChuan Zhang, Pingyang Dai, et al. | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | A Fair Evaluation of Various Deep Learning-Based Document Image Binarization Approaches | 对各种基于深度学习的文档图像二值化方法的公平评估 | Richin Sukesh, Mathias Seuret, Anguelos Nicolaou, Martin Mayr, Vincent Christlein | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs | 掌握文本到图像的扩散:使用多模态大语言模型进行重述、规划和生成 | Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui | arxiv.org/pdf/2401.11… | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-22 | SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities | SpatialVLM:赋予视觉语言模型空间推理能力 | Boyuan Chen, Zhuo Xu, Sean Kirmani, Brian Ichter, Danny Driess, Pete Florence, Dorsa Sadigh, Leonidas Guibas, Fei Xia | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | Benchmarking Large Multimodal Models against Common Corruptions | 针对常见损坏对大型多模态模型进行基准测试 | Jiawei Zhang, Tianyu Pang, Chao Du, Yi Ren, Bo Li, Min Lin | arxiv.org/pdf/2401.11… | null |

LLM

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-22 | CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation | CheXagent:建立胸部 X 射线解读的基础模型 | Zhihong Chen, Maya Varma, Jean-Benoit Delbrouck, Magdalini Paschali, Louis Blankemeier, Dave Van Veen, Jeya Maria Jose Valanarasu, Alaa Youssef, Joseph Paul Cohen, Eduardo Pontes Reis, et al. | arxiv.org/pdf/2401.12… | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-22 | Less Could Be Better: Parameter-efficient Fine-tuning Advances Medical Vision Foundation Models | 越少越好:参数高效的微调推进医学视觉基础模型 | Chenyu Lian, Hong-Yu Zhou, Yizhou Yu, Liansheng Wang | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | LKFormer: Large Kernel Transformer for Infrared Image Super-Resolution | LKFormer:用于红外图像超分辨率的大型内核 Transformer | Feiwei Qin, Kang Yan, Changmiao Wang, Ruiquan Ge, Yong Peng, Kai Zhang | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | HG3-NeRF: Hierarchical Geometric, Semantic, and Photometric Guided Neural Radiance Fields for Sparse View Inputs | HG3-NeRF:用于稀疏视图输入的分层几何、语义和光度引导神经辐射场 | Zelin Gao, Weichen Dai, Yu Zhang | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | TIM: An Efficient Temporal Interaction Module for Spiking Transformer | TIM:脉冲 Transformer 的高效时间交互模块 | Sicheng Shen, Dongcheng Zhao, Guobin Shen, Yi Zeng | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo | MVSFormer++:揭示 Transformer 多视图立体细节中的魔鬼 | Chenjie Cao, Xinlin Ren, Yanwei Fu | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning | OnDev-LCT:面向联邦学习的设备上轻量级卷积 Transformer | Chu Myaet Thwal, Minh N. H. Nguyen, Ye Lin Tun, Seong Tae Kim, My T. Thai, Choong Seon Hong | arxiv.org/pdf/2401.11… | null |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-22 | Scaling Face Interaction Graph Networks to Real World Scenes | 将面交互图网络扩展到现实世界场景 | Tatiana Lopez-Guevara, Yulia Rubanova, William F. Whitney, Tobias Pfaff, Kimberly Stachenfeld, Kelsey R. Allen | arxiv.org/pdf/2401.11… | null |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-22 | Modeling Stereo-Confidence Out of the End-to-End Stereo-Matching Network via Disparity Plane Sweep | 通过视差平面扫描对端到端立体匹配网络的立体置信度进行建模 | Jae Young Lee, Woonghyun Ka, Jaehyun Choi, Junmo Kim | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method | 观测引导的站级气象场降尺度:基准和新方法 | Zili Liu, Hao Chen, Lei Bai, Wenyuan Li, Keyan Chen, Zhengyi Wang, Wanli Ouyang, Zhengxia Zou, Zhenwei Shi | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Local Agnostic Video Explanations: a Study on the Applicability of Removal-Based Explanations to Video | 局部不可知视频解释:基于移除的解释对视频的适用性研究 | F. Xavier Gaya-Morey, Jose M. Buades-Rubio, Cristina Manresa-Yee | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Full-Body Motion Reconstruction with Sparse Sensing from Graph Perspective | 图视角的稀疏感知全身运动重建 | Feiyu Yao, Zongkai Wu, Li Yi | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | PointGL: A Simple Global-Local Framework for Efficient Point Cloud Analysis | PointGL:用于高效点云分析的简单全局局部框架 | Jianan Li, Jie Wang, Tingfa Xu | arxiv.org/pdf/2401.11… | null |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-22 | Momentum-SAM: Sharpness Aware Minimization without Computational Overhead | Momentum-SAM:锐度感知最小化,无需计算开销 | Marlon Becker, Frederick Altrock, Benjamin Risse | arxiv.org/pdf/2401.12… | null |
| 2024-01-22 | A Training-Free Defense Framework for Robust Learned Image Compression | 用于鲁棒学习图像压缩的免训练防御框架 | Myungseo Song, Jinyoung Choi, Bohyung Han | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Adaptive Fusion of Multi-view Remote Sensing data for Optimal Sub-field Crop Yield Prediction | 多视图遥感数据的自适应融合用于最佳子田作物产量预测 | Francisco Mena, Deepak Pathak, Hiba Najjar, Cristhian Sanchez, Patrick Helber, Benjamin Bischke, Peter Habelitz, Miro Miranda, Jayanth Siddamsetty, Marlon Nuske, et al. | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Boosting Multi-view Stereo with Late Cost Aggregation | 通过后期成本聚合增强多视图立体效果 | Jiang Wu, Rui Li, Yu Zhu, Wenxun Zhao, Jinqiu Sun, Yanning Zhang | arxiv.org/pdf/2401.11… | null |
| 2024-01-22 | Multi-level Cross-modal Alignment for Image Clustering | 图像聚类的多级跨模态对齐 | Liping Qiu, Qin Zhang, Xiaojun Chen, Shaotian Cai | arxiv.org/pdf/2401.11… | null |