[Share][Daily Update][2024.01.18][CV_arxiv_papers]

[UPDATED!] 2024-01-18 (Publish Time)

Classification/Detection/Recognition/Segmentation

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-18 | OMG-Seg: Is One Model Good Enough For All Segmentation? | OMG-Seg:一种模型足以适用于所有分割任务吗? | Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | RAP-SAM: Towards Real-Time All-Purpose Segment Anything | RAP-SAM:迈向实时通用的 Segment Anything | Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, Yibo Yang, Yining Li, Kai Chen, Yunhai Tong, Bernard Ghanem, et al. | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting | 用于全景分割和掩模修复的简单潜在扩散方法 | Wouter Van Gansbeke, Bert De Brabandere | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions | 解释隐式神经画布:通过追踪像素的贡献将像素与神经元连接起来 | Namitha Padmanabhan, Matthew Gwilliam, Pulkit Kumar, Shishira R Maiya, Max Ehrlich, Abhinav Shrivastava | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Comprehensive OOD Detection Improvements | 全面的 OOD 检测改进 | Anish Lakkapragada, Amol Khanna, Edward Raff, Nathan Inkawhich | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Few-shot learning for COVID-19 Chest X-Ray Classification with Imbalanced Data: An Inter vs. Intra Domain Study | 具有不平衡数据的 COVID-19 胸部 X 射线分类的少样本学习:域间与域内研究 | Alejandro Galán-Cuenca, Antonio Javier Gallego, Marcelo Saval-Calvo, Antonio Pertusa | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Exposing Lip-syncing Deepfakes from Mouth Inconsistencies | 揭露口型不一致的 Deepfakes | Soumyya Kanti Datta, Shan Jia, Siwei Lyu | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | VIPTR: A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition | VIPTR:用于快速高效场景文本识别的视觉可变换提取器 | Xianfu Cheng, Weixiao Zhou, Xiang Li, Xiaoming Chen, Jian Yang, Tongliang Li, Zhoujun Li | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | ContextMix: A context-aware data augmentation method for industrial visual inspection systems | ContextMix:工业视觉检测系统的上下文感知数据增强方法 | Hyungmin Kim, Donghun Kim, Pyunghwan Ahn, Sungho Suh, Hansang Cho, Junmo Kim | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Deep spatial context: when attention-based models meet spatial regression | 深层空间上下文:当基于注意力的模型遇到空间回归时 | Paulina Tomaszewska, Elżbieta Sienkiewicz, Mai P. Hoang, Przemysław Biecek | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | CMFN: Cross-Modal Fusion Network for Irregular Scene Text Recognition | CMFN:用于不规则场景文本识别的跨模态融合网络 | Jinzhi Zheng, Ruyi Ji, Libo Zhang, Yanjun Wu, Chen Zhao | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition | GPT4Ego:释放预训练模型的潜力,实现零样本自我中心动作识别 | Guangzhao Dai, Xiangbo Shu, Wenhao Wu | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Depth Over RGB: Automatic Evaluation of Open Surgery Skills Using Depth Camera | Depth Over RGB:使用深度相机自动评估开放手术技能 | Ido Zuckerman, Nicole Werner, Jonathan Kouchly, Emma Huston, Shannon DiMarco, Paul DiMusto, Shlomi Laufer | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Text Region Multiple Information Perception Network for Scene Text Detection | 用于场景文本检测的文本区域多信息感知网络 | Jinzhi Zheng, Libo Zhang, Yanjun Wu, Chen Zhao | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | BPDO: Boundary Points Dynamic Optimization for Arbitrary Shape Scene Text Detection | BPDO:任意形状场景文本检测的边界点动态优化 | Jinzhi Zheng, Libo Zhang, Yanjun Wu, Chen Zhao | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Developing an AI-based Integrated System for Bee Health Evaluation | 开发基于人工智能的蜜蜂健康评估综合系统 | Andrew Liang | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Ventricular Segmentation: A Brief Comparison of U-Net Derivatives | 心室分割:U-Net 衍生模型的简要比较 | Ketan Suhaas Saichandran | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects | CustomVideo:自定义多个主题的文本到视频生成 | Zhao Wang, Aoxue Li, Enze Xie, Lingting Zhu, Yong Guo, Qi Dou, Zhenguo Li | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Multi-task Learning for Joint Re-identification, Team Affiliation, and Role Classification for Sports Visual Tracking | 用于运动视觉跟踪的联合重新识别、团队归属和角色分类的多任务学习 | Amir M. Mansourian, Vladimir Somers, Christophe De Vleeschouwer, Shohreh Kasaei | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | ICGNet: A Unified Approach for Instance-Centric Grasping | ICGNet:以实例为中心的抓取的统一方法 | René Zurbrügg, Yifan Liu, Francis Engelmann, Suryansh Kumar, Marco Hutter, Vaishakh Patil, Fisher Yu | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection | MAMBA:通过内存库进行多级聚合,用于视频对象检测 | Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | BlenDA: Domain Adaptive Object Detection through diffusion-based blending | BlenDA:通过基于扩散的混合进行域自适应对象检测 | Tzuhsuan Huang, Chen-Che Huang, Chung-Hao Ku, Jun-Cheng Chen | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | XAI-Enhanced Semantic Segmentation Models for Visual Quality Inspection | 用于视觉质量检测的 XAI 增强语义分割模型 | Tobias Clement, Truong Thanh Hung Nguyen, Mohamed Abdelaal, Hung Cao | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Skeleton-Guided Instance Separation for Fine-Grained Segmentation in Microscopy | 用于显微镜中细粒度分割的骨架引导实例分离 | Jun Wang, Chengfeng Zhou, Zhaoyan Ming, Lina Wei, Xudong Jiang, Dahong Qian | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation | 用于弱监督语义分割的问答跨语言图像匹配 | Songhe Deng, Wei Zhuo, Jinheng Xie, Linlin Shen | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Boosting Few-Shot Segmentation via Instance-Aware Data Augmentation and Local Consensus Guided Cross Attention | 通过实例感知数据增强和局部共识引导交叉注意力来促进少样本分割 | Li Guo, Haoming Liu, Yuxuan Xia, Chengyu Zhang, Xiaochen Lu | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Improving fine-grained understanding in image-text pre-training | 提高图像文本预训练的细粒度理解 | Ioana Bica, Anastasija Ilić, Matthias Bauer, Goker Erdogan, Matko Bošnjak, Christos Kaplanis, Alexey A. Gritsenko, Matthias Minderer, Charles Blundell, Razvan Pascanu, et al. | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Enhancing the Fairness and Performance of Edge Cameras with Explainable AI | 通过可解释的人工智能增强边缘摄像头的公平性和性能 | Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Quoc Hung Cao, Van Binh Truong, Quoc Khanh Nguyen, Hung Cao | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Slicer Networks | 切片器网络 | Hang Zhang, Xiang Chen, Rongguang Wang, Renjiu Hu, Dongdong Liu, Gaolei Li | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Enhanced Automated Quality Assessment Network for Interactive Building Segmentation in High-Resolution Remote Sensing Imagery | 用于高分辨率遥感图像中交互式建筑分割的增强型自动化质量评估网络 | Zhili Zhang, Xiangyun Hu, Jiabo Xu | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Boosting Few-Shot Semantic Segmentation Via Segment Anything Model | 通过 Segment Anything 模型促进少样本语义分割 | Chen-Bin Feng, Qi Lai, Kangdao Liu, Houcheng Su, Chi-Man Vong | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Enhancing Small Object Encoding in Deep Neural Networks: Introducing Fast&Focused-Net with Volume-wise Dot Product Layer | 增强深度神经网络中的小对象编码:引入具有体积点积层的 Fast&Focused-Net | Ali Tofik, Roy Partha Pratim | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units | 通过学习离散视觉语音单元,使用单一模型进行多语言视觉语音识别 | Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Se Jin Park, Yong Man Ro | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | BreastRegNet: A Deep Learning Framework for Registration of Breast Faxitron and Histopathology Images | BreastRegNet:用于配准乳房 Faxitron 和组织病理学图像的深度学习框架 | Negar Golestani, Aihui Wang, Gregory R Bean, Mirabela Rusu | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Adaptive Self-training Framework for Fine-grained Scene Graph Generation | 用于细粒度场景图生成的自适应自训练框架 | Kibum Kim, Kanghoon Yoon, Yeonjun In, Jinyoung Moon, Donghyun Kim, Chanyoung Park | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | On the Audio Hallucinations in Large Audio-Video Language Models | 论大型音视频语言模型中的音频幻觉 | Taichi Nishimura, Shota Nakada, Masayoshi Kondo | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | SEINE: Structure Encoding and Interaction Network for Nuclei Instance Segmentation | SEINE:用于细胞核实例分割的结构编码和交互网络 | Ye Zhang, Linghan Cai, Ziyue Wang, Yongbing Zhang | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition | SlideAVSR:用于视听语音识别的论文讲解视频数据集 | Hao Wang, Shuhei Kurita, Shuichiro Shimizu, Daisuke Kawahara | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation | 实例布朗桥作为开放词汇视频实例分割的文本 | Zesen Cheng, Kehan Li, Hao Li, Peng Jin, Chang Liu, Xiawu Zheng, Rongrong Ji, Jie Chen | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | P2Seg: Pointly-supervised Segmentation via Mutual Distillation | P2Seg:通过相互蒸馏进行点监督分割 | Zipeng Wang, Xuehui Yu, Xumeng Han, Wenwen Yu, Zhixun Huang, Jianbin Jiao, Zhenjun Han | arxiv.org/pdf/2401.09… | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-18 | Model Compression Techniques in Biometrics Applications: A Survey | 生物识别应用中的模型压缩技术:综述 | Eduarda Caldeira, Pedro C. Neto, Marco Huber, Naser Damer, Ana F. Sequeira | arxiv.org/pdf/2401.10… | null |

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-18 | ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions | ParaHome:参数化日常家庭活动以实现人-物交互的 3D 生成模型 | Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na, Hanbyul Joo | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Edit One for All: Interactive Batch Image Editing | 一次编辑,全部适用:交互式批量图像编辑 | Thao Nguyen, Utkarsh Ojha, Yuheng Li, Haotian Liu, Yong Jae Lee | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer | MM-Interleaved:通过多模态特征同步器进行交错图像文本生成建模 | Changyao Tian, Xizhou Zhu, Yuwen Xiong, Weiyun Wang, Zhe Chen, Wenhai Wang, Yuntao Chen, Lewei Lu, Tong Lu, Jie Zhou, et al. | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation | Motion-Zero:用于基于扩散的视频生成的零样本移动对象控制框架 | Changgu Chen, Junwei Shu, Lianggangxu Chen, Gaoqi He, Changbo Wang, Yang Li | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | DiffusionGPT: LLM-Driven Text-to-Image Generation System | DiffusionGPT:LLM 驱动的文本到图像生成系统 | Jie Qin, Jie Wu, Weifeng Chen, Yuxi Ren, Huixia Li, Hefeng Wu, Xuefeng Xiao, Rui Wang, Shilei Wen | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose Reconstruction in a Diffusion Framework | 探索潜在跨通道嵌入,以在扩散框架中实现准确的 3D 人体姿势重建 | Junkun Jiang, Jie Chen | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing | 基于扩散的图像编辑中文本反转的小波引导加速 | Gwanhyeong Koo, Sunjae Yoon, Chang D. Yoo | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | CLIP Model for Images to Textual Prompts Based on Top-k Neighbors | 基于Top-k邻居的图像到文本提示的CLIP模型 | Xin Zhang, Xin Zhang, YeMing Cai, Tianzhi Jia | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Image Translation as Diffusion Visual Programmers | 作为扩散视觉程序员的图像翻译 | Cheng Han, James C. Liang, Qifan Wang, Majid Rabbani, Sohail Dianat, Raghuveer Rao, Ying Nian Wu, Dongfang Liu | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Towards Identifiable Unsupervised Domain Translation: A Diversified Distribution Matching Approach | 迈向可识别的无监督领域翻译:多样化的分布匹配方法 | Sagar Shrestha, Xiao Fu | arxiv.org/pdf/2401.09… | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-18 | Towards Language-Driven Video Inpainting via Multimodal Large Language Models | 通过多模态大语言模型实现语言驱动的视频修复 | Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu, et al. | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | CPCL: Cross-Modal Prototypical Contrastive Learning for Weakly Supervised Text-based Person Re-Identification | CPCL:用于弱监督的基于文本的人员重新识别的跨模态原型对比学习 | Yanwei Zheng, Xinpeng Zhao, Chuanlin Lan, Xiaowei Zhang, Bowen Huang, Jibin Yang, Dongxiao Yu | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation | 通过显式推理链和视觉问题生成推进大型多模态模型 | Kohei Uehara, Nabarun Goswami, Hanqin Wang, Toshiaki Baba, Kohtaro Tanaka, Tomohiro Hashimoto, Kai Wang, Rei Ito, Takagi Naoya, Ryo Umagami, et al. | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens | WorldDreamer:通过预测屏蔽令牌实现视频生成的通用世界模型 | Xiaofeng Wang, Zheng Zhu, Guan Huang, Boyuan Wang, Xinze Chen, Jiwen Lu | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Temporal Insight Enhancement: Mitigating Temporal Hallucination in Multimodal Large Language Models | 时间洞察力增强:减轻多模态大语言模型中的时间幻觉 | Li Sun, Liuan Wang, Jun Sun, Takayuki Okatani | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model | SkyEyeGPT:通过大型语言模型的指令调整来统一遥感视觉语言任务 | Yang Zhan, Zhitong Xiong, Yuan Yuan | arxiv.org/pdf/2401.09… | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-18 | Supervised Fine-tuning in turn Improves Visual Foundation Models | 有监督的微调反过来改进了视觉基础模型 | Xiaohu Jiang, Yixiao Ge, Yuying Ge, Chun Yuan, Ying Shan | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | GPAvatar: Generalizable and Precise Head Avatar from Image(s) | GPAvatar:来自图像的可概括且精确的头像 | Xuangeng Chu, Yu Li, Ailing Zeng, Tianyu Yang, Lijian Lin, Yunfei Liu, Tatsuya Harada | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | VMamba: Visual State Space Model | VMamba:视觉状态空间模型 | Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, Yunfan Liu | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Explicitly Disentangled Representations in Object-Centric Learning | 以对象为中心的学习中的显式解缠表示 | Riccardo Majellaro, Jonathan Collu, Aske Plaat, Thomas M. Moerland | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Cross-Modality Perturbation Synergy Attack for Person Re-identification | 用于人员重新识别的跨模态扰动协同攻击 | Yunpeng Gong, et al. | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | HCVP: Leveraging Hierarchical Contrastive Visual Prompt for Domain Generalization | HCVP:利用分层对比视觉提示进行领域泛化 | Guanglin Zhou, Zhongyi Han, Shiming Chen, Biwei Huang, Liming Zhu, Tongliang Liu, Lina Yao, Kun Zhang | arxiv.org/pdf/2401.09… | null |

3DGS

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-18 | GaussianBody: Clothed Human Reconstruction via 3d Gaussian Splatting | GaussianBody:通过 3d 高斯泼溅重建穿着衣服的人体 | Mengtian Li, Shengxiang Yao, Zhifeng Xie, Keyu Chen, Yu-Gang Jiang | arxiv.org/pdf/2401.09… | null |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-18 | SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild | SHINOBI:通过 BRDF 优化在野外使用神经对象分解的形状和照明 | Andreas Engelhardt, Amit Raj, Mark Boss, Yunzhi Zhang, Abhishek Kar, Yuanzhen Li, Deqing Sun, Ricardo Martin Brualla, Jonathan T. Barron, Hendrik P. A. Lensch, et al. | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Measuring the Discrepancy between 3D Geometric Models using Directional Distance Fields | 使用定向距离场测量 3D 几何模型之间的差异 | Siyu Ren, Junhui Hou, Xiaodong Chen, Hongkai Xiong, Wenping Wang | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Fast graph-based denoising for point cloud color information | 基于图的快速点云颜色信息去噪 | Ryosuke Watanabe, Keisuke Nonaka, Eduardo Pavez, Tatsuya Kobayashi, Antonio Ortega | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Eye Motion Matters for 3D Face Reconstruction | 眼动对于 3D 面部重建很重要 | Xuan Wang, Mengyuan Liu | arxiv.org/pdf/2401.09… | null |

Learning Paradigms

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-18 | Divide and not forget: Ensemble of selectively trained experts in Continual Learning | 分开但不要忘记:经过选择性培训的持续学习专家团队 | Grzegorz Rypeść, Sebastian Cygert, Valeriya Khan, Tomasz Trzciński, Bartosz Zieliński, Bartłomiej Twardowski | arxiv.org/pdf/2401.10… | null |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-01-18 | The Manga Whisperer: Automatically Generating Transcriptions for Comics | 漫画低语者:自动生成漫画转录 | Ragav Sachdeva, Andrew Zisserman | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | AutoFT: Robust Fine-Tuning by Optimizing Hyperparameters on OOD Data | AutoFT:通过优化 OOD 数据的超参数进行鲁棒微调 | Caroline Choi, Yoonho Lee, Annie Chen, Allan Zhou, Aditi Raghunathan, Chelsea Finn | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Neural Echos: Depthwise Convolutional Filters Replicate Biological Receptive Fields | 神经回声:深度卷积滤波器复制生物感受野 | Zahra Babaiee, Peyman M. Kiasari, Daniela Rus, Radu Grosu | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Sub2Full: split spectrum to boost OCT despeckling without clean data | Sub2Full:分割光谱以增强 OCT 去斑效果,无需干净数据 | Lingyun Wang, Jose A Sahel, Shaohua Pi | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Artwork Protection Against Neural Style Transfer Using Locally Adaptive Adversarial Color Attack | 使用局部自适应对抗性颜色攻击来保护艺术品免受神经风格迁移 | Zhongliang Guo, Kaixuan Wang, Weiye Li, Yifei Qian, Ognjen Arandjelović, Lei Fang | arxiv.org/pdf/2401.09… | null |