| 2024-01-18 | OMG-Seg: Is One Model Good Enough For All Segmentation? | OMG-Seg:一种模型足以适用于所有细分吗? | Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | RAP-SAM: Towards Real-Time All-Purpose Segment Anything | RAP-SAM:迈向实时通用分段任何内容 | Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, Yibo Yang, Yining Li, Kai Chen, Yunhai Tong, Bernard Ghanem, et al. | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting | 用于全景分割和掩模修复的简单潜在扩散方法 | Wouter Van Gansbeke, Bert De Brabandere | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions | 解释隐式神经画布:通过追踪像素的贡献将像素与神经元连接起来 | Namitha Padmanabhan, Matthew Gwilliam, Pulkit Kumar, Shishira R Maiya, Max Ehrlich, Abhinav Shrivastava | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Comprehensive OOD Detection Improvements | 全面的 OOD 检测改进 | Anish Lakkapragada, Amol Khanna, Edward Raff, Nathan Inkawhich | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Few-shot learning for COVID-19 Chest X-Ray Classification with Imbalanced Data: An Inter vs. Intra Domain Study | 具有不平衡数据的 COVID-19 胸部 X 射线分类的少样本学习:域间与域内研究 | Alejandro Galán-Cuenca, Antonio Javier Gallego, Marcelo Saval-Calvo, Antonio Pertusa | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Exposing Lip-syncing Deepfakes from Mouth Inconsistencies | 揭露口型不一致的 Deepfakes | Soumyya Kanti Datta, Shan Jia, Siwei Lyu | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | VIPTR: A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition | VIPTR:用于快速高效场景文本识别的视觉可变换提取器 | Xianfu Cheng, Weixiao Zhou, Xiang Li, Xiaoming Chen, Jian Yang, Tongliang Li, Zhoujun Li | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | ContextMix: A context-aware data augmentation method for industrial visual inspection systems | ContextMix:工业视觉检测系统的上下文感知数据增强方法 | Hyungmin Kim, Donghun Kim, Pyunghwan Ahn, Sungho Suh, Hansang Cho, Junmo Kim | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Deep spatial context: when attention-based models meet spatial regression | 深层空间上下文:当基于注意力的模型遇到空间回归时 | Paulina Tomaszewska, Elżbieta Sienkiewicz, Mai P. Hoang, Przemysław Biecek | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | CMFN: Cross-Modal Fusion Network for Irregular Scene Text Recognition | CMFN:用于不规则场景文本识别的跨模态融合网络 | Jinzhi Zheng, Ruyi Ji, Libo Zhang, Yanjun Wu, Chen Zhao | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition | GPT4Ego:释放预训练模型的潜力,实现零样本自我中心动作识别 | Guangzhao Dai, Xiangbo Shu, Wenhao Wu | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Depth Over RGB: Automatic Evaluation of Open Surgery Skills Using Depth Camera | Depth Over RGB:使用深度相机自动评估开放手术技能 | Ido Zuckerman, Nicole Werner, Jonathan Kouchly, Emma Huston, Shannon DiMarco, Paul DiMusto, Shlomi Laufer | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | Text Region Multiple Information Perception Network for Scene Text Detection | 用于场景文本检测的文本区域多信息感知网络 | Jinzhi Zheng, Libo Zhang, Yanjun Wu, Chen Zhao | arxiv.org/pdf/2401.10… | null |
| 2024-01-18 | BPDO: Boundary Points Dynamic Optimization for Arbitrary Shape Scene Text Detection | BPDO:任意形状场景文本检测的边界点动态优化 | Jinzhi Zheng, Libo Zhang, Yanjun Wu, Chen Zhao | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Developing an AI-based Integrated System for Bee Health Evaluation | 开发基于人工智能的蜜蜂健康评估综合系统 | Andrew Liang | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Ventricular Segmentation: A Brief Comparison of U-Net Derivatives | 心室分割:U-Net 导数的简要比较 | Ketan Suhaas Saichandran | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects | CustomVideo:自定义多个主题的文本到视频生成 | Zhao Wang, Aoxue Li, Enze Xie, Lingting Zhu, Yong Guo, Qi Dou, Zhenguo Li | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Multi-task Learning for Joint Re-identification, Team Affiliation, and Role Classification for Sports Visual Tracking | 用于运动视觉跟踪的联合重新识别、团队归属和角色分类的多任务学习 | Amir M. Mansourian, Vladimir Somers, Christophe De Vleeschouwer, Shohreh Kasaei | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | ICGNet: A Unified Approach for Instance-Centric Grasping | ICGNet:以实例为中心的抓取的统一方法 | René Zurbrügg, Yifan Liu, Francis Engelmann, Suryansh Kumar, Marco Hutter, Vaishakh Patil, Fisher Yu | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection | MAMBA:通过内存库进行多级聚合,用于视频对象检测 | Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | BlenDA: Domain Adaptive Object Detection through diffusion-based blending | BlenDA:通过基于扩散的混合进行域自适应对象检测 | Tzuhsuan Huang, Chen-Che Huang, Chung-Hao Ku, Jun-Cheng Chen | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | XAI-Enhanced Semantic Segmentation Models for Visual Quality Inspection | 用于视觉质量检测的 XAI 增强语义分割模型 | Tobias Clement, Truong Thanh Hung Nguyen, Mohamed Abdelaal, Hung Cao | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Skeleton-Guided Instance Separation for Fine-Grained Segmentation in Microscopy | 用于显微镜中细粒度分割的骨架引导实例分离 | Jun Wang, Chengfeng Zhou, Zhaoyan Ming, Lina Wei, Xudong Jiang, Dahong Qian | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation | 用于弱监督语义分割的问答跨语言图像匹配 | Songhe Deng, Wei Zhuo, Jinheng Xie, Linlin Shen | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Boosting Few-Shot Segmentation via Instance-Aware Data Augmentation and Local Consensus Guided Cross Attention | 通过实例感知数据增强和局部共识引导交叉注意力来促进少样本分割 | Li Guo, Haoming Liu, Yuxuan Xia, Chengyu Zhang, Xiaochen Lu | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Improving fine-grained understanding in image-text pre-training | 提高图像文本预训练的细粒度理解 | Ioana Bica, Anastasija Ilić, Matthias Bauer, Goker Erdogan, Matko Bošnjak, Christos Kaplanis, Alexey A. Gritsenko, Matthias Minderer, Charles Blundell, Razvan Pascanu, et al. | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Enhancing the Fairness and Performance of Edge Cameras with Explainable AI | 通过可解释的人工智能增强边缘摄像头的公平性和性能 | Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Quoc Hung Cao, Van Binh Truong, Quoc Khanh Nguyen, Hung Cao | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Slicer Networks | 切片器网络 | Hang Zhang, Xiang Chen, Rongguang Wang, Renjiu Hu, Dongdong Liu, Gaolei Li | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Enhanced Automated Quality Assessment Network for Interactive Building Segmentation in High-Resolution Remote Sensing Imagery | 用于高分辨率遥感图像中交互式建筑分割的增强型自动化质量评估网络 | Zhili Zhang, Xiangyun Hu, Jiabo Xu | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Boosting Few-Shot Semantic Segmentation Via Segment Anything Model | 通过 Segment Anything 模型促进少样本语义分割 | Chen-Bin Feng, Qi Lai, Kangdao Liu, Houcheng Su, Chi-Man Vong | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Enhancing Small Object Encoding in Deep Neural Networks: Introducing Fast&Focused-Net with Volume-wise Dot Product Layer | 增强深度神经网络中的小对象编码:引入具有体积点积层的 Fast&Focused-Net | Ali Tofik, Roy Partha Pratim | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units | 通过学习离散视觉语音单元,使用单一模型进行多语言视觉语音识别 | Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Se Jin Park, Yong Man Ro | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | BreastRegNet: A Deep Learning Framework for Registration of Breast Faxitron and Histopathology Images | BreastRegNet:用于注册乳房 Faxitron 和组织病理学图像的深度学习框架 | Negar Golestani, Aihui Wang, Gregory R Bean, Mirabela Rusu | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Adaptive Self-training Framework for Fine-grained Scene Graph Generation | 用于细粒度场景图生成的自适应自训练框架 | Kibum Kim, Kanghoon Yoon, Yeonjun In, Jinyoung Moon, Donghyun Kim, Chanyoung Park | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | On the Audio Hallucinations in Large Audio-Video Language Models | 论大型音视频语言模型中的幻听 | Taichi Nishimura, Shota Nakada, Masayoshi Kondo | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | SEINE: Structure Encoding and Interaction Network for Nuclei Instance Segmentation | SEINE:用于核实例分割的结构编码和交互网络 | Ye Zhang, Linghan Cai, Ziyue Wang, Yongbing Zhang | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition | SlideAVSR:用于视听语音识别的论文讲解视频数据集 | Hao Wang, Shuhei Kurita, Shuichiro Shimizu, Daisuke Kawahara | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation | 实例布朗桥作为开放词汇视频实例分割的文本 | Zesen Cheng, Kehan Li, Hao Li, Peng Jin, Chang Liu, Xiawu Zheng, Rongrong Ji, Jie Chen | arxiv.org/pdf/2401.09… | null |
| 2024-01-18 | P2Seg: Pointly-supervised Segmentation via Mutual Distillation | P2Seg:通过相互蒸馏进行点监督分割 | Zipeng Wang, Xuehui Yu, Xumeng Han, Wenwen Yu, Zhixun Huang, Jianbin Jiao, Zhenjun Han | arxiv.org/pdf/2401.09… | null |