Updated: 2024-02-23 (publish time)
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-23 | A Study of Shape Modeling Against Noise | 抗噪声形状建模研究 | Cheng Long, Adrian Barbu | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition | Gen4Gen:用于生成多概念组合的生成数据管道 | Chun-Hsiao Yeh, Ta-Ying Cheng, He-Yen Hsieh, Chuan-En Lin, Yi Ma, Andrew Markham, Niki Trigoni, H. T. Kung, Yubei Chen | arxiv.org/pdf/2402.15… | link |
| 2024-02-23 | ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation | ProTIP:文本到图像扩散模型对抗随机扰动的概率鲁棒性验证 | Yi Zhang, Yun Tang, Wenjie Ruan, Xiaowei Huang, Siddartha Khastgir, Paul Jennings, Xingyu Zhao | arxiv.org/pdf/2402.15… | link |
| 2024-02-23 | On normalization-equivariance properties of supervised and unsupervised denoising methods: a survey | 关于监督和无监督去噪方法的归一化等变性质:综述 | Sébastien Herbreteau, Charles Kervrann | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Label-efficient Multi-organ Segmentation Method with Diffusion Model | 具有扩散模型的标签高效多器官分割方法 | Yongzhi Huang, Jinxin Zhu, Haseeb Hassan, Liyilei Su, Jingyu Li, Binding Huang | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Modified CycleGAN for the synthesization of samples for wheat head segmentation | 修改后的 CycleGAN 用于合成小麦头部分割的样本 | Jaden Myers, Keyhan Najafian, Farhad Maleki, Katie Ovens | arxiv.org/pdf/2402.15… | null |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-23 | RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation | RoboEXP:通过机器人操作的交互式探索的动作条件场景图 | Hanxiao Jiang, Binghao Huang, Ruihai Wu, Zhuoran Li, Shubham Garg, Hooshang Nayyeri, Shenlong Wang, Yunzhu Li | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Text2Pic Swift: Enhancing Long-Text to Image Retrieval for Large-Scale Libraries | Text2Pic Swift:增强大型图书馆的长文本到图像检索 | Zijun Long, Xuri Ge, Richard Mccreadie, Joemon Jose | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Large Multimodal Agents: A Survey | 大型多模态智能体:综述 | Junlin Xie, Zhihong Chen, Ruifei Zhang, Xiang Wan, Guanbin Li | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Multimodal Transformer With a Low-Computational-Cost Guarantee | 具有低计算成本保证的多模态 Transformer | Sungjin Park, Edward Choi | arxiv.org/pdf/2402.15… | null |
Model Compression/Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-23 | Distilling Adversarial Robustness Using Heterogeneous Teachers | 使用异质教师提炼对抗鲁棒性 | Jieren Deng, Aaron Palmer, Rigel Mahmood, Ethan Rathbun, Jinbo Bi, Kaleel Mahmood, Derek Aguiar | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales | 更大规模的鲁棒且可解释的视觉任务的层次不变性 | Shuren Qi, Yushu Zhang, Chao Wang, Zhihua Xia, Jian Weng, Xiaochun Cao | arxiv.org/pdf/2402.15… | link |
| 2024-02-23 | Optimized Deployment of Deep Neural Networks for Visual Pose Estimation on Nano-drones | 用于纳米无人机视觉姿态估计的深度神经网络的优化部署 | Matteo Risso, Francesco Daghero, Beatrice Alessandra Motetti, Daniele Jahier Pagliari, Enrico Macii, Massimo Poncino, Alessio Burrello | arxiv.org/pdf/2402.15… | null |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-23 | Self-Supervised Pre-Training for Table Structure Recognition Transformer | 表结构识别 Transformer 的自监督预训练 | ShengYun Peng, Seongmin Lee, Xiaojing Wang, Rajarajeswari Balasubramaniyan, Duen Horng Chau | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Closing the AI generalization gap by adjusting for dermatology condition distribution differences across clinical settings | 通过调整临床环境中皮肤病状况分布差异来缩小人工智能泛化差距 | Rajeev V. Rikhye, Aaron Loh, Grace Eunhae Hong, Preeti Singh, Margaret Ann Smith, Vijaytha Muralidharan, Doris Wong, Rory Sayres, Michelle Phung, Nicolas Betancourt, et al. | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Deep Networks Always Grok and Here is Why | 深度网络总会顿悟(Grok),原因如下 | Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts | 共同监督学习:通过专家的分层混合提高弱到强的泛化能力 | Yuejiang Liu, Alexandre Alahi | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Retinotopic Mapping Enhances the Robustness of Convolutional Neural Networks | 视网膜拓扑映射增强卷积神经网络的鲁棒性 | Jean-Nicolas Jérémie, Emmanuel Daucé, Laurent U Perrinet | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Benchmarking the Robustness of Panoptic Segmentation for Automated Driving | 自动驾驶全景分割鲁棒性基准测试 | Yiting Wang, Haonan Zhao, Daniel Gummadi, Mehrdad Dianati, Kurt Debattista, Valentina Donzella | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Outlier detection by ensembling uncertainty with negative objectness | 通过将不确定性与负客观性结合起来进行异常值检测 | Anja Delić, Matej Grcić, Siniša Šegvić | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks | AutoMMLab:根据计算机视觉任务的语言指令自动生成可部署模型 | Zekang Yang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Low-Rank Representations Meets Deep Unfolding: A Generalized and Interpretable Network for Hyperspectral Anomaly Detection | 低秩表示满足深度展开:用于高光谱异常检测的通用且可解释的网络 | Chenyu Li, Bing Zhang, Danfeng Hong, Jing Yao, Jocelyn Chanussot | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding | OpenSUN3D:第一届开放词汇 3D 场景理解研讨会挑战 | Francis Engelmann, Ayca Takmaz, Jonas Schult, Elisabetta Fedele, Johanna Wald, Songyou Peng, Xi Wang, Or Litany, Siyu Tang, Federico Tombari, et al. | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Representing Online Handwriting for Recognition in Large Vision-Language Models | 在大型视觉语言模型中表示在线手写识别 | Anastasiia Fadeeva, Philippe Schlattner, Andrii Maksai, Mark Collier, Efi Kokiopoulou, Jesse Berent, Claudiu Musat | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection | EMIFF:用于车辆-基础设施协作 3D 物体检测的增强型多尺度图像特征融合 | Zhe Wang, Siqi Fan, Xiaoliang Huo, Tongda Xu, Yan Wang, Jingjing Liu, Yilun Chen, Ya-Qin Zhang | arxiv.org/pdf/2402.15… | link |
| 2024-02-23 | GS-EMA: Integrating Gradient Surgery Exponential Moving Average with Boundary-Aware Contrastive Learning for Enhanced Domain Generalization in Aneurysm Segmentation | GS-EMA:将梯度手术指数移动平均与边界感知对比学习相结合,以增强动脉瘤分割中的域泛化 | Fengming Lin, Yan Xia, Michael MacRaild, Yash Deo, Haoran Dou, Qiongyao Liu, Nina Cheng, Nishant Ravikumar, Alejandro F. Frangi | arxiv.org/pdf/2402.15… | link |
| 2024-02-23 | Unsupervised Domain Adaptation for Brain Vessel Segmentation through Transwarp Contrastive Learning | 通过 Transwarp 对比学习进行脑血管分割的无监督域适应 | Fengming Lin, Yan Xia, Michael MacRaild, Yash Deo, Haoran Dou, Qiongyao Liu, Kun Wu, Nishant Ravikumar, Alejandro F. Frangi | arxiv.org/pdf/2402.15… | link |
| 2024-02-23 | Attention-Guided Masked Autoencoders For Learning Image Representations | 用于学习图像表示的注意力引导掩模自动编码器 | Leon Sick, Dominik Engel, Pedro Hermosilla, Timo Ropinski | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing | 视觉语音与语言的结合:用于高效且上下文感知的视觉语音处理的 VSP-LLM 框架 | Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro | arxiv.org/pdf/2402.15… | link |
| 2024-02-23 | PUAD: Frustratingly Simple Method for Robust Anomaly Detection | PUAD:用于稳健异常检测的极其简单的方法 | Shota Sugawara, Ryuji Imamura | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Fiducial Focus Augmentation for Facial Landmark Detection | 用于面部标志检测的基准焦点增强 | Purbayan Kar, Vishal Chudasama, Naoyuki Onoe, Pankaj Wasnik, Vineeth Balasubramanian | arxiv.org/pdf/2402.15… | null |
OCR
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-23 | DeepSet SimCLR: Self-supervised deep sets for improved pathology representation learning | DeepSet SimCLR:用于改进病理表示学习的自监督深度集 | David Torpey, Richard Klein | arxiv.org/pdf/2402.15… | null |
Image Understanding
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-23 | State Space Models for Event Cameras | 事件相机的状态空间模型 | Nikola Zubić, Mathias Gehrig, Davide Scaramuzza | arxiv.org/pdf/2402.15… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-23 | MambaIR: A Simple Baseline for Image Restoration with State-Space Model | MambaIR:状态空间模型图像恢复的简单基线 | Hang Guo, Jinmin Li, Tao Dai, Zhihao Ouyang, Xudong Ren, Shu-Tao Xia | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Seamless Human Motion Composition with Blended Positional Encodings | 具有混合位置编码的无缝人体运动合成 | German Barquero, Sergio Escalera, Cristina Palmero | arxiv.org/pdf/2402.15… | link |
| 2024-02-23 | Semi-supervised Counting via Pixel-by-pixel Density Distribution Modelling | 通过逐像素密度分布建模进行半监督计数 | Hui Lin, Zhiheng Ma, Rongrong Ji, Yaowei Wang, Zhou Su, Xiaopeng Hong, Deyu Meng | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Descripción automática de secciones delgadas de rocas: una aplicación Web | 岩石薄片的自动描述:一个 Web 应用程序 | Stalyn Paucar, Christian Mejía-Escobar, Víctor Collaguazo | arxiv.org/pdf/2402.15… | null |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-23 | Cohere3D: Exploiting Temporal Coherence for Unsupervised Representation Learning of Vision-based Autonomous Driving | Cohere3D:利用时间一致性进行基于视觉的自动驾驶的无监督表示学习 | Yichen Xie, Hongge Chen, Gregory P. Meyer, Yong Jae Lee, Eric M. Wolff, Masayoshi Tomizuka, Wei Zhan, Yuning Chai, Xin Huang | arxiv.org/pdf/2402.15… | null |
Learning Paradigms
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-23 | CI w/o TN: Context Injection without Task Name for Procedure Planning | CI w/o TN:没有任务名称的上下文注入用于程序规划 | Xinjie Li | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy? | 组合参数高效模块是否可以提高少样本传输精度? | Nader Asadi, Mahdi Beitollahi, Yasser Khalil, Yinchuan Li, Guojun Zhang, Xi Chen | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Genie: Generative Interactive Environments | Genie:生成交互环境 | Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, et al. | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Source-Guided Similarity Preservation for Online Person Re-Identification | 用于在线人员重新识别的源引导相似性保留 | Hamza Rami, Jhony H. Giraldo, Nicolas Winckler, Stéphane Lathuilière | arxiv.org/pdf/2402.15… | link |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-23 | Low-Frequency Black-Box Backdoor Attack via Evolutionary Algorithm | 通过进化算法进行低频黑盒后门攻击 | Yanqi Qiao, Dazhuang Liu, Rui Wang, Kaitai Liang | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Bagged Deep Image Prior for Recovering Images in the Presence of Speckle Noise | 用于在存在散斑噪声的情况下恢复图像的袋装深度图像先验 | Xi Chen, Zhewen Hou, Christopher A. Metzler, Arian Maleki, Shirin Jalali | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Improving Explainable Object-induced Model through Uncertainty for Automated Vehicles | 通过自动驾驶车辆的不确定性改进可解释的对象诱发模型 | Shihong Ling, Yue Wan, Xiaowei Jia, Na Du | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | CLIPPER+: A Fast Maximal Clique Algorithm for Robust Global Registration | CLIPPER+:用于鲁棒全局配准的快速最大团算法 | Kaveh Fathian, Tyler Summers | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Computer Vision for Multimedia Geolocation in Human Trafficking Investigation: A Systematic Literature Review | 人口贩运调查中多媒体地理定位的计算机视觉:系统文献综述 | Opeyemi Bamigbade, John Sheppard, Mark Scanlon | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Optimal Transport on the Lie Group of Roto-translations | 旋转平移李群上的最优传输 | Daan Bon, Gautam Pai, Gijs Bellaard, Olga Mula, Remco Duits | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding | 眼见为实:通过 CLIP 引导解码减轻大视觉语言模型中的幻觉 | Ailin Deng, Zhirui Chen, Bryan Hooi | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Font Impression Estimation in the Wild | 野外字体印象估计 | Kazuki Kitajima, Daichi Haraguchi, Seiichi Uchida | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Which Model to Transfer? A Survey on Transferability Estimation | 要转移哪个模型?可迁移性估计调查 | Yuhe Ding, Bo Jiang, Aijing Yu, Aihua Zheng, Jian Liang | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | BSPA: Exploring Black-box Stealthy Prompt Attacks against Image Generators | BSPA:探索针对图像生成器的黑盒隐蔽提示攻击 | Yu Tian, Xiao Yang, Yinpeng Dong, Heming Yang, Hang Su, Jun Zhu | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Convergence Analysis of Blurring Mean Shift | 模糊均值漂移的收敛性分析 | Ryoya Yamasaki, Toshiyuki Tanaka | arxiv.org/pdf/2402.15… | null |
| 2024-02-23 | Fine-tuning CLIP Text Encoders with Two-step Paraphrasing | 通过两步释义微调 CLIP 文本编码器 | Hyunjae Kim, Seunghyun Yoon, Trung Bui, Handong Zhao, Quan Tran, Franck Dernoncourt, Jaewoo Kang | arxiv.org/pdf/2402.15… | null |