[UPDATED!] 2024-02-16 (Publish Time)
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-16 | PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter | PaLM2-VAdapter:逐步对齐的语言模型打造强大的视觉语言适配器 | Junfei Xiao, Zheng Xu, Alan Yuille, Shen Yan, Boyu Wang | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Fusion of Diffusion Weighted MRI and Clinical Data for Predicting Functional Outcome after Acute Ischemic Stroke with Deep Contrastive Learning | 融合弥散加权 MRI 和临床数据,通过深度对比学习预测急性缺血性中风后的功能结果 | Chia-Ling Tsai, Hui-Yun Su, Shen-Feng Sung, Wei-Yang Lin, Ying-Ying Su, Tzu-Hsien Yang, Man-Lin Mai | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Multi-modal preference alignment remedies regression of visual instruction tuning on language model | 多模态偏好对齐补救了语言模型上视觉指令调整的回归 | Shengzhi Li, Rongyu Lin, Shichao Pei | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Control Color: Multimodal Diffusion-based Interactive Image Colorization | 控制颜色:基于多模态扩散的交互式图像着色 | Zhexin Liang, Zhaochen Li, Shangchen Zhou, Chongyi Li, Chen Change Loy | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond | 生成式跨模态检索:在多模态语言模型中记忆图像以供检索及其他使用 | Yongqi Li, Wenjie Wang, Leigang Qu, Liqiang Nie, Wenjie Li, Tat-Seng Chua | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | BioFusionNet: Deep Learning-Based Survival Risk Stratification in ER+ Breast Cancer Through Multifeature and Multimodal Data Fusion | BioFusionNet:通过多特征和多模态数据融合对 ER+ 乳腺癌进行基于深度学习的生存风险分层 | Raktim Kumar Mondol, Ewan K. A. Millar, Arcot Sowmya, Erik Meijering | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering | 零镜头视频问答的问题指导视觉描述 | David Romero, Thamar Solorio | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Efficient Multi-task Uncertainties for Joint Semantic Segmentation and Monocular Depth Estimation | 联合语义分割和单目深度估计的高效多任务不确定性 | Steven Landgraf, Markus Hillemann, Theodor Kapler, Markus Ulrich | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Using Left and Right Brains Together: Towards Vision and Language Planning | 共同使用左右脑:迈向视觉和语言规划 | Jun Cen, Chenfei Wu, Xiao Liu, Shengming Yin, Yixuan Pei, Jinglong Yang, Qifeng Chen, Nan Duan, Jianguo Zhang | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration | 通过多模态数据和辅助任务集成优化皮肤病变分类 | Mahapara Khurshid, Mayank Vatsa, Richa Singh | arxiv.org/pdf/2402.10… | null |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-16 | Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation | Weak-Mamba-UNet:Visual Mamba 使 CNN 和 ViT 更好地用于基于 Scribble 的医学图像分割 | Ziyang Wang, Chao Ma | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | HistoSegCap: Capsules for Weakly-Supervised Semantic Segmentation of Histological Tissue Type in Whole Slide Images | HistoSegCap:用于整个幻灯片图像中组织学组织类型的弱监督语义分割的胶囊 | Mobina Mansoori, Sajjad Shahabodini, Jamshid Abouei, Arash Mohammadi, Konstantinos N. Plataniotis | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Enhancement-Driven Pretraining for Robust Fingerprint Representation Learning | 增强驱动的鲁棒指纹表示学习预训练 | Ekta Gavas, Kaustubh Olpadkar, Anoop Namboodiri | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Training Class-Imbalanced Diffusion Model Via Overlap Optimization | 通过重叠优化训练类不平衡扩散模型 | Divin Yan, Lu Qi, Vincent Tao Hu, Ming-Hsuan Yang, Meng Tang | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | In-Vivo Hyperspectral Human Brain Image Database for Brain Cancer Detection | 用于脑癌检测的体内高光谱人脑图像数据库 | H. Fabelo, S. Ortega, A. Szolna, D. Bulters, J. F. Pineiro, S. Kabwama, A. Shanahan, H. Bulstrode, S. Bisshopp, B. R. Kiran, et al. | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | STF: Spatio-Temporal Fusion Module for Improving Video Object Detection | STF:用于改进视频对象检测的时空融合模块 | Noreen Anwar, Guillaume-Alexandre Bilodeau, Wassim Bouachir | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Semi-weakly-supervised neural network training for medical image registration | 用于医学图像配准的半弱监督神经网络训练 | Yiwen Li, Yunguan Fu, Iani J. M. B. Gayo, Qianye Yang, Zhe Min, Shaheer U. Saeed, Wen Yan, Yipei Wang, J. Alison Noble, Mark Emberton, et al. | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Selective Prediction for Semantic Segmentation using Post-Hoc Confidence Estimation and Its Performance under Distribution Shift | 使用事后置信度估计的语义分割选择性预测及其在分布偏移下的性能 | Bruno Laboissiere Camargos Borges, Bruno Machado Pacheco, Danilo Silva | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Compact and De-biased Negative Instance Embedding for Multi-Instance Learning on Whole-Slide Image Classification | 用于全幻灯片图像分类多实例学习的紧凑且去偏的负实例嵌入 | Joohyung Lee, Heejeong Nam, Kwanhyung Lee, Sangchul Hahn | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Real-Time Model-Based Quantitative Ultrasound and Radar | 基于实时模型的定量超声和雷达 | Tom Sharon, Yonina C. Eldar | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | CodaMal: Contrastive Domain Adaptation for Malaria Detection in Low-Cost Microscopes | CodaMal:低成本显微镜中疟疾检测的对比域适应 | Ishan Rajendrakumar Dave, Tristan de Blegiers, Chen Chen, Mubarak Shah | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Spike-EVPR: Deep Spiking Residual Network with Cross-Representation Aggregation for Event-Based Visual Place Recognition | Spike-EVPR:具有交叉表示聚合的深度尖峰残差网络,用于基于事件的视觉位置识别 | Chenming Hu, Zheng Fang, Kuanxu Hou, Delei Kong, Junjie Jiang, Hao Zhuang, Mingyuan Sun, Xinjie Huang | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Dynamic Patch-aware Enrichment Transformer for Occluded Person Re-Identification | 用于被遮挡人员重新识别的动态补丁感知丰富变压器 | Xin Zhang, Keren Fu, Qijun Zhao | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | DABS-LS: Deep Atlas-Based Segmentation Using Regional Level Set Self-Supervision | DABS-LS:使用区域水平集自我监督的基于深度图集的分割 | Hannah G. Mason, Jack H. Noble | arxiv.org/pdf/2402.10… | null |
Image Understanding
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-16 | GaussianHair: Hair Modeling and Rendering with Light-aware Gaussians | GaussianHair:使用光感知高斯模型进行头发建模和渲染 | Haimin Luo, Min Ouyang, Zijun Zhao, Suyi Jiang, Longwen Zhang, Qixuan Zhang, Wei Yang, Lan Xu, Jingyi Yu | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Explaining generative diffusion models via visual analysis for interpretable decision-making process | 通过可视化分析解释生成扩散模型以实现可解释的决策过程 | Ji-Hoon Park, Yeong-Joon Ju, Seong-Whan Lee | arxiv.org/pdf/2402.10… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-16 | 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | 3D 扩散器 Actor:具有 3D 场景表示的策略扩散 | Tsung-Wei Ke, Nikolaos Gkanatsios, Katerina Fragkiadaki | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | VATr++: Choose Your Words Wisely for Handwritten Text Generation | VATr++:明智地选择单词以生成手写文本 | Bram Vanherle, Vittorio Pippi, Silvia Cascianelli, Nick Michiels, Frank Van Reeth, Rita Cucchiara | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | PointMamba: A Simple State Space Model for Point Cloud Analysis | PointMamba:用于点云分析的简单状态空间模型 | Dingkang Liang, Xin Zhou, Xinyu Wang, Xingkui Zhu, Wei Xu, Zhikang Zou, Xiaoqing Ye, Xiang Bai | arxiv.org/pdf/2402.10… | null |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-16 | Multi-Model 3D Registration: Finding Multiple Moving Objects in Cluttered Point Clouds | 多模型 3D 配准:在杂乱的点云中查找多个移动物体 | David Jin, Sushrut Karmalkar, Harry Zhang, Luca Carlone | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | PEGASUS: Personalized Generative 3D Avatars with Composable Attributes | PEGASUS:具有可组合属性的个性化生成 3D 化身 | Hyunsoo Cha, Byungjun Kim, Hanbyul Joo | arxiv.org/pdf/2402.10… | null |
Learning Paradigms
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-16 | U²MRPD: Unsupervised undersampled MRI reconstruction by prompting a large latent diffusion model | U²MRPD:通过促进大的潜在扩散模型进行无监督欠采样 MRI 重建 | Ziqi Gao, S. Kevin Zhou | arxiv.org/pdf/2402.10… | null |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-16 | Universal Prompt Optimizer for Safe Text-to-Image Generation | 用于安全生成文本到图像的通用提示优化器 | Zongyu Wu, Hongcheng Gao, Yueze Wang, Xiang Zhang, Suhang Wang | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Fully Differentiable Lagrangian Convolutional Neural Network for Continuity-Consistent Physics-Informed Precipitation Nowcasting | 用于连续一致物理信息的降水临近预报的完全可微拉格朗日卷积神经网络 | Peter Pavlík, Martin Výboh, Anna Bou Ezzeddine, Viera Rozinajová | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation | 进行廉价的缩放:用于更高分辨率适应的自级联扩散模型 | Lanqing Guo, Yingqing He, Haoxin Chen, Menghan Xia, Xiaodong Cun, Yufei Wang, Siyu Huang, Yong Zhang, Xintao Wang, Qifeng Chen, et al. | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Theoretical Understanding of Learning from Adversarial Perturbations | 从对抗性扰动中学习的理论理解 | Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Polyhedral Complex Derivation from Piecewise Trilinear Networks | 分段三线性网络的多面体复数推导 | Jin-Hwa Kim | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | ManiFPT: Defining and Analyzing Fingerprints of Generative Models | ManiFPT:定义和分析生成模型的指纹 | Hae Jin Song, Mahyar Khayatkhoei, Wael AbdAlmageed | arxiv.org/pdf/2402.10… | null |
| 2024-02-16 | Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) | 用稀疏线性概念嵌入解释 CLIP (SpLiCE) | Usha Bhalla, Alex Oesterling, Suraj Srinivas, Flavio P. Calmon, Himabindu Lakkaraju | arxiv.org/pdf/2402.10… | null |