[UPDATED!] 2024-03-14 (Publish Time)
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-14 | SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior | SCP-Diff:具有空间分类联合先验的逼真语义图像合成 | Huan-ang Gao, Mingju Gao, Jiaju Li, Wenyi Li, Rong Zhi, Hao Tang, Hao Zhao | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image | 全息重新照明:从单个图像进行可控体积肖像重新照明 | Yiqun Mei, Yu Zeng, He Zhang, Zhixin Shu, Xuaner Zhang, Sai Bi, Jianming Zhang, HyunJoon Jung, Vishal M. Patel | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | 3D-VLA: A 3D Vision-Language-Action Generative World Model | 3D-VLA:3D 视觉-语言-动作生成世界模型 | Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Generalized Predictive Model for Autonomous Driving | 自动驾驶广义预测模型 | Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, et al. | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation | Make-Your-3D:快速、一致的主题驱动 3D 内容生成 | Fangfu Liu, Hanyang Wang, Weiliang Chen, Haowen Sun, Yueqi Duan | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Score-Guided Diffusion for 3D Human Recovery | 用于 3D 人体恢复的评分引导扩散 | Anastasis Stathopoulos, Ligong Han, Dimitris Metaxas | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Explore In-Context Segmentation via Latent Diffusion Models | 通过潜在扩散模型探索上下文分割 | Chaoyang Wang, Xiangtai Li, Henghui Ding, Lu Qi, Jiangning Zhang, Yunhai Tong, Chen Change Loy, Shuicheng Yan | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models | MambaTalk:使用选择性状态空间模型进行高效的整体手势合成 | Zunnan Xu, Yukang Lin, Haonan Han, Sicheng Yang, Ronghui Li, Yachao Zhang, Xiu Li | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing | Eta 反演:为基于扩散的真实图像编辑设计最佳 Eta 函数 | Wonjun Kang, Kevin Galim, Hyung Il Koo | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation | 3D-SceneDreamer:文本驱动的 3D 一致场景生成 | Frank Zhang, Yibo Zhang, Quan Zheng, Rui Ma, Wei Hua, Hujun Bao, Weiwei Xu, Changqing Zou | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Mitigating attribute amplification in counterfactual image generation | 减轻反事实图像生成中的属性放大 | Tian Xia, Mélanie Roschewitz, Fabio De Sousa Ribeiro, Charles Jones, Ben Glocker | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Mitigating Data Consistency Induced Discrepancy in Cascaded Diffusion Models for Sparse-view CT Reconstruction | 减轻稀疏视图 CT 重建级联扩散模型中数据一致性引起的差异 | Hanyu Chen, Zhixiu Hao, Lin Guo, Liying Xiao | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | StainFuser: Controlling Diffusion for Faster Neural Style Transfer in Multi-Gigapixel Histology Images | StainFuser:控制扩散以在数千兆像素组织学图像中实现更快的神经风格转移 | Robert Jewsbury, Ruoyu Wang, Abhir Bhalerao, Nasir Rajpoot, Quoc Dang Vu | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | XReal: Realistic Anatomy and Pathology-Aware X-ray Generation via Controllable Diffusion Model | XReal:通过可控扩散模型生成逼真的解剖学和病理学 X 射线 | Anees Ur Rehman Hashmi, Ibrahim Almakky, Mohammad Areeb Qazi, Santosh Sanjeev, Vijay Ram Papineni, Dwarikanath Mahapatra, Mohammad Yaqub | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Noise Dimension of GAN: An Image Compression Perspective | GAN 的噪声维度:图像压缩视角 | Ziran Zhu, Tongda Xu, Ling Li, Yan Wang | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Intention-driven Ego-to-Exo Video Generation | 意图驱动的 Ego-to-Exo 视频生成 | Hongchen Luo, Kai Zhu, Wei Zhai, Yang Cao | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Intention-aware Denoising Diffusion Model for Trajectory Prediction | 用于轨迹预测的意图感知去噪扩散模型 | Chen Liu, Shibo He, Haoyu Liu, Jiming Chen | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts | Switch Diffusion Transformer:通过稀疏混合专家协同去噪任务 | Byeongjun Park, Hyojun Go, Jin-Young Kim, Sangmin Woo, Seokil Ham, Changick Kim | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior | Sculpt3D:使用稀疏 3D 先验的多视图一致文本到 3D 生成 | Cheng Chen, Xiaofeng Yang, Fan Yang, Chengzeng Feng, Zhoujie Fu, Chuan-Sheng Foo, Guosheng Lin, Fayao Liu | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Rethinking Referring Object Removal | 重新思考引用对象删除 | Xiangtian Xue, Jiasong Wu, Youyong Kong, Lotfi Senhadji, Huazhong Shu | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Virtual birefringence imaging and histological staining of amyloid deposits in label-free tissue using autofluorescence microscopy and deep learning | 使用自发荧光显微镜和深度学习对无标记组织中的淀粉样沉积物进行虚拟双折射成像和组织学染色 | Xilin Yang, Bijie Bai, Yijie Zhang, Musa Aydin, Sahan Yoruc Selcuk, Zhen Guo, Gregory A. Fishbein, Karine Atlan, William Dean Wallace, Nir Pillar, et al. | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Dyadic Interaction Modeling for Social Behavior Generation | 社会行为生成的二元交互建模 | Minh Tran, Di Chang, Maksim Siniukov, Mohammad Soleymani | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control | StreamMultiDiffusion:具有基于区域的语义控制的实时交互生成 | Jaerin Lee, Daniel Sungho Jung, Kanggeon Lee, Kyoung Mu Lee | arxiv.org/pdf/2403.09… | link |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-14 | MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | MM1:多模态大语言模型预训练的方法、分析与见解 | Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, et al. | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | 闭上眼睛,注意安全:通过图像到文本转换保护多模态大语言模型 | Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James T. Kwok, Yu Zhang | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Less is More: Data Value Estimation for Visual Instruction Tuning | 少即是多:视觉指令调整的数据价值估计 | Zikang Liu, Kun Zhou, Wayne Xin Zhao, Dawei Gao, Yaliang Li, Ji-Rong Wen | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding | VisionGPT-3D:用于增强 3D 视觉理解的通用多模态代理 | Chris Kelly, Luhui Hu, Jiayin Hu, Yu Tian, Deshun Yang, Bang Yang, Cindy Yang, Zihao Li, Zaoshan Huang, Yuexian Zou | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment | M&M:在认知负荷评估中集成视听线索的多模式多任务模型 | Long Nguyen-Phuoc, Renald Gaboriau, Dimitri Delacroix, Laurent Navarro | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments | OpenGraph:大规模户外环境中的开放词汇分层 3D 图表示 | Yinan Deng, Jiahui Wang, Jingyu Zhao, Xinyu Tian, Guangyan Chen, Yi Yang, Yufeng Yue | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Unsupervised Modality-Transferable Video Highlight Detection with Representation Activation Sequence Learning | 具有表示激活序列学习的无监督模态可转移视频精彩片段检测 | Tingtian Li, Zixun Sun, Xinyu Xiao | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks | 将路由功能引入具有低阶瓶颈的视觉语言参数高效微调 | Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions | AVIBench:评估对抗性视觉指令的大视觉语言模型的鲁棒性 | Hao Zhang, Wenqi Shao, Hong Liu, Yongqiang Ma, Ping Luo, Yu Qiao, Kaipeng Zhang | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring | Griffon v2:通过高分辨率缩放和视觉语言共同参考推进多模态感知 | Yufei Zhan, Yousong Zhu, Hongyin Zhao, Fan Yang, Ming Tang, Jinqiao Wang | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | EfficientMFD: Towards More Efficient Multimodal Synchronous Fusion Detection | EfficientMFD:迈向更高效的多模态同步融合检测 | Jiaqing Zhang, Mingxiang Cao, Xue Yang, Weiying Xie, Jie Lei, Daixun Li, Geng Yang, Wenbo Huang, Yunsong Li | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | SELECTOR: Heterogeneous graph network with convolutional masked autoencoder for multimodal robust prediction of cancer survival | 选择器:具有卷积屏蔽自动编码器的异构图网络,用于癌症生存的多模式稳健预测 | Liangrui Pan, Yijun Peng, Yan Li, Xiang Wang, Wenjuan Liu, Liwen Xu, Qingchun Liang, Shaoliang Peng | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering | 使用 OCR 模态扰动进行场景文本视觉问答的对抗训练 | Zhixuan Shen, Haonan Luo, Sijia Li, Tianrui Li | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest | PoIFusion:通过兴趣点融合进行多模态 3D 物体检测 | Jiajun Deng, Sha Zhang, Feras Dayoub, Wanli Ouyang, Yanyong Zhang, Ian Reid | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Are Vision Language Models Texture or Shape Biased and Can We Steer Them? | 视觉语言模型的纹理或形状有偏差吗?我们可以引导它们吗? | Paul Gavrikov, Jovita Lukasik, Steffen Jung, Robert Geirhos, Bianca Lamm, Muhammad Jehanzeb Mirza, Margret Keuper, Janis Keuper | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | UniCode: Learning a Unified Codebook for Multimodal Large Language Models | UniCode:学习多模态大语言模型的统一码本 | Sipeng Zheng, Bohan Zhou, Yicheng Feng, Ye Wang, Zongqing Lu | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Distribution and Depth-Aware Transformers for 3D Human Mesh Recovery | 用于 3D 人体网格恢复的分布和深度感知 Transformer | Jerrin Bright, Bavesh Balaji, Harish Prakash, Yuhao Chen, David A Clausi, John Zelek | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models? | 首先知道:词元分布如何揭示大型视觉语言模型中的隐藏知识? | Qinyu Zhao, Ming Xu, Kartik Gupta, Akshay Asthana, Liang Zheng, Stephen Gould | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework | VisionGPT:使用通用多模态框架的视觉语言理解代理 | Chris Kelly, Luhui Hu, Bang Yang, Yu Tian, Deshun Yang, Cindy Yang, Zaoshan Huang, Zihao Li, Jiayin Hu, Yuexian Zou | arxiv.org/pdf/2403.09… | null |
NeRF
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-14 | GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping | GaussianGrasper:用于开放词汇机器人抓取的 3D 语言高斯泼溅 | Yuhang Zheng, Xiangyu Chen, Yupeng Zheng, Songen Gu, Runyi Yang, Bu Jin, Pengfei Li, Chengliang Zhong, Zengmao Wang, Lina Liu, et al. | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | The NeRFect Match: Exploring NeRF Features for Visual Localization | NeRFect 匹配:探索视觉定位的 NeRF 功能 | Qunjie Zhou, Maxim Maximov, Or Litany, Laura Leal-Taixé | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | RoDUS: Robust Decomposition of Static and Dynamic Elements in Urban Scenes | RoDUS:城市场景中静态和动态元素的鲁棒分解 | Thang-Anh-Quan Nguyen, Luis Roldão, Nathan Piasco, Moussab Bennehar, Dzmitry Tsishkou | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors | PreSight:利用城市规模的 NeRF 先验增强自动驾驶汽车感知 | Tianyuan Yuan, Yucheng Mao, Jiawei Yang, Yicheng Liu, Yue Wang, Hang Zhao | arxiv.org/pdf/2403.09… | null |
3DGS
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-14 | Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting | 放宽 3D 高斯分布的精确初始化约束 | Jaewoo Jung, Jisang Han, Honggyu An, Jiwon Kang, Seonghoon Park, Seungryong Kim | arxiv.org/pdf/2403.09… | null |
Model Compression/Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-14 | SpikeReveal: Unlocking Temporal Sequences from Real Blurry Inputs with Spike Streams | SpikeReveal:使用 Spike Streams 从真实模糊输入中解锁时间序列 | Kang Chen, Shiyan Chen, Jiyuan Zhang, Baoyue Zhang, Yajing Zheng, Tiejun Huang, Zhaofei Yu | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization | 具有元提示表示和实例对比优化的开放词汇对象检测 | Zhao Wang, Aoxue Li, Fengwei Zhou, Zhenguo Li, Qi Dou | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Video Editing via Factorized Diffusion Distillation | 通过因式分解扩散蒸馏进行视频编辑 | Uriel Singer, Amit Zohar, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Devi Parikh, Yaniv Taigman | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Knowledge Distillation in YOLOX-ViT for Side-Scan Sonar Object Detection | YOLOX-ViT 中用于侧扫声纳目标检测的知识蒸馏 | Martin Aubard, László Antal, Ana Madureira, Erika Ábrahám | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models | 选择和提炼:选择性双师知识转移,用于视觉语言模型的持续学习 | Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen, Kai-Po Chang, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | SAM-Lightening: A Lightweight Segment Anything Model with Dilated Flash Attention to Achieve 30 times Acceleration | SAM-Lightening:轻量级Segment Anything模型,具有Dilated Flash Attention,可实现30倍加速 | Yanfei Song, Bangzheng Pu, Peng Wang, Hongxu Jiang, Dong Dong, Yiqing Shen | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Generalized Relevance Learning Grassmann Quantization | 广义相关学习格拉斯曼量化 | M. Mohammadi, M. Babai, M. H. F. Wilkinson | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | VDNA-PR: Using General Dataset Representations for Robust Sequential Visual Place Recognition | VDNA-PR:使用通用数据集表示进行鲁棒顺序视觉位置识别 | Benjamin Ramtoula, Daniele De Martini, Matthew Gadd, Paul Newman | arxiv.org/pdf/2403.09… | null |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-14 | GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding | GroupContrast:用于 3D 理解的语义感知自监督表示学习 | Chengyao Wang, Li Jiang, Xiaoyang Wu, Zhuotao Tian, Bohao Peng, Hengshuang Zhao, Jiaya Jia | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models | Transformers 变得稳定:语言模型的端到端信号传播理论 | Akhil Kedia, Mohd Abbas Zaidi, Sushil Khyalia, Jungho Jung, Harshith Goka, Haejun Lee | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning | OneTracker:将视觉对象跟踪与基础模型和高效调整相结合 | Lingyi Hong, Shilin Yan, Renrui Zhang, Wanyun Li, Xinyu Zhou, Pinxue Guo, Kaixun Jiang, Yiting Chen, Jinglun Li, Zhaoyu Chen, et al. | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | PosSAM: Panoptic Open-vocabulary Segment Anything | PosSAM:全景开放词汇分段任何内容 | Vibashan VS, Shubhankar Borse, Hyojin Park, Debasmit Das, Vishal Patel, Munawar Hayat, Fatih Porikli | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Renovating Names in Open-Vocabulary Segmentation Benchmarks | 更新开放词汇分割基准中的名称 | Haiwen Huang, Songyou Peng, Dan Zhang, Andreas Geiger | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Cloud gap-filling with deep learning for improved grassland monitoring | 通过深度学习填补云空白以改善草原监测 | Iason Tsardanidis, Alkiviadis Koukos, Vasileios Sitokonstantinou, Thanassis Drivas, Charalampos Kontoes | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | WeakSurg: Weakly supervised surgical instrument segmentation using temporal equivariance and semantic continuity | WeakSurg:使用时间等方差和语义连续性的弱监督手术器械分割 | Qiyuan Wang, Yanzhe Liu, Shang Zhao, Rong Liu, S. Kevin Zhou | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Explorations in Texture Learning | 纹理学习的探索 | Blaine Hoak, Patrick McDaniel | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition | SkateFormer:用于人类动作识别的骨骼时间转换器 | Jeonghyeok Do, Munchurl Kim | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Don't Judge by the Look: A Motion Coherent Augmentation for Video Recognition | 不要以貌取人:视频识别的运动相干增强 | Yitian Zhang, Yue Bai, Huan Wang, Yizhou Wang, Yun Fu | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Faceptor: A Generalist Model for Face Perception | Faceptor:面部感知的通用模型 | Lixiong Qin, Mei Wang, Xuannan Liu, Yuhang Zhang, Wei Deng, Xiaoshuai Song, Weiran Xu, Weihong Deng | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Anomaly Detection by Adapting a pre-trained Vision Language Model | 通过采用预先训练的视觉语言模型进行异常检测 | Yuxuan Cai, Xinwei He, Dingkang Liang, Ao Tong, Xiang Bai | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Efficient Transferability Assessment for Selection of Pre-trained Detectors | 用于选择预训练探测器的高效可转移性评估 | Zhao Wang, Aoxue Li, Zhenguo Li, Qi Dou | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Region-based U-net for accelerated training and enhanced precision in deep brain segmentation | 基于区域的 U-net,用于加速训练并提高深部大脑分割的精度 | Mengyu Li, Magnus Magnusson, Thilo van Eimeren, Lotta M. Ellingsen | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via Concept-guided Context Optimization | XCoOp:通过概念引导上下文优化进行计算机辅助诊断的可解释即时学习 | Yequan Bie, Luyang Luo, Zhixuan Chen, Hao Chen | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | ConDiSR: Contrastive Disentanglement and Style Regularization for Single Domain Generalization | ConDiSR:单域泛化的对比解开和风格正则化 | Aleksandr Matsun, Numan Saeed, Fadillah Adamsyah Maani, Mohammad Yaqub | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | GiT: Towards Generalist Vision Transformer through Universal Language Interface | GiT:通过通用语言接口迈向通才视觉转换器 | Haiyang Wang, Hao Tang, Li Jiang, Shaoshuai Shi, Muhammad Ferjad Naeem, Hongsheng Li, Bernt Schiele, Liwei Wang | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Impact of Synthetic Images on Morphing Attack Detection Using a Siamese Network | 合成图像对使用连体网络的变形攻击检测的影响 | Juan Tapia, Christoph Busch | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | DF4LCZ: A SAM-Empowered Data Fusion Framework for Scene-Level Local Climate Zone Classification | DF4LCZ:用于场景级当地气候区分类的 SAM 授权数据融合框架 | Qianqian Wu, Xianping Ma, Jialu Sui, Man-On Pun | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection | D3T:独特的双域教师之字形跨越 RGB 热间隙,用于域自适应目标检测 | Dinh Phat Do, Taehoon Kim, Jaemin Na, Jiwon Kim, Keonho Lee, Kyunghwan Cho, Wonjun Hwang | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation In Bin-picking Scenarios | SD-Net:分箱场景中 6D 姿态估计的对称感知关键点预测和域适应 | Ding-Tao Huang, En-Te Lin, Lipeng Chen, Li-Fu Liu, Long Zeng | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Semi- and Weakly-Supervised Learning for Mammogram Mass Segmentation with Limited Annotations | 具有有限注释的乳房X线照片质量分割的半监督和弱监督学习 | Xinyu Xiong, Churan Wang, Wenxue Li, Guanbin Li | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Annotation Free Semantic Segmentation with Vision Foundation Models | 使用视觉基础模型进行无注释语义分割 | Soroush Seifi, Daniel Olmeda Reino, Fabien Despinoy, Rahaf Aljundi | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Rethinking Autoencoders for Medical Anomaly Detection from A Theoretical Perspective | 从理论角度重新思考用于医疗异常检测的自动编码器 | Yu Cai, Hao Chen, Kwang-Ting Cheng | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Anatomical Structure-Guided Medical Vision-Language Pre-training | 解剖结构引导医学视觉语言预训练 | Qingqiu Li, Xiaohan Yan, Jilan Xu, Runtian Yuan, Yuejie Zhang, Rui Feng, Quanli Shen, Xiaobo Zhang, Shujun Wang | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification | CLIP-EBC:CLIP通过增强的分块分类可以准确计数 | Yiming Ma, Victor Sanchez, Tanaya Guha | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | EventRPG: Event Data Augmentation with Relevance Propagation Guidance | EventRPG:具有相关性传播指导的事件数据增强 | Mingyuan Sun, Donghao Zhang, Zongyuan Ge, Jiaxu Wang, Jia Li, Zheng Fang, Renjing Xu | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Advanced Tumor Segmentation in Medical Imaging: An Ensemble Approach for BraTS 2023 Adult Glioma and Pediatric Tumor Tasks | 医学影像中的高级肿瘤分割:BraTS 2023 成人胶质瘤和儿童肿瘤任务的整体方法 | Fadillah Maani, Anees Ur Rehman Hashmi, Mariam Aljuboory, Numan Saeed, Ikboljon Sobirov, Mohammad Yaqub | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | WSI-SAM: Multi-resolution Segment Anything Model (SAM) for histopathology whole-slide images | WSI-SAM:用于组织病理学全切片图像的多分辨率分段任意模型 (SAM) | Hong Liu, Haosen Yang, Paul J. van Diest, Josien P. W. Pluim, Mitko Veta | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | D-YOLO a robust framework for object detection in adverse weather conditions | D-YOLO 是恶劣天气条件下目标检测的强大框架 | Zihan Chu | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Improving Distant 3D Object Detection Using 2D Box Supervision | 使用 2D 框监督改进远程 3D 物体检测 | Zetong Yang, Zhiding Yu, Chris Choy, Renhao Wang, Anima Anandkumar, Jose M. Alvarez | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Customizing Segmentation Foundation Model via Prompt Learning for Instance Segmentation | 通过实例分割的即时学习定制分割基础模型 | Hyung-Il Kim, Kimin Yun, Jun-Seok Yun, Yuseok Bae | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | SHAN: Object-Level Privacy Detection via Inference on Scene Heterogeneous Graph | SHAN:通过场景异构图推理进行对象级隐私检测 | Zhuohang Jiang, Bingkui Tong, Xia Du, Ahmed Alhammadi, Jizhe Zhou | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation | VM-UNET-V2 重新思考用于医学图像分割的 Vision Mamba UNet | Mingya Zhang, Yue Yu, Limei Gu, Tingsheng Lin, Xianping Tao | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Biophysics Informed Pathological Regularisation for Brain Tumour Segmentation | 生物物理学为脑肿瘤分割提供病理学正则化 | Lipei Zhang, Yanqi Cheng, Lihao Liu, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Single Domain Generalization for Crowd Counting | 人群计数的单域泛化 | Zhuoxuan Peng, S. -H. Gary Chan | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Randomized Principal Component Analysis for Hyperspectral Image Classification | 高光谱图像分类的随机主成分分析 | Mustafa Ustuner | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | CardioCaps: Attention-based Capsule Network for Class-Imbalanced Echocardiogram Classification | CardioCaps:基于注意力的胶囊网络,用于类别不平衡的超声心动图分类 | Hyunkyung Han, Jihyeon Seong, Jaesik Choi | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | When Semantic Segmentation Meets Frequency Aliasing | 当语义分割遇到频率混叠时 | Linwei Chen, Lin Gu, Ying Fu | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | TBI Image/Text (TBI-IT): Comprehensive Text and Image Datasets for Traumatic Brain Injury Research | TBI 图像/文本 (TBI-IT):用于创伤性脑损伤研究的综合文本和图像数据集 | Jie Li, Jiaying Wen, Tongxin Yang, Fenglin Cai, Miao Wei, Zhiwei Zhang, Li Jiang | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Leveraging Foundation Model Automatic Data Augmentation Strategies and Skeletal Points for Hands Action Recognition in Industrial Assembly Lines | 利用基础模型自动数据增强策略和骨骼点进行工业装配线中的手部动作识别 | Liang Wu, X. -G. Ma | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Gradient-Aware Logit Adjustment Loss for Long-tailed Classifier | 长尾分类器的梯度感知 Logit 调整损失 | Fan Zhang, Wei Qin, Weijieying Ren, Lei Wang, Zetong Chen, Richang Hong | arxiv.org/pdf/2403.09… | link |
Image Understanding
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-14 | Perspective-Equivariant Imaging: an Unsupervised Framework for Multispectral Pansharpening | 透视等变成像:多光谱全色锐化的无监督框架 | Andrew Wang, Mike Davies | arxiv.org/pdf/2403.09… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-14 | Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | Video Mamba Suite:状态空间模型作为视频理解的多功能替代方案 | Guo Chen, Yifei Huang, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, Limin Wang | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | LocalMamba: Visual State Space Model with Windowed Selective Scan | LocalMamba:具有窗口选择性扫描的视觉状态空间模型 | Tao Huang, Xiaohuan Pei, Shan You, Fei Wang, Chen Qian, Chang Xu | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation | PYRA:并行产出重新激活,用于训练推理高效任务适应 | Yizhe Xiong, Hui Chen, Tianxiang Hao, Zijia Lin, Jungong Han, Yuesong Zhang, Guoxin Wang, Yongjun Bao, Guiguang Ding | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | S^2MVTC: a Simple yet Efficient Scalable Multi-View Tensor Clustering | S^2MVTC:简单而高效的可扩展多视图张量聚类 | Zhen Long, Qiyuan Wang, Yazhou Ren, Yipeng Liu, Ce Zhu | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Desigen: A Pipeline for Controllable Design Template Generation | Desigen:可控设计模板生成流程 | Haohan Weng, Danqing Huang, Yu Qiao, Zheng Hu, Chin-Yew Lin, Tong Zhang, C. L. Philip Chen | arxiv.org/pdf/2403.09… | null |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-14 | Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering | Glyph-ByT5:用于精确视觉文本渲染的定制文本编码器 | Zeyu Liu, Weicong Liang, Zhanhao Liang, Chong Luo, Ji Li, Gao Huang, Yuhui Yuan | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Improving Real-Time Omnidirectional 3D Multi-Person Human Pose Estimation with People Matching and Unsupervised 2D-3D Lifting | 通过人员匹配和无监督 2D-3D 提升来改进实时全向 3D 多人人体姿势估计 | Pawel Knap, Peter Hardy, Alberto Tamajo, Hwasup Lim, Hansung Kim | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians | 使用弹簧质量 3D 高斯模型重建和模拟弹性物体 | Licheng Zhong, Hong-Xing Yu, Jiajun Wu, Yunzhu Li | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph | Hyper-3DG:通过 Hypergraph 生成文本到 3D 高斯 | Donglin Di, Jiahui Yang, Chaofan Luo, Zhou Xue, Wei Chen, Xun Yang, Yue Gao | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | CLOAF: CoLlisiOn-Aware Human Flow | CLOAF:碰撞感知人流 | Andrey Davydov, Martin Engilberge, Mathieu Salzmann, Pascal Fua | arxiv.org/pdf/2403.09… | null |
Learning Paradigms
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-14 | Counterfactual contrastive learning: robust representations via causal image synthesis | 反事实对比学习:通过因果图像合成实现稳健表示 | Melanie Roschewitz, Fabio De Sousa Ribeiro, Tian Xia, Galvin Khara, Ben Glocker | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Sentinel-Guided Zero-Shot Learning: A Collaborative Paradigm without Real Data Exposure | 哨兵引导的零样本学习:无需真实数据暴露的协作范式 | Fan Wan, Xingyu Miao, Haoran Duan, Jingjing Deng, Rui Gao, Yang Long | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Metadata-Driven Federated Learning of Connectional Brain Templates in Non-IID Multi-Domain Scenarios | 非独立同分布多域场景中元数据驱动的连接脑模板联邦学习 | Geng Chen, Qingyue Wang, Islem Rekik | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Hyperparameters in Continual Learning: a Reality Check | 持续学习中的超参数:现实检验 | Sungmin Cha, Kyunghyun Cho | arxiv.org/pdf/2403.09… | null |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-14 | What Sketch Explainability Really Means for Downstream Tasks | 草图可解释性对于下游任务真正意味着什么 | Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Aneeshan Sain, Tao Xiang, Yi-Zhe Song | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Event-based Asynchronous HDR Imaging by Temporal Incident Light Modulation | 通过时间入射光调制进行基于事件的异步 HDR 成像 | Yuliang Wu, Ganchao Tan, Jinze Chen, Wei Zhai, Yang Cao, Zheng-Jun Zha | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | SketchINR: A First Look into Sketches as Implicit Neural Representations | SketchINR:初探作为隐式神经表征的草图 | Hmrishav Bandyopadhyay, Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Tao Xiang, Timothy Hospedales, Yi-Zhe Song | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Soften to Defend: Towards Adversarial Robustness via Self-Guided Label Refinement | 软化防御:通过自我引导标签细化实现对抗稳健性 | Daiwei Yu, Zhuorong Li, Lina Wei, Canghong Jin, Yun Zhang, Sixian Chan | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Deep unfolding Network for Hyperspectral Image Super-Resolution with Automatic Exposure Correction | 具有自动曝光校正功能的高光谱图像超分辨率深度展开网络 | Yuan Fang, Yipeng Liu, Jie Chen, Zhen Long, Ao Li, Chong-Yung Chi, Ce Zhu | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Taming Cross-Domain Representation Variance in Federated Prototype Learning with Heterogeneous Data Domains | 驯服异构数据域联合原型学习中的跨域表示方差 | Lei Wang, Jieming Bian, Letian Zhang, Chen Chen, Jie Xu | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | rFaceNet: An End-to-End Network for Enhanced Physiological Signal Extraction through Identity-Specific Facial Contours | rFaceNet:通过特定身份的面部轮廓增强生理信号提取的端到端网络 | Dali Zhu, Wenli Zhang, Hualin Zeng, Xiaohao Liu, Long Yang, Jiaqi Zheng | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset | 使用 WebSight 数据集解锁 Web 屏幕截图到 HTML 代码的转换 | Hugo Laurençon, Léo Tronchon, Victor Sanh | arxiv.org/pdf/2403.09… | null |