[UPDATED!] 2024-03-12 (Publish Time)
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-12 | Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation | 连接不同语言模型和生成视觉模型以生成文本到图像 | Shihao Zhao, Shaozhe Hao, Bojia Zi, Huaizhe Xu, Kwan-Yee K. Wong | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | SemCity: Semantic Scene Generation with Triplane Diffusion | SemCity:利用三平面扩散生成语义场景 | Jumin Lee, Sebin Lee, Changho Jo, Woobin Im, Juhyeong Seon, Sung-Eui Yoon | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model | 稳定妆容:当现实世界的妆容转移遇到扩散模型时 | Yuxuan Zhang, Lifu Wei, Qing Zhang, Yiren Song, Jiaming Liu, Huaxia Li, Xu Tang, Yao Hu, Haibo Zhao | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces | SSM 与视频扩散模型的结合:利用结构化状态空间高效生成视频 | Yuta Oshima, Shohei Taniguchi, Masahiro Suzuki, Yutaka Matsuo | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost | 预算注释:利用地理数据相似性来平衡模型性能和注释成本 | Oana Ignat, Longju Bai, Joan Nwatu, Rada Mihalcea | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Multiple Latent Space Mapping for Compressed Dark Image Enhancement | 用于压缩暗图像增强的多重潜在空间映射 | Yi Zeng, Zhengning Wang, Yuxuan Liu, Tianjiao Zeng, Xuhang Liu, Xinglong Luo, Shuaicheng Liu, Shuyuan Zhu, Bing Zeng | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Accurate Spatial Gene Expression Prediction by integrating Multi-resolution features | 通过集成多分辨率特征进行准确的空间基因表达预测 | Youngmin Chung, Ji Hun Ha, Kyeong Chan Im, Joo Sang Lee | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | The future of document indexing: GPT and Donut revolutionize table of content processing | 文档索引的未来:GPT 和 Donut 彻底改变了内容处理表 | Degaga Wolde Feyisa, Haylemicheal Berihun, Amanuel Zewdu, Mahsa Najimoghadam, Marzieh Zare | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | D4D: An RGBD diffusion model to boost monocular depth estimation | D4D:用于增强单目深度估计的 RGBD 扩散模型 | L. Papa, P. Russo, I. Amerini | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation | 分块 LoRA:重新审视细粒度 LoRA,以实现文本到图像生成中的有效个性化和风格化 | Likun Li, Haoqi Zeng, Changpeng Yang, Haozhe Jia, Di Xu | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | DragAnything: Motion Control for Anything using Entity Representation | DragAnything:使用实体表示对任何物体进行运动控制 | Weijia Wu, Zhuang Li, Yuchao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Auxiliary CycleGAN-guidance for Task-Aware Domain Translation from Duplex to Monoplex IHC Images | 用于任务感知域从双工到单工 IHC 图像转换的辅助 CycleGAN 指南 | Nicolas Brieu, Nicolas Triltsch, Philipp Wortmann, Dominik Winter, Shashank Saran, Marlon Rebelatto, Günter Schmidt | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models | 使用改变的扩散模型变体进行省时且身份一致的虚拟试穿 | Phuong Dam, Jihoon Jeong, Anh Tran, Daeyoung Kim | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning | 挑战遗忘:揭示机器遗忘中最坏情况的遗忘集 | Chongyu Fan, Jiancheng Liu, Alfred Hero, Sijia Liu | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning | 预感:使用生成模型来预防持续学习中的未来数据变化 | Mark D. McDonnell, Dong Gong, Ehsan Abbasnejad, Anton van den Hengel | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Vector Quantization for Deep-Learning-Based CSI Feedback in Massive MIMO Systems | 大规模 MIMO 系统中基于深度学习的 CSI 反馈的矢量量化 | Junyong Shin, Yujin Kang, Yo-Seb Jeon | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Large Window-based Mamba UNet for Medical Image Segmentation: Beyond Convolution and Self-attention | 用于医学图像分割的基于大窗口的 Mamba UNet:超越卷积和自注意力 | Jinhong Wang, Jintai Chen, Danny Chen, Jian Wu | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Efficient Diffusion Model for Image Restoration by Residual Shifting | 通过残差移位进行图像恢复的高效扩散模型 | Zongsheng Yue, Jianyi Wang, Chen Change Loy | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Dynamic U-Net: Adaptively Calibrate Features for Abdominal Multi-organ Segmentation | 动态 U-Net:腹部多器官分割的自适应校准特征 | Jin Yang, Daniel S. Marcus, Aristeidis Sotiras | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | A Bayesian Approach to OOD Robustness in Image Classification | 图像分类中 OOD 鲁棒性的贝叶斯方法 | Prakhar Kaushik, Adam Kortylewski, Alan Yuille | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | GuideGen: A Text-guided Framework for Joint CT Volume and Anatomical structure Generation | GuideGen:用于联合 CT 体积和解剖结构生成的文本引导框架 | Linrui Dai, Rongzhao Zhang, Zhongzhen Huang, Xiaofan Zhang | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Space Learning | 频率感知 Deepfake 检测:通过频率空间学习提高通用性 | Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, Yunchao Wei | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | It's All About Your Sketch: Democratising Sketch Control in Diffusion Models | 一切都与您的草图有关:扩散模型中的草图控制民主化 | Subhadeep Koley, Ayan Kumar Bhunia, Deeptanshu Sekhri, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Learn and Search: An Elegant Technique for Object Lookup using Contrastive Learning | 学习和搜索:使用对比学习进行对象查找的优雅技术 | Chandan Kumar, Jansel Herrera-Gerena, John Just, Matthew Darr, Ali Jannesari | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers | 文本到图像的扩散模型是很棒的素描-照片匹配器 | Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song | arxiv.org/pdf/2403.07… | null |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-12 | Beyond Text: Frozen Large Language Models in Visual Signal Comprehension | 超越文本:视觉信号理解中冻结的大型语言模型 | Lei Zhu, Fangyun Wei, Yanye Lu | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Multi-modal Auto-regressive Modeling via Visual Words | 通过视觉词进行多模态自回归建模 | Tianshuo Peng, Zuchao Li, Lefei Zhang, Hai Zhao, Ping Wang, Bo Du | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Unleashing Network Potentials for Semantic Scene Completion | 释放网络潜力以完成语义场景 | Fengyun Wang, Qianru Sun, Dong Zhang, Jinhui Tang | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | DALSA: Domain Adaptation for Supervised Learning From Sparsely Annotated MR Images | DALSA:从稀疏注释的 MR 图像中进行监督学习的领域适应 | Michael Götz, Christian Weber, Franciszek Binczyk, Joanna Polanska, Rafal Tarnawski, Barbara Bobek-Billewicz, Ullrich Köthe, Jens Kleesiek, Bram Stieltjes, Klaus H. Maier-Hein | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow | 将事件引入 RGB 和 LiDAR:场景流的分层视觉运动融合 | Hanyu Zhou, Yi Chang, Zhiwei Shi, Luxin Yan | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | In-context learning enables multimodal large language models to classify cancer pathology images | 上下文学习使多模态大语言模型能够对癌症病理图像进行分类 | Dyke Ferber, Georg Wölflein, Isabella C. Wiest, Marta Ligero, Srividhya Sainath, Narmin Ghaffari Laleh, Omar S. M. El Nahhas, Gustav Müller-Franzes, Dirk Jäger, Daniel Truhn, et al. | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Eliminating Cross-modal Conflicts in BEV Space for LiDAR-Camera 3D Object Detection | 消除 BEV 空间中 LiDAR 相机 3D 物体检测的跨模态冲突 | Jiahui Fu, Chen Gao, Zitian Wang, Lirong Yang, Xiaofei Wang, Beipeng Mu, Si Liu | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery | 文本知识很重要:跨模态协同教学促进广义视觉类发现 | Haiyang Zheng, Nan Pu, Wenjing Li, Nicu Sebe, Zhun Zhong | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | KEBench: A Benchmark on Knowledge Editing for Large Vision-Language Models | KEBench:大型视觉语言模型知识编辑的基准 | Han Huang, Haitian Zhong, Qiang Liu, Shu Wu, Liang Wang, Tieniu Tan | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models | Lumen:释放大型多模态模型以视觉为中心的多功能功能 | Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Let Storytelling Tell Vivid Stories: An Expressive and Fluent Multimodal Storyteller | 让讲故事讲生动的故事:富有表现力、流利的多模式讲故事者 | Chuanqi Zang, Jiji Tang, Rongsheng Zhang, Zeng Zhao, Tangjie Lv, Mingtao Pei, Wei Liang | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection | SparseLIF:用于 3D 物体检测的高性能稀疏 LiDAR 相机融合 | Hongcheng Zhang, Liu Liang, Pengxin Zeng, Xiao Song, Zhe Wang | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations | 校准多模态表示:追求无注释的群体鲁棒性 | Chenyu You, Yifei Min, Weicheng Dai, Jasjeet S. Sekhon, Lawrence Staib, James S. Duncan | arxiv.org/pdf/2403.07… | null |
NeRF
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-12 | SMURF: Continuous Dynamics for Motion-Deblurring Radiance Fields | SMURF:运动去模糊辐射场的连续动力学 | Jungho Lee, Dogyoon Lee, Minhyeok Lee, Donghyung Kim, Sangyoun Lee | arxiv.org/pdf/2403.07… | link |
3DGS
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-12 | StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting | StyleGaussian:使用高斯泼溅进行即时 3D 风格转移 | Kunhao Liu, Fangneng Zhan, Muyu Xu, Christian Theobalt, Ling Shao, Shijian Lu | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM | SemGauss-SLAM:密集语义高斯泼溅 SLAM | Siting Zhu, Renjie Qin, Guangming Wang, Jiuming Liu, Hesheng Wang | arxiv.org/pdf/2403.07… | null |
Model Compression/Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-12 | Distilling the Knowledge in Data Pruning | 提炼数据修剪的知识 | Emanuel Ben-Baruch, Adam Botach, Igor Kviatkovsky, Manoj Aggarwal, Gérard Medioni | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric | MoPE-CLIP:使用模块式剪枝误差度量对高效视觉语言模型进行结构化剪枝 | Haokun Lin, Haoli Bai, Zhili Liu, Lu Hou, Muyi Sun, Linqi Song, Ying Wei, Zhenan Sun | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Learning Generalizable Feature Fields for Mobile Manipulation | 学习移动操作的可推广特征字段 | Ri-Zhao Qiu, Yafei Hu, Ge Yang, Yuchen Song, Yang Fu, Jianglong Ye, Jiteng Mu, Ruihan Yang, Nikolay Atanasov, Sebastian Scherer, et al. | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Continual All-in-One Adverse Weather Removal with Knowledge Replay on a Unified Network Structure | 在统一的网络结构上通过知识重放持续进行多合一的恶劣天气消除 | De Cheng, Yanling Ji, Dong Gong, Yan Li, Nannan Wang, Junwei Han, Dingwen Zhang | arxiv.org/pdf/2403.07… | link |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-12 | Label Dropout: Improved Deep Learning Echocardiography Segmentation Using Multiple Datasets With Domain Shift and Partial Labelling | 标签丢失:使用具有域转移和部分标签的多个数据集改进深度学习超声心动图分割 | Iman Islam, Esther Puyol-Antón, Bram Ruijsink, Andrew J. Reader, Andrew P. King | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | BraSyn 2023 challenge: Missing MRI synthesis and the effect of different learning objectives | BraSyn 2023 挑战:缺少 MRI 综合以及不同学习目标的影响 | Ivo M. Baltruschat, Parvaneh Janbakhshi, Matthias Lenga | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Vision-based Vehicle Re-identification in Bridge Scenario using Flock Similarity | 使用群体相似度进行桥梁场景中基于视觉的车辆重识别 | Chunfeng Zhang, Ping Wang | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Unleashing HyDRa: Hybrid Fusion, Depth Consistency and Radar for Unified 3D Perception | 释放 HyDRa:混合融合、深度一致性和雷达,实现统一 3D 感知 | Philipp Wolters, Johannes Gilg, Torben Teepe, Fabian Herzog, Anouar Laouichi, Martin Hofmann, Gerhard Rigoll | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Equipping Computational Pathology Systems with Artifact Processing Pipelines: A Showcase for Computation and Performance Trade-offs | 为计算病理学系统配备伪影处理管道:计算和性能权衡的展示 | Neel Kanwal, Farbod Khoraminia, Umay Kiraz, Andres Mosquera-Zamudio, Carlos Monteagudo, Emiel A. M. Janssen, Tahlita C. M. Zuiverloon, Chunming Rong, Kjersti Engan | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | DSEG-LIME -- Improving Image Explanation by Hierarchical Data-Driven Segmentation | DSEG-LIME——通过分层数据驱动的分割改进图像解释 | Patrick Knab, Sascha Marton, Christian Bartelt | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis | 具有知识意识关注的动态图表示用于组织病理学全幻灯片图像分析 | Jiawen Li, Yuxuan Chen, Hongbo Chu, Qiehe Sun, Tian Guan, Anjia Han, Yonghong He | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Intra-video Positive Pairs in Self-Supervised Learning for Ultrasound | 超声自我监督学习中的视频内正对 | Blake VanBerlo, Alexander Wong, Jesse Hoey, Robert Arntfield | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Fast and Simple Explainability for Point Cloud Networks | 点云网络快速且简单的可解释性 | Meir Yossef Levi, Guy Gilboa | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers | CuVLER:通过详尽的自监督 Transformer 增强无监督对象发现 | Shahaf Arica, Or Rubin, Sapir Gershov, Shlomi Laufer | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Matching Framework | 分解疾病描述以增强病理学检测:多方面视觉语言匹配框架 | Minh Hieu Phan, Yutong Xie, Yuankai Qi, Lingqiao Liu, Liyang Liu, Bowen Zhang, Zhibin Liao, Qi Wu, Minh-Son To, Johan W. Verjans | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation | 狩猎属性:弱监督语义分割的上下文原型感知学习 | Feilong Tang, Zhongxing Xu, Zhaojun Qu, Wei Feng, Xingjian Jiang, Zongyuan Ge | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Mondrian: On-Device High-Performance Video Analytics with Compressive Packed Inference | Mondrian:具有压缩打包推理的设备上高性能视频分析 | Changmin Jeon, Seonjun Kim, Juheon Yi, Youngki Lee | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | MinkUNeXt: Point Cloud-based Large-scale Place Recognition using 3D Sparse Convolutions | MinkUNeXt:使用 3D 稀疏卷积的基于点云的大规模地点识别 | J. J. Cabrera, A. Santo, A. Gil, C. Viegas, L. Payá | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution | PeLK:具有外围卷积的参数高效的大型内核卷积网络 | Honghao Chen, Xiangxiang Chu, Yongjian Ren, Xin Zhao, Kaiqi Huang | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | FPT: Fine-grained Prompt Tuning for Parameter and Memory Efficient Fine Tuning in High-resolution Medical Image Classification | FPT:用于高分辨率医学图像分类中参数和内存高效微调的细粒度提示调整 | Yijin Huang, Pujin Cheng, Roger Tam, Xiaoying Tang | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | An Active Contour Model Driven By the Hybrid Signed Pressure Function | 混合符号压力函数驱动的主动轮廓模型 | Jing Zhao | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Exploring Challenges in Deep Learning of Single-Station Ground Motion Records | 探索单站地面运动记录深度学习的挑战 | Ümit Mert Çağlar, Baris Yilmaz, Melek Türkmen, Erdem Akagündüz, Salih Tileylioglu | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model | RSBuilding:利用基础模型实现通用遥感图像建筑物提取和变化检测 | Mingze Wang, Keyan Chen, Lili Su, Cilin Yan, Sheng Xu, Haotian Zhang, Pengcheng Yuan, Xiaolong Jiang, Baochang Zhang | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions | 自动驾驶中视觉转换器的调查:当前趋势和未来方向 | Quoc-Vinh Lai-Dang | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Open-World Semantic Segmentation Including Class Similarity | 包括类相似性的开放世界语义分割 | Matteo Sodano, Federico Magistri, Lucas Nunes, Jens Behley, Cyrill Stachniss | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Open-Vocabulary Scene Text Recognition via Pseudo-Image Labeling and Margin Loss | 通过伪图像标签和边缘损失进行开放词汇场景文本识别 | Xuhua Ren, Hengcan Shi, Jin Li | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Spatiotemporal Representation Learning for Short and Long Medical Image Time Series | 短和长医学图像时间序列的时空表示学习 | Chengzhi Shen, Martin J. Menten, Hrvoje Bogunović, Ursula Schmidt-Erfurth, Hendrik Scholl, Sobha Sivaprasad, Andrew Lotery, Daniel Rueckert, Paul Hager, Robbie Holland | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | MoAI: Mixture of All Intelligence for Large Language and Vision Models | MoAI:大型语言和视觉模型的所有智能的混合 | Byung-Kwan Lee, Beomchan Park, Chae Won Kim, Yong Man Ro | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes | 3D 密集字幕的综合综述:定位和描述 3D 场景中的对象 | Ting Yu, Xiaojun Lin, Shuhui Wang, Weiguo Sheng, Qingming Huang, Jun Yu | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Backdoor Attack with Mode Mixture Latent Modification | 模式混合潜在修改的后门攻击 | Hongwei Zhang, Xiaoyin Xu, Dongsheng An, Xianfeng Gu, Min Zhang | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | JSTR: Joint Spatio-Temporal Reasoning for Event-based Moving Object Detection | JSTR:基于事件的移动物体检测的联合时空推理 | Hanyu Zhou, Zhiwei Shi, Hao Dong, Shihan Peng, Yi Chang, Luxin Yan | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Input Data Adaptive Learning (IDAL) for Sub-acute Ischemic Stroke Lesion Segmentation | 用于亚急性缺血性中风病变分割的输入数据自适应学习 (IDAL) | Michael Götz, Christian Weber, Christoph Kolb, Klaus Maier-Hein | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios | 从食堂饭菜到日常膳食:将食物识别推广到更实际的场景 | Guoshan Liu, Yang Jiao, Jingjing Chen, Bin Zhu, Yu-Gang Jiang | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | BID: Boundary-Interior Decoding for Unsupervised Temporal Action Localization Pre-Training | BID:无监督时间动作定位预训练的边界内部解码 | Qihang Fang, Chengcheng Tang, Shugao Ma, Yanchao Yang | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Customizable Avatars with Dynamic Facial Action Coded Expressions (CADyFACE) for Improved User Engagement | 具有动态面部动作编码表达式 (CADyFACE) 的可定制化身,可提高用户参与度 | Megan A. Witherow, Crystal Butler, Winston J. Shields, Furkan Ilgin, Norou Diawara, Janice Keener, John W. Harrington, Khan M. Iftekharuddin | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Advancements in Continuous Glucose Monitoring: Integrating Deep Learning and ECG Signal | 连续血糖监测的进展:深度学习和心电图信号的集成 | MohammadReza Hosseinzadehketilateh, Banafsheh Adami, Nima Karimian | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Rediscovering BCE Loss for Uniform Classification | 重新发现统一分类的 BCE 损失 | Qiufu Li, Xi Jia, Jiancan Zhou, Linlin Shen, Jinming Duan | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | MENTOR: Multilingual tExt detectioN TOward leaRning by analogy | 导师:通过类比学习进行多语言文本检测 | Hsin-Ju Lin, Tsu-Chun Chung, Ching-Chun Hsiao, Pin-Yu Chen, Wei-Chen Chiu, Ching-Chun Huang | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction | 通过两步共形预测的自适应边界框不确定性 | Alexander Timans, Christoph-Nikolas Straehle, Kaspar Sakmann, Eric Nalisnick | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Towards Zero-shot Human-Object Interaction Detection via Vision-Language Integration | 通过视觉语言集成实现零样本人机交互检测 | Weiying Xue, Qi Liu, Qiwei Xiong, Yuxiao Wang, Zhenao Wei, Xiaofen Xing, Xiangmin Xu | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Monocular Microscope to CT Registration using Pose Estimation of the Incus for Augmented Reality Cochlear Implant Surgery | 单目显微镜到 CT 配准,使用砧骨姿势估计进行增强现实人工耳蜗植入手术 | Yike Zhang, Eduardo Davalos, Dingjie Su, Ange Lou, Jack H. Noble | arxiv.org/pdf/2403.07… | null |
Image Understanding
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-12 | Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving | 自动驾驶的单视图和多视图深度自适应融合 | JunDa Cheng, Wei Yin, Kaixuan Wang, Xiaozhi Chen, Shijie Wang, Xin Yang | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | SGE: Structured Light System Based on Gray Code with an Event Camera | SGE:基于格雷码和事件相机的结构光系统 | Xingyu Lu, Lei Sun, Diyang Gu, Zhijie Xu, Kaiwei Wang | arxiv.org/pdf/2403.07… | null |
LLM
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-12 | Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings | Synth$^2$:通过合成字幕和图像嵌入增强视觉语言模型 | Sahand Sharifzadeh, Christos Kaplanis, Shreya Pathak, Dharshan Kumaran, Anastasija Ilic, Jovana Mitrovic, Charles Blundell, Andrea Banino | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning | NavCoT:通过学习解缠推理促进基于 LLM 的视觉和语言导航 | Bingqian Lin, Yunshuang Nie, Ziming Wei, Jiaqi Chen, Shikui Ma, Jianhua Han, Hang Xu, Xiaojun Chang, Xiaodan Liang | arxiv.org/pdf/2403.07… | link |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-12 | When Eye-Tracking Meets Machine Learning: A Systematic Review on Applications in Medical Image Analysis | 当眼动追踪遇到机器学习:医学图像分析应用的系统回顾 | Sahar Moradizeyveh, Mehnaz Tabassum, Sidong Liu, Robert Ahadizad Newport, Amin Beheshti, Antonio Di Ieva | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Masked AutoDecoder is Effective Multi-Task Vision Generalist | Masked AutoDecoder 是高效的多任务视觉通才 | Han Qiu, Jiaxing Huang, Peng Gao, Lewei Lu, Xiaoqin Zhang, Shijian Lu | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal | 实践中的真知:视频恶劣天气去除的扩散测试时间适应 | Yijun Yang, Hongtao Wu, Angelica I. Aviles-Rivero, Yulun Zhang, Jing Qin, Lei Zhu | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Smartphone region-wise image indoor localization using deep learning for indoor tourist attraction | 使用深度学习对室内旅游景点进行智能手机区域图像室内定位 | Gabriel Toshio Hirokawa Higa, Rodrigo Stuqui Monzani, Jorge Fernando da Silva Cecatto, Maria Fernanda Balestieri Mariano de Souza, Vanessa Aparecida de Moraes Weber, Hemerson Pistori, Edson Takashi Matsubara | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | LaB-GATr: geometric algebra transformers for large biomedical surface and volume meshes | LaB-GATr:用于大型生物医学表面和体积网格的几何代数转换器 | Julian Suk, Baris Imre, Jelmer M. Wolterink | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions | ViT-CoMer:具有卷积多尺度特征交互的视觉变压器,用于密集预测 | Chunlong Xia, Xinliang Wang, Feng Lv, Xin Hao, Yifeng Shi | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Learning Correction Errors via Frequency-Self Attention for Blind Image Super-Resolution | 通过频率自注意力学习校正误差以实现盲图像超分辨率 | Haochen Sun, Yan Yuan, Lijuan Su, Haotian Shao | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Gabor-guided transformer for single image deraining | 用于单图像去雨的 Gabor 引导变压器 | Sijin He, Guangfeng Lin | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | IM-Unpack: Training and Inference with Arbitrarily Low Precision Integers | IM-Unpack:使用任意低精度整数进行训练和推理 | Zhanpeng Zeng, Karthikeyan Sankaralingam, Vikas Singh | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Learning Hierarchical Color Guidance for Depth Map Super-Resolution | 学习深度图超分辨率的分层颜色指导 | Runmin Cong, Ronghui Sheng, Hao Wu, Yulan Guo, Yunchao Wei, Wangmeng Zuo, Yao Zhao, Sam Kwong | arxiv.org/pdf/2403.07… | null |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-12 | DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation | DexCap:可扩展且便携式的 Mocap 数据收集系统,用于灵巧操作 | Chen Wang, Haochen Shi, Weizhuo Wang, Ruohan Zhang, Li Fei-Fei, C. Karen Liu | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Generative deep learning-enabled ultra-large field-of-view lens-free imaging | 支持生成式深度学习的超大视场无镜头成像 | Ronald B. Liu, Zhe Liu, Max G. A. Wolf, Krishna P. Purohit, Gregor Fritz, Yi Feng, Carsten G. Hansen, Pierre O. Bagnaninchi, Xavier Casadevall i Solvas, Yunjie Yang | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM | Motion Mamba:利用分层和双向选择性 SSM 生成高效、长序列的运动 | Zeyu Zhang, Akide Liu, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | FSC: Few-point Shape Completion | FSC:少点形状完成 | Xianzu Wu, Xianfeng Wu, Tianyu Luan, Yajing Bai, Zhongyuan Lai, Junsong Yuan | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture | 通过多级同构架构进行频率解耦以实现运动放大 | Fei Wang, Dan Guo, Kun Li, Zhun Zhong, Meng Wang | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Complementing Event Streams and RGB Frames for Hand Mesh Reconstruction | 补充事件流和 RGB 帧以进行手部网格重建 | Jianping Jiang, Xinyu Zhou, Bingxuan Wang, Xiaoming Deng, Chao Xu, Boxin Shi | arxiv.org/pdf/2403.07… | null |
Learning Paradigms
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-12 | 12 mJ per Class On-Device Online Few-Shot Class-Incremental Learning | 每节课 12 mJ 设备上在线少样本课程 - 增量学习 | Yoga Esa Wibowo, Cristian Cioflan, Thorir Mar Ingolfsson, Michael Hersche, Leo Zhao, Abbas Rahimi, Luca Benini | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | A Fourier Transform Framework for Domain Adaptation | 用于域适应的傅立叶变换框架 | Le Luo, Bingrong Xu, Qingyong Zhang, Cheng Lian, Jie Luo | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Uncertainty-guided Contrastive Learning for Single Source Domain Generalisation | 用于单源域泛化的不确定性引导对比学习 | Anastasios Arsenos, Dimitrios Kollias, Evangelos Petrongonas, Christos Skliros, Stefanos Kollias | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | NightHaze: Nighttime Image Dehazing via Self-Prior Learning | NightHaze:通过自先学习进行夜间图像去雾 | Beibei Lin, Yeying Jin, Wending Yan, Wei Ye, Yuan Yuan, Robby T. Tan | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | FeTrIL++: Feature Translation for Exemplar-Free Class-Incremental Learning with Hill-Climbing | FeTrIL++:通过爬山实现无示例类增量学习的特征翻译 | Eduard Hogea, Adrian Popescu, Darian Onchis, Grégoire Petit | arxiv.org/pdf/2403.07… | null |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-12 | Uncertainty Quantification with Deep Ensembles for 6D Object Pose Estimation | 用于 6D 物体姿态估计的深度集成的不确定性量化 | Kira Wursthorn, Markus Hillemann, Markus Ulrich | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Robust Synthetic-to-Real Transfer for Stereo Matching | 用于立体匹配的强大的合成到真实传输 | Jiawei Zhang, Jiahe Li, Lei Huang, Xiaohan Yu, Lin Gu, Jin Zheng, Xiao Bai | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | Optimizing Negative Prompts for Enhanced Aesthetics and Fidelity in Text-To-Image Generation | 优化负面提示以增强文本到图像生成的美观性和保真度 | Michael Ogezi, Ning Shi | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Unified Source-Free Domain Adaptation | 统一无源域适配 | Song Tang, Wenxin Su, Mao Ye, Jianwei Zhang, Xiatian Zhu | arxiv.org/pdf/2403.07… | link |
| 2024-03-12 | AACP: Aesthetics assessment of children's paintings based on self-supervised learning | AACP:基于自我监督学习的儿童绘画美学评估 | Shiqi Jiang, Ning Li, Chen Shi, Liping Guo, Changbo Wang, Chenhui Li | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Category-Agnostic Pose Estimation for Point Clouds | 点云的类别无关姿态估计 | Bowen Liu, Wei Liu, Siang Chen, Pengwei Xie, Guijin Wang | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Entropy is not Enough for Test-Time Adaptation: From the Perspective of Disentangled Factors | 熵不足以适应测试时间:从解开因素的角度来看 | Jonghyun Lee, Dahuin Jung, Saehyung Lee, Junsung Park, Juhyeon Shin, Uiwon Hwang, Sungroh Yoon | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | Time-Efficient Light-Field Acquisition Using Coded Aperture and Events | 使用编码孔径和事件进行高效的光场采集 | Shuji Habuchi, Keita Takahashi, Chihiro Tsutake, Toshiaki Fujii, Hajime Nagahara | arxiv.org/pdf/2403.07… | null |
| 2024-03-12 | You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval | 你永远不会独行:用于细粒度图像检索的草图和文本二重奏 | Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song | arxiv.org/pdf/2403.07… | null |