[UPDATED!] 2024-03-11 (Publish Time)
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-11 | BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion | BrushNet:一种具有分解双分支扩散的即插即用图像修复模型 | Xuan Ju, Xian Liu, Xintao Wang, Yuxuan Bian, Ying Shan, Qiang Xu | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Bayesian Diffusion Models for 3D Shape Reconstruction | 用于 3D 形状重建的贝叶斯扩散模型 | Haiyang Xu, Yu Lei, Zeyuan Chen, Xiang Zhang, Yue Zhao, Yilin Wang, Zhuowen Tu | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data | SELMA:学习特定技能的文本到图像专家并将其与自动生成的数据合并 | Jialu Li, Jaemin Cho, Yi-Lin Sung, Jaehong Yoon, Mohit Bansal | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations | DEADiff:一种具有解缠结表示的高效风格化扩散模型 | Tianhao Qi, Shancheng Fang, Yanze Wu, Hongtao Xie, Jiawei Liu, Lang Chen, Qian He, Yongdong Zhang | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | A Geospatial Approach to Predicting Desert Locust Breeding Grounds in Africa | 预测非洲沙漠蝗虫繁殖地的地理空间方法 | Ibrahim Salihu Yusuf, Mukhtar Opeyemi Yusuf, Kobby Panford-Quainoo, Arnu Pretorius | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Medical Image Synthesis via Fine-Grained Image-Text Alignment and Anatomy-Pathology Prompting | 通过细粒度图像文本对齐和解剖病理学提示进行医学图像合成 | Wenting Chen, Pengyu Wang, Hui Ren, Lichao Sun, Quanzheng Li, Yixuan Yuan, Xiang Li | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Multistep Consistency Models | 多步一致性模型 | Jonathan Heek, Emiel Hoogeboom, Tim Salimans | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Data-Independent Operator: A Training-Free Artifact Representation Extractor for Generalizable Deepfake Detection | 数据无关的算子:用于通用 Deepfake 检测的免训练伪像表示提取器 | Chuangchuang Tan, Ping Liu, RenShuai Tao, Huan Liu, Yao Zhao, Baoyuan Wu, Yunchao Wei | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Distribution-Aware Data Expansion with Diffusion Models | 使用扩散模型进行分布感知数据扩展 | Haowei Zhu, Ling Yang, Jun-Hai Yong, Wentao Zhang, Bin Wang | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | V3D: Video Diffusion Models are Effective 3D Generators | V3D:视频扩散模型是有效的 3D 生成器 | Zilong Chen, Yikai Wang, Feng Wang, Zhengyi Wang, Huaping Liu | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback | 使用强化学习和人类反馈来增强图像标题生成 | Adarsh N L, Arun P V, Aravindh N L | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Distributionally Generative Augmentation for Fair Facial Attribute Classification | 用于公平面部属性分类的分布式生成增强 | Fengda Zhang, Qianpei He, Kun Kuang, Jiashuo Liu, Long Chen, Chao Wu, Jun Xiao, Hanwang Zhang | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition | 基于 Transformer 的 2D 姿态和时空嵌入融合,用于分心驾驶员动作识别 | Erkut Akdag, Zeqi Zhu, Egor Bondarev, Peter H. N. De With | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | ReStainGAN: Leveraging IHC to IF Stain Domain Translation for in-silico Data Generation | ReStainGAN:利用 IHC 到 IF 染色域转换进行计算机数据生成 | Dominik Winter, Nicolas Triltsch, Philipp Plewa, Marco Rosati, Thomas Padel, Ross Hill, Markus Schick, Nicolas Brieu | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Active Generation for Image Classification | 图像分类的主动生成 | Tao Huang, Jiaqi Liu, Shan You, Chang Xu | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Advancing Text-Driven Chest X-Ray Generation with Policy-Based Reinforcement Learning | 通过基于策略的强化学习推进文本驱动的胸部 X 射线生成 | Woojung Han, Chanyoung Kim, Dayun Ju, Yumin Shim, Seong Jae Hwang | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Incorporating Improved Sinusoidal Threshold-based Semi-supervised Method and Diffusion Models for Osteoporosis Diagnosis | 结合改进的基于正弦阈值的半监督方法和扩散模型进行骨质疏松症诊断 | Wenchi Ke | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | 3D-aware Image Generation and Editing with Multi-modal Conditions | 多模态条件下的 3D 感知图像生成和编辑 | Bo Li, Yi-ke Li, Zhi-fen He, Bin Liu, Yun-Kun Lai | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | From Pixel to Cancer: Cellular Automata in Computed Tomography | 从像素到癌症:计算机断层扫描中的细胞自动机 | Yuxiang Lai, Xiaoxi Chen, Angtian Wang, Alan Yuille, Zongwei Zhou | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation | Text2QR:协调美学定制和扫描鲁棒性以生成文本引导的 QR 码 | Guangyang Wu, Xiaohong Liu, Jun Jia, Xuehao Cui, Guangtao Zhai | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | DivCon: Divide and Conquer for Progressive Text-to-Image Generation | DivCon:分而治之,逐步生成文本到图像 | Yuhao Jia, Wenhan Tan | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | A Segmentation Foundation Model for Diverse-type Tumors | 多种肿瘤的分割基础模型 | Jianhao Xie, Ziang Zhang, Guibo Luo, Yuesheng Zhu | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | FSViewFusion: Few-Shots View Generation of Novel Objects | FSViewFusion:新对象的少量视图生成 | Rukhshanda Hussain, Hui Xian Grace Lim, Borchun Chen, Mubarak Shah, Ser Nam Lim | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models | 增强文本到图像合成中的语义保真度:扩散模型中的注意力调节 | Yang Zhang, Teoh Tze Tzun, Lim Wei Hern, Tiviatis Sim, Kenji Kawaguchi | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style | Style2Talker:具有情感风格和艺术风格的高分辨率说话头生成 | Shuai Tan, Bin Ji, Ye Pan | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Say Anything with Any Style | 以任何风格说任何话 | Shuai Tan, Bin Ji, Yu Ding, Ye Pan | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | MOAB: Multi-Modal Outer Arithmetic Block For Fusion Of Histopathological Images And Genetic Data For Brain Tumor Grading | MOAB:多模态外部算术块,用于融合组织病理学图像和遗传数据以进行脑肿瘤分级 | Omnia Alwazzan, Abbas Khan, Ioannis Patras, Gregory Slabaugh | arxiv.org/pdf/2403.06… | null |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-11 | VideoMamba: State Space Model for Efficient Video Understanding | VideoMamba:用于高效视频理解的状态空间模型 | Kunchang Li, Xinhao Li, Yi Wang, Yinan He, Yali Wang, Limin Wang, Yu Qiao | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks | FocusCLIP:以人为中心的任务中零样本迁移的多模态主体级引导 | Muhammad Saif Ullah Khan, Muhammad Ferjad Naeem, Federico Tombari, Luc Van Gool, Didier Stricker, Muhammad Zeshan Afzal | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | DiaLoc: An Iterative Approach to Embodied Dialog Localization | DiaLoc:一种实现对话本地化的迭代方法 | Chao Zhang, Mohan Li, Ignas Budvytis, Stephan Liwicki | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | CT2Rep: Automated Radiology Report Generation for 3D Medical Imaging | CT2Rep:自动生成 3D 医学成像放射学报告 | Ibrahim Ethem Hamamci, Sezgin Er, Bjoern Menze | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Real-Time Multimodal Cognitive Assistant for Emergency Medical Services | 紧急医疗服务实时多模态认知助手 | Keshara Weerasinghe, Saahith Janapati, Xueren Ge, Sion Kim, Sneha Iyer, John A. Stankovic, Homa Alemzadeh | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning | 通过临床质量强化学习生成大型模型驱动的放射学报告 | Zijian Zhou, Miaojing Shi, Meng Wei, Oluwatosin Alabi, Zijie Yue, Tom Vercauteren | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Restoring Ancient Ideograph: A Multimodal Multitask Neural Network Approach | 恢复古代表意文字:多模态多任务神经网络方法 | Siyu Duan, Jun Wang, Qi Su | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Answering Diverse Questions via Text Attached with Key Audio-Visual Clues | 通过附有关键视听线索的文字回答各种问题 | Qilang Ye, Zitong Yu, Xin Liu | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | 3DRef: 3D Dataset and Benchmark for Reflection Detection in RGB and Lidar Data | 3DRef:RGB 和激光雷达数据中反射检测的 3D 数据集和基准 | Xiting Zhao, Sören Schwertfeger | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | 3D Semantic Segmentation-Driven Representations for 3D Object Detection | 用于 3D 对象检测的 3D 语义分割驱动表示 | Hayeon O, Kunsoo Huh | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Reliable Spatial-Temporal Voxels For Multi-Modal Test-Time Adaptation | 用于多模态测试时间适应的可靠时空体素 | Haozhi Cao, Yuecong Xu, Jianfei Yang, Pengyu Yin, Xingyu Ji, Shenghai Yuan, Lihua Xie | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Can LLMs' Tuning Methods Work in Medical Multimodal Domain? | LLM 的微调方法可以在医学多模态领域发挥作用吗? | Jiawei Chen, Yue Jiang, Dingkang Yang, Mingcheng Li, Jinjie Wei, Ziyun Qian, Lihua Zhang | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | See Through Their Minds: Learning Transferable Neural Representation from Cross-Subject fMRI | 看透他们的想法:从跨主题功能磁共振成像中学习可迁移的神经表征 | Yulong Liu, Yongqiang Ma, Guibo Zhu, Haodong Jing, Nanning Zheng | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Multi-modal Semantic Understanding with Contrastive Cross-modal Feature Alignment | 具有对比跨模态特征对齐的多模态语义理解 | Ming Zhang, Ke Chang, Yunfang Wu | arxiv.org/pdf/2403.06… | null |
NeRF
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-11 | FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization | FreGS:具有渐进频率正则化的 3D 高斯泼溅 | Jiahui Zhang, Fangneng Zhan, Muyu Xu, Shijian Lu, Eric Xing | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | SiLVR: Scalable Lidar-Visual Reconstruction with Neural Radiance Fields for Robotic Inspection | SiLVR:用于机器人检查的具有神经辐射场的可扩展激光雷达视觉重建 | Yifu Tao, Yash Bhalgat, Lanke Frank Tarimo Fu, Matias Mattamala, Nived Chebrolu, Maurice Fallon | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Vosh: Voxel-Mesh Hybrid Representation for Real-Time View Synthesis | Vosh:用于实时视图合成的体素网格混合表示 | Chenhao Zhang, Yongyang Zhou, Lei Zhang | arxiv.org/pdf/2403.06… | null |
3DGS
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-11 | DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization | DNGaussian:通过全局局部深度归一化优化稀疏视图 3D 高斯辐射场 | Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, Lin Gu | arxiv.org/pdf/2403.06… | null |
Model Compression/Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-11 | GRITv2: Efficient and Light-weight Social Relation Recognition | GRITv2:高效、轻量级的社交关系识别 | N K Sagar Reddy, Neeraj Kasera, Avinash Thakur | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models | 第 2 层之后,一张图像就值 1/2 个 token:大型视觉语言模型的即插即用推理加速 | Liang Chen, Haozhe Zhao, Tianyu Liu, Shuai Bai, Junyang Lin, Chang Zhou, Baobao Chang | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor | PeerAiD:由专业同伴导师改进对抗性蒸馏 | Jaewon Jung, Hongsun Jang, Jaeyong Song, Jinho Lee | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning | QuantTune:通过自适应离群值驱动微调来优化模型量化 | Jiun-Man Chen, Yu-Hsuan Chao, Yu-Jie Wang, Ming-Der Shieh, Chih-Chung Hsu, Wei-Fen Lin | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Enhanced Sparsification via Stimulative Training | 通过刺激训练增强稀疏化 | Shengji Tang, Weihao Lin, Hancheng Ye, Peng Ye, Chong Yu, Baopu Li, Tao Chen | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization | FlowVQTalker:通过规范化流和量化生成高质量的情感面部表情 | Shuai Tan, Bin Ji, Ye Pan | arxiv.org/pdf/2403.06… | null |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-11 | Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling | 注意力提示调优:时空建模预训练模型的参数高效适应 | Wele Gedara Chaminda Bandara, Vishal M. Patel | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Explainable Transformer Prototypes for Medical Diagnoses | 用于医学诊断的可解释 Transformer 原型 | Ugur Demir, Debesh Jha, Zheyuan Zhang, Elif Keles, Bradley Allen, Aggelos K. Katsaggelos, Ulas Bagci | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Optimizing Latent Graph Representations of Surgical Scenes for Zero-Shot Domain Transfer | 优化手术场景的潜在图表示以实现零样本域转移 | Siddhant Satyanaik, Aditya Murali, Deepak Alapatt, Xin Wang, Pietro Mascagni, Nicolas Padoy | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Advancing Generalizable Remote Physiological Measurement through the Integration of Explicit and Implicit Prior Knowledge | 通过整合显性和隐性先验知识推进可推广的远程生理测量 | Yuting Zhang, Hao Lu, Xin Liu, Yingcong Chen, Kaishun Wu | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head | 具有高效融合头的基于 Transformer 的实时开放词汇检测 | Tiancheng Zhao, Peng Liu, Xuan He, Lu Zhang, Kyusong Lee | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | COOD: Combined out-of-distribution detection using multiple measures for anomaly & novel class detection in large-scale hierarchical classification | COOD:使用多种措施组合分布外检测,用于大规模分层分类中的异常和新类检测 | L. E. Hogeweg, R. Gangireddy, D. Brunink, V. J. Kalkman, L. Cornelissen, J. W. Kamminga | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation | DriveDreamer-2:用于生成多样化驾驶视频的 LLM 增强世界模型 | Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Xinze Chen, Guan Huang, Xiaoyi Bao, Xingang Wang | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | LeOCLR: Leveraging Original Images for Contrastive Learning of Visual Representations | LeOCLR:利用原始图像进行视觉表示的对比学习 | Mohammad Alkhalefi, Georgios Leontidis, Mingjun Zhong | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Deep Learning Approaches for Human Action Recognition in Video Data | 视频数据中人类动作识别的深度学习方法 | Yufei Xie | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Dynamic Perturbation-Adaptive Adversarial Training on Medical Image Classification | 医学图像分类的动态扰动自适应对抗训练 | Shuai Li, Xiaoguang Ma, Shancheng Jiang, Lu Meng | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Leveraging Internal Representations of Model for Magnetic Image Classification | 利用模型的内部表示进行磁图像分类 | Adarsh N L, Arun P V, Alok Porwal, Malcolm Aranha | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Genetic Learning for Designing Sim-to-Real Data Augmentations | 用于设计模拟到真实数据增强的遗传学习 | Bram Vanherle, Nick Michiels, Frank Van Reeth | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Average Calibration Error: A Differentiable Loss for Improved Reliability in Image Segmentation | 平均校准误差:提高图像分割可靠性的可微分损失 | Theodore Barfoot, Luis Garcia-Peraza-Herrera, Ben Glocker, Tom Vercauteren | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Shortcut Learning in Medical Image Segmentation | 医学图像分割中的捷径学习 | Manxi Lin, Nina Weng, Kamil Mikolaj, Zahra Bashir, Morten Bo Søndergaard Svendsen, Martin Tolsgaard, Anders Nymark Christensen, Aasa Feragen | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Probabilistic Contrastive Learning for Long-Tailed Visual Recognition | 长尾视觉识别的概率对比学习 | Chaoqun Du, Yulin Wang, Shiji Song, Gao Huang | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Advancing Graph Neural Networks with HL-HGAT: A Hodge-Laplacian and Attention Mechanism Approach for Heterogeneous Graph-Structured Data | 使用 HL-HGAT 推进图神经网络:异构图结构数据的 Hodge-Laplacian 和注意力机制方法 | Jinghan Huang, Qiufeng Chen, Yijun Bian, Pengli Zhu, Nanguang Chen, Moo K. Chung, Anqi Qiu | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Trustworthy Partial Label Learning with Out-of-distribution Detection | 具有分布外检测的可信部分标签学习 | Jintao Huang, Yiu-Ming Cheung | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective | CAM 再次回归:从弱监督对象定位角度看大型内核 CNN | Shunsuke Yasuki, Masato Taki | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Car Damage Detection and Patch-to-Patch Self-supervised Image Alignment | 汽车损坏检测和逐块自监督图像对齐 | Hanxiao Chen | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | epsilon-Mesh Attack: A Surface-based Adversarial Point Cloud Attack for Facial Expression Recognition | epsilon-Mesh 攻击:基于表面的对抗性点云攻击,用于面部表情识别 | Batuhan Cengiz, Mert Gulsen, Yusuf H. Sahin, Gozde Unal | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Towards Zero-Shot Interpretable Human Recognition: A 2D-3D Registration Framework | 迈向零样本可解释人类识别:2D-3D 配准框架 | Henrique Jesus, Hugo Proença | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Ricci flow-based brain surface covariance descriptors for Alzheimer disease | 基于 Ricci 流的阿尔茨海默病脑表面协方差描述符 | Fatemeh Ahmadi, Mohamad Ebrahim Shiri, Behroz Bidabad, Maral Sedaghat, Pooran Memari | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Evaluating the Energy Efficiency of Few-Shot Learning for Object Detection in Industrial Settings | 评估工业环境中目标检测的少样本学习的能源效率 | Georgios Tsoumplekas, Vladislav Li, Ilias Siniosoglou, Vasileios Argyriou, Sotirios K. Goudos, Ioannis D. Moscholios, Panagiotis Radoglou-Grammatikis, Panagiotis Sarigiannidis | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Forest Inspection Dataset for Aerial Semantic Segmentation and Depth Estimation | 用于航空语义分割和深度估计的森林检查数据集 | Bianca-Cerasela-Zelia Blaga, Sergiu Nedevschi | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Density-Guided Label Smoothing for Temporal Localization of Driving Actions | 用于驾驶行为时间定位的密度引导标签平滑 | Tunc Alkanat, Erkut Akdag, Egor Bondarev, Peter H. N. De With | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers | 图像到图转换器的跨域和跨维度学习 | Alexander H. Berger, Laurin Lux, Suprosanna Shit, Ivan Ezhov, Georgios Kaissis, Martin J. Menten, Daniel Rueckert, Johannes C. Paetzold | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | BEV2PR: BEV-Enhanced Visual Place Recognition with Structural Cues | BEV2PR:带有结构提示的 BEV 增强视觉位置识别 | Fudong Ge, Yiwei Zhang, Shuhan Shen, Yue Wang, Weiming Hu, Jin Gao | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Exploiting Style Latent Flows for Generalizing Deepfake Detection Video Detection | 利用风格潜在流来推广 Deepfake 检测视频检测 | Jongwook Choi, Taehoon Kim, Yonghyun Jeong, Seungryul Baek, Jongwon Choi | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Detection of Object Throwing Behavior in Surveillance Videos | 监控视频中物体投掷行为的检测 | Ivo P. C. Kersten, Erkut Akdag, Egor Bondarev, Peter H. N. De With | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | OMH: Structured Sparsity via Optimally Matched Hierarchy for Unsupervised Semantic Segmentation | OMH:通过最佳匹配层次结构实现无监督语义分割的结构化稀疏性 | Baran Ozaydin, Tong Zhang, Deblina Bhattacharjee, Sabine Süsstrunk, Mathieu Salzmann | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection | SARDet-100K:迈向大规模 SAR 物体检测的开源基准和工具包 | Yuxuan Li, Xiang Li, Weijie Li, Qibin Hou, Li Liu, Ming-Ming Cheng, Jian Yang | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Confidence-Aware RGB-D Face Recognition via Virtual Depth Synthesis | 通过虚拟深度合成进行置信度感知 RGB-D 人脸识别 | Zijian Chen, Mei Wang, Weihong Deng, Hongzhi Shi, Dongchao Wen, Yingjie Zhang, Xingchen Cui, Jian Zhao | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Skeleton Supervised Airway Segmentation | 骨骼监督气道分割 | Mingyue Zhao, Han Li, Li Fan, Shiyuan Liu, Xiaolan Qiu, S. Kevin Zhou | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts | 通过带有少量样本提示的上下文残差学习实现通用异常检测 | Jiawen Zhu, Guansong Pang | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Query-guided Prototype Evolution Network for Few-Shot Segmentation | 用于少样本分割的查询引导原型进化网络 | Runmin Cong, Hang Xiong, Jinpeng Chen, Wei Zhang, Qingming Huang, Yao Zhao | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Toward Robust Canine Cardiac Diagnosis: Deep Prototype Alignment Network-Based Few-Shot Segmentation in Veterinary Medicine | 实现稳健的犬心脏诊断:兽医医学中基于深度原型对齐网络的少样本分割 | Jun-Young Oh, In-Gyu Lee, Tae-Eui Kam, Ji-Hoon Jeong | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Point Mamba: A Novel Point Cloud Backbone Based on State Space Model with Octree-Based Ordering Strategy | Point Mamba:基于状态空间模型和基于八叉树的排序策略的新型点云主干 | Jiuming Liu, Ruiji Yu, Yian Wang, Yu Zheng, Tianchen Deng, Weicai Ye, Hesheng Wang | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation | 走向未知:半监督语义分割的密度下降特征扰动 | Xiaoyang Wang, Huihui Bai, Limin Yu, Yao Zhao, Jimin Xiao | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Ensemble Quadratic Assignment Network for Graph Matching | 用于图匹配的集成二次分配网络 | Haoru Tan, Chuang Wang, Sitong Wu, Xu-Yao Zhang, Fei Yin, Cheng-Lin Liu | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Fine-Grained Pillar Feature Encoding Via Spatio-Temporal Virtual Grid for 3D Object Detection | 通过时空虚拟网格进行细粒度支柱特征编码,用于 3D 对象检测 | Konyul Park, Yecheol Kim, Junho Koh, Byungwoo Park, Jun Won Choi | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | AS-FIBA: Adaptive Selective Frequency-Injection for Backdoor Attack on Deep Face Restoration | AS-FIBA:自适应选择性频率注入用于深度面部恢复的后门攻击 | Zhenbo Song, Wenhao Gao, Kaihao Zhang, Wenhan Luo, Zhaoxin Fan, Jianfeng Lu | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | PointSeg: A Training-Free Paradigm for 3D Scene Segmentation via Foundation Models | PointSeg:通过基础模型进行 3D 场景分割的免训练范式 | Qingdong He, Jinlong Peng, Zhengkai Jiang, Xiaobin Hu, Jiangning Zhang, Qiang Nie, Yabiao Wang, Chengjie Wang | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Refining Segmentation On-the-Fly: An Interactive Framework for Point Cloud Semantic Segmentation | 实时细化分割:点云语义分割的交互式框架 | Peng Zhang, Ting Wu, Jinsheng Sun, Weiqing Li, Zhiyong Su | arxiv.org/pdf/2403.06… | null |
GNN
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-11 | Structure Your Data: Towards Semantic Graph Counterfactuals | 构建数据:走向语义图反事实 | Angeliki Dimitriou, Maria Lymperaiou, Giorgos Filandrianos, Konstantinos Thomas, Giorgos Stamou | arxiv.org/pdf/2403.06… | null |
Image Understanding
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-11 | Applicability of oculomics for individual risk prediction: Repeatability and robustness of retinal Fractal Dimension using DART and AutoMorph | 眼组学在个体风险预测中的适用性:使用 DART 和 AutoMorph 的视网膜分形维度的重复性和鲁棒性 | Justin Engelmann, Diana Moukaddem, Lucas Gago, Niall Strang, Miguel O. Bernabeu | arxiv.org/pdf/2403.06… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-11 | HDRTransDC: High Dynamic Range Image Reconstruction with Transformer Deformation Convolution | HDRTransDC:使用 Transformer 变形卷积进行高动态范围图像重建 | Shuaikang Shang, Xuejing Kang, Anlong Ming | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Boosting Image Restoration via Priors from Pre-trained Models | 通过预训练模型的先验促进图像恢复 | Xiaogang Xu, Shu Kong, Tao Hu, Zhe Liu, Hujun Bao | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | CEAT: Continual Expansion and Absorption Transformer for Non-Exemplar Class-Incremental Learning | CEAT:非典范类增量学习的持续扩展和吸收 Transformer | Xinyuan Gao, Songlin Dong, Yuhang He, Xing Wei, Yihong Gong | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Multi-Scale Implicit Transformer with Re-parameterize for Arbitrary-Scale Super-Resolution | 具有任意尺度超分辨率重新参数化功能的多尺度隐式 Transformer | Jinchen Zhu, Mingjian Zhang, Ling Zheng, Shizhuang Weng | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos | 音频驱动的头像视频感知质量指标的比较研究 | Weixia Zhang, Chengguang Zhu, Jingnan Gao, Yichao Yan, Guangtao Zhai, Xiaokang Yang | arxiv.org/pdf/2403.06… | null |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-11 | Memory-based Adapters for Online 3D Scene Perception | 用于在线 3D 场景感知的基于内存的适配器 | Xiuwei Xu, Chong Xia, Ziwei Wang, Linqing Zhao, Yueqi Duan, Jie Zhou, Jiwen Lu | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | MambaMIL: Enhancing Long Sequence Modeling with Sequence Reordering in Computational Pathology | MambaMIL:通过计算病理学中的序列重排序增强长序列建模 | Shu Yang, Yihui Wang, Hao Chen | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation | FaceChain-SuDe:构建派生类以继承类别属性以实现一次性主题驱动生成 | Pengchong Qiao, Lei Shang, Chang Liu, Baigui Sun, Xiangyang Ji, Jie Chen | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization | 通过直接跨模式映射和几何正则化快速生成文本到 3D 感知的人脸并进行操作 | Jinlu Zhang, Yiyi Zhou, Qiancheng Zheng, Xiaoxiong Du, Gen Luo, Jun Peng, Xiaoshuai Sun, Rongrong Ji | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | PCLD: Point Cloud Layerwise Diffusion for Adversarial Purification | PCLD:用于对抗性净化的点云分层扩散 | Mert Gulsen, Batuhan Cengiz, Yusuf H. Sahin, Gozde Unal | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Ada-Tracker: Soft Tissue Tracking via Inter-Frame and Adaptive-Template Matching | Ada-Tracker:通过帧间和自适应模板匹配进行软组织跟踪 | Jiaxin Guo, Jiangliu Wang, Zhaoshuo Li, Tongyu Jia, Qi Dou, Yun-Hui Liu | arxiv.org/pdf/2403.06… | null |
Learning Paradigms
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-11 | Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation | 拆分到合并:统一无监督域适应的分离模式 | Xinyao Li, Yuke Li, Zhekai Du, Fengling Li, Ke Lu, Jingjing Li | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Shape Non-rigid Kinematics (SNK): A Zero-Shot Method for Non-Rigid Shape Matching via Unsupervised Functional Map Regularized Reconstruction | 形状非刚性运动学 (SNK):通过无监督功能图正则化重建实现非刚性形状匹配的零样本方法 | Souhaib Attaiki, Maks Ovsjanikov | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Eliminating Warping Shakes for Unsupervised Online Video Stitching | 消除无监督在线视频拼接的扭曲抖动 | Lang Nie, Chunyu Lin, Kang Liao, Yun Zhang, Shuaicheng Liu, Yao Zhao | arxiv.org/pdf/2403.06… | null |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-11 | Deep adaptative spectral zoom for improved remote heart rate estimation | 深度自适应光谱变焦可改善远程心率估计 | Joaquim Comas, Adria Ruiz, Federico Sukno | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | A Holistic Framework Towards Vision-based Traffic Signal Control with Microscopic Simulation | 基于视觉的微观仿真交通信号控制的整体框架 | Pan He, Quanyi Li, Xiaoyong Yuan, Bolei Zhou | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Learning with Noisy Foundation Models | 使用嘈杂的基础模型进行学习 | Hao Chen, Jindong Wang, Zihan Wang, Ran Tao, Hongxin Wei, Xing Xie, Masashi Sugiyama, Bhiksha Raj | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | QUASAR: QUality and Aesthetics Scoring with Advanced Representations | QUASAR:使用高级表示进行质量和美观评分 | Sergey Kastryulin, Denis Prokopenko, Artem Babenko, Dmitry V. Dylov | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Real-Time Simulated Avatar from Head-Mounted Sensors | 来自头戴式传感器的实时模拟头像 | Zhengyi Luo, Jinkun Cao, Rawal Khirodkar, Alexander Winkler, Kris Kitani, Weipeng Xu | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Stochastic Cortical Self-Reconstruction | 随机皮质自我重建 | Christian Wachinger, Dennis Hedderich, Fabian Bongratz | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | EarthLoc: Astronaut Photography Localization by Indexing Earth from Space | EarthLoc:通过从太空索引地球来进行宇航员摄影定位 | Gabriele Berton, Alex Stoken, Barbara Caputo, Carlo Masone | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Transferring Relative Monocular Depth to Surgical Vision with Temporal Consistency | 将相对单眼深度转换为具有时间一致性的手术视觉 | Charlie Budd, Tom Vercauteren | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Leveraging Foundation Models for Content-Based Medical Image Retrieval in Radiology | 利用基础模型进行放射学中基于内容的医学图像检索 | Stefan Denner, David Zimmerer, Dimitrios Bounias, Markus Bujotzek, Shuhan Xiao, Lisa Kausch, Philipp Schader, Tobias Penzkofer, Paul F. Jäger, Klaus Maier-Hein | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Reconstructing Visual Stimulus Images from EEG Signals Based on Deep Visual Representation Model | 基于深度视觉表示模型的脑电信号重建视觉刺激图像 | Hongguang Pan, Zhuoyi Li, Yunpeng Fu, Xuebin Qin, Jianchen Hu | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | FontCLIP: A Semantic Typography Visual-Language Model for Multilingual Font Applications | FontCLIP:用于多语言字体应用的语义排版视觉语言模型 | Yuki Tatsukawa, I-Chao Shen, Anran Qi, Yuki Koyama, Takeo Igarashi, Ariel Shamir | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Latent Semantic Consensus For Deterministic Geometric Model Fitting | 确定性几何模型拟合的潜在语义共识 | Guobao Xiao, Jun Yu, Jiayi Ma, Deng-Ping Fan, Ling Shao | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Temporal-Mapping Photography for Event Cameras | 用于事件相机的时间映射摄影 | Yuhan Bao, Lei Sun, Yuqin Ma, Kaiwei Wang | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Bridging Domains with Approximately Shared Features | 桥接具有近似共享功能的域 | Ziliang Samuel Zhong, Xiang Pan, Qi Lei | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Comparison of No-Reference Image Quality Models via MAP Estimation in Diffusion Latents | 通过扩散潜伏中的 MAP 估计比较无参考图像质量模型 | Weixia Zhang, Dingquan Li, Guangtao Zhai, Xiaokang Yang, Kede Ma | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Pre-Trained Model Recommendation for Downstream Fine-tuning | 用于下游微调的预训练模型推荐 | Jiameng Bai, Sai Wu, Jie Song, Junbo Zhao, Gang Chen | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Video Generation with Consistency Tuning | 具有一致性调整的视频生成 | Chaoyi Wang, Yaozhe Song, Yafeng Zhang, Jun Pei, Lijie Xia, Jianpo Liu | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Exploring Hardware Friendly Bottleneck Architecture in CNN for Embedded Computing Systems | 探索嵌入式计算系统 CNN 中的硬件友好瓶颈架构 | Xing Lei, Longjun Liu, Zhiheng Zhou, Hongbin Sun, Nanning Zheng | arxiv.org/pdf/2403.06… | null |
| 2024-03-11 | Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos | 设身处地为你着想:从外中心视频中提升自我中心视角 | Mi Luo, Zihui Xue, Alex Dimakis, Kristen Grauman | arxiv.org/pdf/2403.06… | null |