[UPDATED!] 2024-03-15 (Publish Time)
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-15 | Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives | Lodge:由特征舞蹈原语引导的长舞蹈生成的从粗到细的扩散网络 | Ronghui Li, YuXiang Zhang, Yachao Zhang, Hongwen Zhang, Jie Guo, Yan Zhang, Yebin Liu, Xiu Li | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding | Isotropic3D:基于单个 CLIP 嵌入的图像到 3D 生成 | Pengkun Liu, Yikai Wang, Fuchun Sun, Jiafang Li, Hang Xiao, Hongxiang Xue, Xinzhou Wang | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | Denoising Task Difficulty-based Curriculum for Training Diffusion Models | 基于去噪任务难度的扩散模型训练课程 | Jin-Young Kim, Hyojun Go, Soonwoo Kwon, Hyun-Gyoon Kim | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Towards Generalizable Deepfake Video Detection with Thumbnail Layout and Graph Reasoning | 通过缩略图布局和图形推理实现可推广的 Deepfake 视频检测 | Yuting Xu, Jian Liang, Lijun Sheng, Xiao-Yu Zhang | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder | 使用潜在扩散模型和隐式神经解码器的任意尺度图像生成和上采样 | Jinseok Kim, Tae-Kyun Kim | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model | FDGaussian:通过几何感知扩散模型从单幅图像进行快速高斯泼溅 | Qijun Feng, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | BlindDiff: Empowering Degradation Modelling in Diffusion Models for Blind Image Super-Resolution | BlindDiff:增强盲图像超分辨率扩散模型中的退化建模 | Feng Li, Yixuan Wu, Zichao Liang, Runmin Cong, Huihui Bai, Yao Zhao, Meng Wang | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | Animate Your Motion: Turning Still Images into Dynamic Videos | 动画化你的动作:将静态图像变成动态视频 | Mingxiao Li, Bo Wan, Marie-Francine Moens, Tinne Tuytelaars | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | SemanticHuman-HD: High-Resolution Semantic Disentangled 3D Human Generation | SemanticHuman-HD:高分辨率语义解缠结的 3D 人类生成 | Peng Zheng, Tao Liu, Zili Yi, Rui Ma | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration | DiffMAC:用于高泛化盲脸恢复的扩散流形幻觉校正 | Nan Gao, Jia Li, Huaibo Huang, Zhi Zeng, Ke Shang, Shuwu Zhang, Ran He | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | RangeLDM: Fast Realistic LiDAR Point Cloud Generation | RangeLDM:快速逼真的 LiDAR 点云生成 | Qianjiang Hu, Zhimin Zhang, Wei Hu | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | A survey of synthetic data augmentation methods in computer vision | 计算机视觉中合成数据增强方法的综述 | Alhassan Mumuni, Fuseini Mumuni, Nana Kobina Gerrar | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | RID-TWIN: An end-to-end pipeline for automatic face de-identification in videos | RID-TWIN:用于视频中自动人脸去识别的端到端管道 | Anirban Mukherjee, Monjoy Narayan Choudhury, Dinesh Babu Jayagopi | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model | SphereDiffusion:球形几何感知失真弹性扩散模型 | Tao Wu, Xuewei Li, Zhongang Qi, Di Hu, Xintao Wang, Ying Shan, Xi Li | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Real-World Computational Aberration Correction via Quantized Domain-Mixing Representation | 通过量化域混合表示进行现实世界计算像差校正 | Qi Jiang, Zhonghua Yi, Shaohua Gao, Yao Gao, Xiaolong Qian, Hao Shi, Lei Sun, Zhijie Xu, Kailun Yang, Kaiwei Wang | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | ST-LDM: A Universal Framework for Text-Grounded Object Generation in Real Images | ST-LDM:真实图像中基于文本的对象生成的通用框架 | Xiangtian Xue, Jiasong Wu, Youyong Kong, Lotfi Senhadji, Huazhong Shu | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting | 通过表面对齐高斯溅射实现可控文本到 3D 生成 | Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu | arxiv.org/pdf/2403.09… | null |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-15 | VideoAgent: Long-form Video Understanding with Large Language Model as Agent | VideoAgent:以大语言模型为代理的长格式视频理解 | Xiaohan Wang, Yuhui Zhang, Orr Zohar, Serena Yeung-Levy | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study | 多模态基础模型的零样本鲁棒性基准测试:试点研究 | Chenguang Wang, Ruoxi Jia, Xin Liu, Dawn Song | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | Mitigating Dialogue Hallucination for Large Multi-modal Models via Adversarial Instruction Tuning | 通过对抗性指令调整减轻大型多模态模型的对话幻觉 | Dongmin Park, Zhaofang Qian, Guangxing Han, Ser-Nam Lim | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Joint Multimodal Transformer for Dimensional Emotional Recognition in the Wild | 用于野外维度情感识别的联合多模态变压器 | Paul Waligora, Osama Zeeshan, Haseeb Aslam, Soufiane Belharbi, Alessandro Lameiras Koerich, Marco Pedersoli, Simon Bacon, Eric Granger | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models | EXAMS-V:用于评估视觉语言模型的多学科多语言多模式考试基准 | Rocktim Jyoti Das, Simeon Emilov Hristov, Haonan Li, Dimitar Iliyanov Dimitrov, Ivan Koychev, Preslav Nakov | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | ANIM: Accurate Neural Implicit Model for Human Reconstruction from a single RGB-D image | ANIM:用于从单个 RGB-D 图像重建人体的精确神经隐式模型 | Marco Pesavento, Yuanlu Xu, Nikolaos Sarafianos, Robert Maier, Ziyan Wang, Chun-Han Yao, Marco Volino, Edmond Boyer, Adrian Hilton, Tony Tung | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Uni-SMART: Universal Science Multimodal Analysis and Research Transformer | Uni-SMART:通用科学多模态分析和研究转换器 | Hengxing Cai, Xiaochen Cai, Shuwen Yang, Jiankun Wang, Lin Yao, Zhifeng Gao, Junhan Chang, Sihang Li, Mingjun Xu, Changxin Wang, et.al. | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models | 使用视觉语言模型进行少镜头图像分类和分割作为视觉问答 | Tian Meng, Yang Tao, Ruilin Lyu, Wuliang Yin | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification | Magic Tokens:选择多样化的Token进行多模态物体重识别 | Pingping Zhang, Yuhao Wang, Yang Liu, Zhengzheng Tu, Huchuan Lu | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | HawkEye: Training Video-Text LLMs for Grounding Text in Videos | HawkEye:训练视频-文本大语言模型以实现视频中的文本定位 | Yueqian Wang, Xiaojun Meng, Jianxin Liang, Yuxuan Wang, Qun Liu, Dongyan Zhao | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Improving Medical Multi-modal Contrastive Learning with Expert Annotations | 通过专家注释改进医学多模态对比学习 | Yogesh Kumar, Pekka Marttinen | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | CSDNet: Detect Salient Object in Depth-Thermal via A Lightweight Cross Shallow and Deep Perception Network | CSDNet:通过轻量级交叉浅层和深层感知网络检测深度热中的显着物体 | Xiaotong Yu, Ruihan Xie, Zhihe Zhao, Chang-Wen Chen | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Histo-Genomic Knowledge Distillation For Cancer Prognosis From Histopathology Whole Slide Images | 从组织病理学全幻灯片图像中提炼癌症预后的组织基因组知识 | Zhikang Wang, Yumeng Zhang, Yingxue Xu, Seiya Imoto, Hao Chen, Jiangning Song | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | Knowledge Condensation and Reasoning for Knowledge-based VQA | 基于知识的 VQA 的知识凝结和推理 | Dongze Hao, Jian Jia, Longteng Guo, Qunbo Wang, Te Yang, Yan Li, Yanhua Cheng, Bo Wang, Quan Chen, Han Li, et.al. | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | SparseFusion: Efficient Sparse Multi-Modal Fusion Framework for Long-Range 3D Perception | SparseFusion:用于远距离 3D 感知的高效稀疏多模态融合框架 | Yiheng Li, Hongyang Li, Zehao Huang, Hong Chang, Naiyan Wang | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Visual Foundation Models Boost Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation | 视觉基础模型促进 3D 语义分割的跨模式无监督域适应 | Jingyi Xu, Weidong Yang, Lingdong Kong, Youquan Liu, Rui Zhang, Qingyuan Zhou, Ben Fei | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery | GET:释放 CLIP 的多模式潜力以进行广义类别发现 | Enguang Wang, Zhimao Peng, Zhengyuan Xie, Xialei Liu, Ming-Ming Cheng | arxiv.org/pdf/2403.09… | link |
NeRF
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-15 | FeatUp: A Model-Agnostic Framework for Features at Any Resolution | FeatUp:适用于任何分辨率特征的模型无关框架 | Stephanie Fu, Mark Hamilton, Laura Brandt, Axel Feldman, Zhoutong Zhang, William T. Freeman | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Thermal-NeRF: Neural Radiance Fields from an Infrared Camera | Thermal-NeRF:红外相机的神经辐射场 | Tianxiang Ye, Qi Wu, Junyuan Deng, Guoqing Liu, Liu Liu, Songpengcheng Xia, Liang Pang, Wenxian Yu, Ling Pei | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Leveraging Neural Radiance Field in Descriptor Synthesis for Keypoints Scene Coordinate Regression | 利用神经辐射场进行关键点场景坐标回归的描述符合成 | Huy-Hoang Bui, Bach-Thuan Bui, Dinh-Tuan Tran, Joo-Ho Lee | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | GGRt: Towards Generalizable 3D Gaussians without Pose Priors in Real-Time | GGRt:在没有实时姿势先验的情况下实现可推广的 3D 高斯 | Hao Li, Yuanyuan Gao, Dingwen Zhang, Chenming Wu, Yalun Dai, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | URS-NeRF: Unordered Rolling Shutter Bundle Adjustment for Neural Radiance Fields | URS-NeRF:神经辐射场的无序滚动快门束调整 | Bo Xu, Ziao Liu, Mengqi Guo, Jiancheng Li, Gim Hee Li | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video | DyBluRF:模糊单目视频的动态神经辐射场 | Huiqiang Sun, Xingyi Li, Liao Shen, Xinyi Ye, Ke Xian, Zhiguo Cao | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Den-SOFT: Dense Space-Oriented Light Field DataseT for 6-DOF Immersive Experience | Den-SOFT:用于六自由度沉浸式体验的密集空间导向光场数据集 | Xiaohang Yu, Zhengxian Yang, Shi Pan, Yuqi Han, Haoxiang Wang, Jun Zhang, Shi Yan, Borong Lin, Lei Yang, Tao Yu, et.al. | arxiv.org/pdf/2403.09… | null |
3DGS
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-15 | SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians | SWAG:使用外观条件高斯函数在野外图像中泼溅 | Hiba Dahmani, Moussab Bennehar, Nathan Piasco, Luis Roldao, Dzmitry Tsishkou | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing | 纹理-GS:解开几何和纹理以进行 3D 高斯泼溅编辑 | Tian-Xing Xu, Wenbo Hu, Yu-Kun Lai, Ying Shan, Song-Hai Zhang | arxiv.org/pdf/2403.10… | null |
Model Compression/Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-15 | Group-Mix SAM: Lightweight Solution for Industrial Assembly Line Applications | Group-Mix SAM:工业装配线应用的轻量级解决方案 | Wu Liang, X. -G. Ma | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Towards Adversarially Robust Dataset Distillation by Curvature Regularization | 通过曲率正则化实现对抗性鲁棒数据集蒸馏 | Eric Xue, Yijiang Li, Haoyang Liu, Yifan Shen, Haohan Wang | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers | 具有一步领先注意力的多标准令牌融合,用于高效视觉变压器 | Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Quantization Effects on Neural Networks Perception: How would quantization change the perceptual field of vision models? | 量化对神经网络感知的影响:量化将如何改变感知视野模型? | Mohamed Amine Kerkouri, Marouane Tliba, Aladine Chetouani, Alessandro Bruno | arxiv.org/pdf/2403.09… | null |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-15 | Frozen Feature Augmentation for Few-Shot Image Classification | 用于少样本图像分类的冻结特征增强 | Andreas Bär, Neil Houlsby, Mostafa Dehghani, Manoj Kumar | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices | NeuFlow:使用边缘设备对机器人进行实时、高精度光流估计 | Zhiyong Zhang, Huaizu Jiang, Hanumant Singh | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search | 通过混合卷积变压器架构进行实时图像分割 | Hongyuan Yu, Cheng Wan, Mengchen Liu, Dongdong Chen, Bin Xiao, Xiyang Dai | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | A comparative study on machine learning approaches for rock mass classification using drilling data | 利用钻孔数据进行岩体分类的机器学习方法的比较研究 | Tom F. Hansen, Georg H. Erharter, Zhongqiang Liu, Jim Torresen | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Energy Correction Model in the Feature Space for Out-of-Distribution Detection | 分布外检测的特征空间能量校正模型 | Marc Lafon, Clément Rambour, Nicolas Thome | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Open Stamped Parts Dataset | 开放冲压件数据集 | Sara Antiles, Sachin S. Talathi | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras | SimPB:用于从多个摄像机进行 2D 和 3D 物体检测的单一模型 | Yingqi Tang, Zhaotie Meng, Guoliang Chen, Erkang Cheng | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Deep Learning for Multi-Level Detection and Localization of Myocardial Scars Based on Regional Strain Validated on Virtual Patients | 基于区域应变的心肌疤痕多级检测和定位的深度学习在虚拟患者上得到验证 | Müjde Akdeniz, Claudia Alessandra Manetti, Tijmen Koopsen, Hani Nozari Mirar, Sten Roar Snare, Svein Arne Aase, Joost Lumens, Jurica Šprem, Kristin Sarah McLeod | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Local positional graphs and attentive local features for a data and runtime-efficient hierarchical place recognition pipeline | 局部位置图和细心的局部特征,用于数据和运行时高效的分层位置识别管道 | Fangming Yuan, Stefan Schubert, Peter Protzel, Peer Neubert | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning | 区域感知分布对比:多任务部分监督学习的新方法 | Meixuan Li, Tianyu Li, Guoqing Wang, Peng Wang, Yang Yang, Heng Tao Shen | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning | CoLeCLIP:通过联合任务提示和词汇学习进行开放域持续学习 | Yukun Li, Guansong Pang, Wei Suo, Chenchen Jing, Yuling Xi, Lingqiao Liu, Hao Chen, Guoqiang Liang, Peng Wang | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Exploring Optical Flow Inclusion into nnU-Net Framework for Surgical Instrument Segmentation | 探索将光流纳入 nnU-Net 框架中以实现手术器械分割 | Marcos Fernández-Rodríguez, Bruno Silva, Sandro Queirós, Helena R. Torres, Bruno Oliveira, Pedro Morais, Lukas R. Buschle, Jorge Correia-Pinto, Estevão Lima, João L. Vilaça | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | A Data-Driven Approach for Mitigating Dark Current Noise and Bad Pixels in Complementary Metal Oxide Semiconductor Cameras for Space-based Telescopes | 一种用于减轻天基望远镜互补金属氧化物半导体相机中暗电流噪声和坏像素的数据驱动方法 | Peng Jia, Chao Lv, Yushan Li, Yongyang Sun, Shu Niu, Zhuoxiao Wang | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Learning on JPEG-LDPC Compressed Images: Classifying with Syndromes | JPEG-LDPC 压缩图像的学习:用综合症分类 | Ahcen Aliouat, Elsa Dupraz | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Generative Region-Language Pretraining for Open-Ended Object Detection | 用于开放式目标检测的生成区域语言预训练 | Chuang Lin, Yi Jiang, Lizhen Qu, Zehuan Yuan, Jianfei Cai | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | A Hybrid SNN-ANN Network for Event-based Object Detection with Spatial and Temporal Attention | 用于具有空间和时间注意力的基于事件的目标检测的混合 SNN-ANN 网络 | Soikat Hasan Ahmed, Jan Finkbeiner, Emre Neftci | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Computer User Interface Understanding. A New Dataset and a Learning Framework | 计算机用户界面理解。新的数据集和学习框架 | Andrés Muñoz, Daniel Borrajo | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Cardiac valve event timing in echocardiography using deep learning and triplane recordings | 使用深度学习和三平面记录进行超声心动图心脏瓣膜事件计时 | Benjamin Strandli Fermann, John Nyberg, Espen W. Remme, Jahn Frederik Grue, Helén Grue, Roger Håland, Lasse Lovstakken, Håvard Dalen, Bjørnar Grenne, Svein Arne Aase, et.al. | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception | RCooper:用于路边协作感知的真实世界大规模数据集 | Ruiyang Hao, Siqi Fan, Yingru Dai, Zhenlin Zhang, Chenxi Li, Yuntian Wang, Haibao Yu, Wenxian Yang, Jirui Yuan, Zaiqing Nie | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | TransLandSeg: A Transfer Learning Approach for Landslide Semantic Segmentation Based on Vision Foundation Model | TransLandSeg:一种基于视觉基础模型的滑坡语义分割迁移学习方法 | Changhong Hou, Junchuan Yu, Daqing Ge, Liu Yang, Laidian Xi, Yunxuan Pang, Yi Wen | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning | 通过多个大语言模型协作推理增强以人为中心的动态场景理解 | Hang Zhang, Wenxiao Zhang, Haoxuan Qu, Jun Liu | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Adaptive Random Feature Regularization on Fine-tuning Deep Neural Networks | 微调深度神经网络的自适应随机特征正则化 | Shin'ya Yamaguchi, Sekitoshi Kanai, Kazuki Adachi, Daiki Chijiwa | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Monkeypox disease recognition model based on improved SE-InceptionV3 | 基于改进SE-InceptionV3的猴痘疾病识别模型 | Junzhuo Chen, Zonghan Lu, Shitong Kang | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner | CrossGLG:LLM以跨级别方式指导基于骨架的一次性3D动作识别 | Tingbing Yan, Wenzheng Zeng, Yang Xiao, Xingyu Tong, Bo Tan, Zhiwen Fang, Zhiguo Cao, Joey Tianyi Zhou | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Control and Automation for Industrial Production Storage Zone: Generation of Optimal Route Using Image Processing | 工业生产存储区的控制和自动化:利用图像处理生成最佳路线 | Bejamin A. Huerfano, Fernando Jimenez | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model | TextBlockV2:使用预训练语言模型实现精确的无检测场景文本识别 | Jiahao Lyu, Jin Wei, Gangyan Zeng, Zeng Li, Enze Xie, Wei Wang, Yu Zhou | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Rethinking Low-quality Optical Flow in Unsupervised Surgical Instrument Segmentation | 重新思考无监督手术器械分割中的低质量光流 | Peiran Wu, Yang Liu, Jiayu Huo, Gongyu Zhang, Christos Bergeles, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | Lifelong Person Re-Identification with Backward-Compatibility | 具有向后兼容性的终身人员重新识别 | Minyoung Oh, Jae-Young Sim | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Linear optimal transport subspaces for point set classification | 用于点集分类的线性最优传输子空间 | Mohammad Shifat E Rabbi, Naqib Sad Pathan, Shiying Li, Yan Zhuang, Abu Hasnat Mohammad Rubaiyat, Gustavo K Rohde | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Cardiac Magnetic Resonance 2D+T Short- and Long-axis Segmentation via Spatio-temporal SAM Adaptation | 通过时空 SAM 适应进行心脏磁共振 2D+T 短轴和长轴分割 | Zhennong Chen, Sekeun Kim, Hui Ren, Quanzheng Li, Xiang Li | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | FBPT: A Fully Binary Point Transformer | FBPT:完全二进制点变压器 | Zhixing Hou, Yuzhang Shang, Yan Yan | arxiv.org/pdf/2403.09… | null |
| 2024-03-15 | Skeleton-Based Human Action Recognition with Noisy Labels | 带有噪声标签的基于骨骼的人体动作识别 | Yi Xu, Kunyu Peng, Di Wen, Ruiping Liu, Junwei Zheng, Yufan Chen, Jiaming Zhang, Alina Roitberg, Kailun Yang, Rainer Stiefelhagen | arxiv.org/pdf/2403.09… | null |
| 2024-03-15 | ViTCN: Vision Transformer Contrastive Network For Reasoning | ViTCN:用于推理的 Vision Transformer 对比网络 | Bo Song, Yuanhao Xu, Yichao Wu | arxiv.org/pdf/2403.09… | null |
| 2024-03-15 | Shifting Focus: From Global Semantics to Local Prominent Features in Swin-Transformer for Knee Osteoarthritis Severity Assessment | 焦点转移:从 Swin-Transformer 膝骨关节炎严重程度评估的全局语义到局部显着特征 | Aymen Sekhri, Marouane Tliba, Mohamed Amine Kerkouri, Yassine Nasser, Aladine Chetouani, Alessandro Bruno, Rachid Jennane | arxiv.org/pdf/2403.09… | null |
| 2024-03-15 | Attention-Enhanced Hybrid Feature Aggregation Network for 3D Brain Tumor Segmentation | 用于 3D 脑肿瘤分割的注意力增强混合特征聚合网络 | Ziya Ata Yazıcı, İlkay Öksüz, Hazım Kemal Ekenel | arxiv.org/pdf/2403.09… | link |
Image Understanding
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-15 | Robust Shape Fitting for 3D Scene Abstraction | 用于 3D 场景抽象的稳健形状拟合 | Florian Kluger, Eric Brachmann, Michael Ying Yang, Bodo Rosenhahn | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | NECA: Neural Customizable Human Avatar | NECA:神经可定制人体头像 | Junjin Xiao, Qing Zhang, Zhan Xu, Wei-Shi Zheng | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | AUTONODE: A Neuro-Graphic Self-Learnable Engine for Cognitive GUI Automation | AUTONODE:用于认知 GUI 自动化的神经图形自学习引擎 | Arkajit Datta, Tushar Verma, Rajat Chawla | arxiv.org/pdf/2403.10… | null |
LLM
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-15 | Using an LLM to Turn Sign Spottings into Spoken Language Sentences | 使用大语言模型将手语识别结果转化为口语句子 | Ozge Mercanoglu Sincan, Necati Cihan Camgoz, Richard Bowden | arxiv.org/pdf/2403.10… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-15 | P-MapNet: Far-seeing Map Generator Enhanced by both SDMap and HDMap Priors | P-MapNet:由 SDMap 和 HDMap 先验增强的远视地图生成器 | Zhou Jiang, Zhenxin Zhu, Pengfei Li, Huan-ang Gao, Tianyuan Yuan, Yongliang Shi, Hang Zhao, Hao Zhao | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | A Novel Framework for Multi-Person Temporal Gaze Following and Social Gaze Prediction | 多人时间注视跟踪和社交注视预测的新框架 | Anshul Gupta, Samy Tafasca, Arya Farkhondeh, Pierre Vuillecard, Jean-Marc Odobez | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Approximate Nullspace Augmented Finetuning for Robust Vision Transformers | 鲁棒视觉变压器的近似零空间增强微调 | Haoyang Liu, Aditya Singh, Yijiang Li, Haohan Wang | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | PASTA: Towards Flexible and Efficient HDR Imaging Via Progressively Aggregated Spatio-Temporal Alignment | PASTA:通过逐步聚合的时空对齐实现灵活高效的 HDR 成像 | Xiaoning Liu, Ao Li, Zongwei Wu, Yapeng Du, Le Zhang, Yulun Zhang, Radu Timofte, Ce Zhu | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | How Powerful Potential of Attention on Image Restoration? | 关注力对图像修复的潜力有多大? | Cong Wang, Jinshan Pan, Yeying Jin, Liyan Wang, Wei Wang, Gang Fu, Wenqi Ren, Xiaochun Cao | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Context-Semantic Quality Awareness Network for Fine-Grained Visual Categorization | 用于细粒度视觉分类的上下文语义质量感知网络 | Qin Xu, Sitong Li, Jiahui Wang, Bo Jiang, Jinhui Tang | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Depth-induced Saliency Comparison Network for Diagnosis of Alzheimer's Disease via Jointly Analysis of Visual Stimuli and Eye Movements | 通过联合分析视觉刺激和眼动来诊断阿尔茨海默病的深度诱导显着性比较网络 | Yu Liu, Wenlin Zhang, Shaochu Wang, Fangyu Zuo, Peiguang Jing, Yong Ji | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising | 用于高光谱图像去噪的混合卷积和注意力网络 | Shuai Hu, Feng Gao, Xiaowei Zhou, Junyu Dong, Qian Du | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | MEDPNet: Achieving High-Precision Adaptive Registration for Complex Die Castings | MEDPNet:实现复杂压铸件的高精度自适应配准 | Yu Du, Yu Song, Ce Guo, Xiaojing Tian, Dong Liu, Ming Cong | arxiv.org/pdf/2403.09… | null |
| 2024-03-15 | EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba | EfficientVMamba:用于轻量级 Visual Mamba 的 Atrous 选择性扫描 | Xiaohuan Pei, Tao Huang, Chang Xu | arxiv.org/pdf/2403.09… | null |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-15 | ParaPoint: Learning Global Free-Boundary Surface Parameterization of 3D Point Clouds | ParaPoint:学习 3D 点云的全局自由边界表面参数化 | Qijian Zhang, Junhui Hou, Ying He | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | SCILLA: SurfaCe Implicit Learning for Large Urban Area, a volumetric hybrid solution | SCILLA:大型城市地区的表面隐式学习,体积混合解决方案 | Hala Djeghim, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou, Désiré Sidibé | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation | KP-RED:利用语义关键点进行联合 3D 形状检索和变形 | Ruida Zhang, Chenyangguang Zhang, Yan Di, Fabian Manhardt, Xingyu Liu, Federico Tombari, Xiangyang Ji | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | VRHCF: Cross-Source Point Cloud Registration via Voxel Representation and Hierarchical Correspondence Filtering | VRHCF:通过体素表示和分层对应过滤进行跨源点云配准 | Guiyu Zhao, Zewen Du, Zhentao Guo, Hongbin Ma | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling | 用于矢量量化图像建模的带词性的码本传输 | Baoquan Zhang, Huaibin Wang, Luo Chuyao, Xutao Li, Liang Guotao, Yunming Ye, Xiaochen Qi, Yao He | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory | T4P:通过屏蔽自动编码器和参与者特定的令牌内存进行轨迹预测的测试时训练 | Daehee Park, Jaeseok Jeong, Sung-Hoon Yoon, Jaewoo Jeong, Kuk-Jin Yoon | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | TRG-Net: An Interpretable and Controllable Rain Generator | TRG-Net:可解释且可控的降雨发生器 | Zhiqiang Pang, Hong Wang, Qi Xie, Deyu Meng, Zongben Xu | arxiv.org/pdf/2403.09… | null |
| 2024-03-15 | Boundary Constraint-free Biomechanical Model-Based Surface Matching for Intraoperative Liver Deformation Correction | 基于无边界约束生物力学模型的表面匹配术中肝脏变形矫正 | Zixin Yang, Richard Simon, Kelly Merrell, Cristian. A. Linte | arxiv.org/pdf/2403.09… | null |
| 2024-03-15 | RadCLIP: Enhancing Radiologic Image Analysis through Contrastive Language-Image Pre-training | RadCLIP:通过对比语言图像预训练增强放射图像分析 | Zhixiu Lu, Hailong Li, Lili He | arxiv.org/pdf/2403.09… | null |
Learning Paradigms
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-15 | CDMAD: Class-Distribution-Mismatch-Aware Debiasing for Class-Imbalanced Semi-Supervised Learning | CDMAD:类不平衡半监督学习的类分布不匹配感知去偏 | Hyuck Lee, Heeyoung Kim | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | E4C: Enhance Editability for Text-Based Image Editing by Harnessing Efficient CLIP Guidance | E4C:通过利用高效的 CLIP 指导增强基于文本的图像编辑的可编辑性 | Tianrui Huang, Pu Cao, Lu Yang, Chun Liu, Mengjie Hu, Zhiwei Liu, Qing Song | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Learning Physical Dynamics for Object-centric Visual Prediction | 学习物理动力学以进行以对象为中心的视觉预测 | Huilin Xu, Tao Chen, Feng Xu | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | What Makes Good Collaborative Views? Contrastive Mutual Information Maximization for Multi-Agent Perception | 是什么造就了良好的协作视图?多智能体感知的对比互信息最大化 | Wanfang Su, Lixing Chen, Yang Bai, Xi Lin, Gaolei Li, Zhe Qu, Pan Zhou | arxiv.org/pdf/2403.10… | link |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-15 | Strong and Controllable Blind Image Decomposition | 强而可控的图像盲分解 | Zeyu Zhang, Junlin Han, Chenhui Gou, Hongdong Li, Liang Zheng | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | Understanding the Double Descent Phenomenon in Deep Learning | 理解深度学习中的双重下降现象 | Marc Lafon, Alexandre Thomas | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Evaluating Perceptual Distances by Fitting Binomial Distributions to Two-Alternative Forced Choice Data | 通过将二项式分布拟合到两个替代的强制选择数据来评估感知距离 | Alexander Hepburn, Raul Santos-Rodriguez, Javier Portilla | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Overcoming Distribution Shifts in Plug-and-Play Methods with Test-Time Training | 通过测试时训练克服即插即用方法中的分布变化 | Edward P. Chandler, Shirin Shoushtari, Jiaming Liu, M. Salman Asif, Ulugbek S. Kamilov | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Testing MediaPipe Holistic for Linguistic Analysis of Nonmanual Markers in Sign Languages | 测试 MediaPipe Holistic 对手语中非手动标记的语言分析 | Anna Kuznetsova, Vadim Kimmelman | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement | CPGA:用于增强压缩视频质量的编码先验引导聚合网络 | Qiang Zhu, Jinhua Hao, Yukang Ding, Yu Liu, Qiao Mo, Ming Sun, Chao Zhou, Shuyuan Zhu | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | End-to-end Adaptive Dynamic Subsampling and Reconstruction for Cardiac MRI | 心脏 MRI 的端到端自适应动态子采样和重建 | George Yiasemis, Jan-Jakob Sonke, Jonas Teuwen | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | A Fixed-Point Approach to Unified Prompt-Based Counting | 统一基于提示的计数的定点方法 | Wei Lin, Antoni B. Chan | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Perceptual Quality-based Model Training under Annotator Label Uncertainty | 注释器标签不确定性下基于感知质量的模型训练 | Chen Zhou, Mohit Prabhushankar, Ghassan AlRegib | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | CoReEcho: Continuous Representation Learning for 2D+time Echocardiography Analysis | CoReEcho:用于 2D+时间超声心动图分析的连续表示学习 | Fadillah Adamsyah Maani, Numan Saeed, Aleksandr Matsun, Mohammad Yaqub | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | PQDynamicISP: Dynamically Controlled Image Signal Processor for Any Image Sensors Pursuing Perceptual Quality | PQDynamicISP:适用于任何追求感知质量的图像传感器的动态控制图像信号处理器 | Masakazu Yoshimura, Junji Otsuka, Takeshi Ohashi | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Approximation and bounding techniques for the Fisher-Rao distances | Fisher-Rao 距离的近似和边界技术 | Frank Nielsen | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Benchmarking Adversarial Robustness of Image Shadow Removal with Shadow-adaptive Attacks | 使用阴影自适应攻击对图像阴影去除的对抗鲁棒性进行基准测试 | Chong Wang, Yi Yu, Lanqing Guo, Bihan Wen | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Revisiting Adversarial Training under Long-Tailed Distributions | 重新审视长尾分布下的对抗训练 | Xinli Yue, Ningping Mou, Qian Wang, Lingchen Zhao | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | Boundary Matters: A Bi-Level Active Finetuning Framework | 边界问题:双层主动微调框架 | Han Lu, Yichen Xie, Xiaokang Yang, Junchi Yan | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment | 用于无参考点云质量评估的多视图融合对比预训练 | Ziyu Shan, Yujie Zhang, Qi Yang, Haichen Yang, Yiling Xu, Jenq-Neng Hwang, Xiaozhong Xu, Shan Liu | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | Progressive Divide-and-Conquer via Subsampling Decomposition for Accelerated MRI | 通过加速 MRI 的子采样分解进行渐进式分治 | Chong Wang, Lanqing Guo, Yufei Wang, Hao Cheng, Yi Yu, Bihan Wen | arxiv.org/pdf/2403.10… | link |
| 2024-03-15 | PAME: Self-Supervised Masked Autoencoder for No-Reference Point Cloud Quality Assessment | PAME:用于无参考点云质量评估的自监督屏蔽自动编码器 | Ziyu Shan, Yujie Zhang, Qi Yang, Haichen Yang, Yiling Xu, Shan Liu | arxiv.org/pdf/2403.10… | null |
| 2024-03-15 | AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors | AD3:内隐行动是世界模型区分各种视觉干扰因素的关键 | Yucen Wang, Shenghua Wan, Le Gan, Shuai Feng, De-Chuan Zhan | arxiv.org/pdf/2403.09… | null |