[UPDATED!] 2024-03-28 (Publish Time)
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-28 | GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling | GaussianCube:使用 3D 生成建模的最佳传输构建高斯泼溅 | Bowen Zhang, Yiji Cheng, Jiaolong Yang, Chunyu Wang, Feng Zhao, Yansong Tang, Dong Chen, Baining Guo | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Detecting Image Attribution for Text-to-Image Diffusion Models in RGB and Beyond | 检测 RGB 及以上文本到图像扩散模型的图像属性 | Katherine Xu, Lingzhi Zhang, Jianbo Shi | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction | InterDreamer:零镜头文本到 3D 动态人机交互 | Sirui Xu, Ziyin Wang, Yu-Xiong Wang, Liang-Yan Gui | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models | GANTASTIC:基于 GAN 的可解释方向传输,用于文本到图像扩散模型中的解缠结图像编辑 | Yusuf Dalva, Hidir Yesiltepe, Pinar Yanardag | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Collaborative Interactive Evolution of Art in the Latent Space of Deep Generative Models | 深度生成模型潜在空间中艺术的协作交互进化 | Ole Hall, Anil Yaman | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model | 通过扩散模型的类间图像混合增强图像分类 | Zhicai Wang, Longhui Wei, Tan Wang, Heyu Chen, Yanbin Hao, Xiang Wang, Xiangnan He, Qi Tian | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Frame by Familiar Frame: Understanding Replication in Video Diffusion Models | 逐帧熟悉:了解视频扩散模型中的复制 | Aimon Rahman, Malsha V. Perera, Vishal M. Patel | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold | XScale-NVS:具有哈希特征流形的跨尺度新颖视图合成 | Guangyu Wang, Jinzhi Zhang, Fan Wang, Ruqi Huang, Lu Fang | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Debiasing Cardiac Imaging with Controlled Latent Diffusion Models | 利用受控潜伏扩散模型消除心脏成像偏差 | Grzegorz Skorupko, Richard Osuala, Zuzanna Szafranowska, Kaisar Kushibar, Nay Aung, Steffen E Petersen, Karim Lekadir, Polyxeni Gkontra | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Beyond Talking -- Generating Holistic 3D Human Dyadic Motion for Communication | 超越言语——生成用于沟通的整体 3D 人体二元运动 | Mingze Sun, Chao Xu, Xinyu Jiang, Yang Liu, Baigui Sun, Ruqi Huang | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control | SubjectDrive:通过主题控制扩展自动驾驶中的生成数据 | Binyuan Huang, Yuqing Wen, Yucheng Zhao, Yaosi Hu, Yingfei Liu, Fan Jia, Weixin Mao, Tiancai Wang, Chi Zhang, Chang Wen Chen, et al. | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Burst Super-Resolution with Diffusion Models for Improving Perceptual Quality | 具有扩散模型的突发超分辨率可提高感知质量 | Kyotaro Tokoro, Kazutoshi Akita, Norimichi Ukita | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion | OAKINK2:完成复杂任务的双手物体操作数据集 | Xinyu Zhan, Lixin Yang, Yifei Zhao, Kangrui Mao, Hanlin Xu, Zenan Lin, Kailin Li, Cewu Lu | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Imperceptible Protection against Style Imitation from Diffusion Models | 潜移默化地防止扩散模型的风格模仿 | Namhyuk Ahn, Wonhyuk Ahn, KiYoon Yoo, Daesik Kim, Seung-Hun Nam | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | RecDiffusion: Rectangling for Image Stitching with Diffusion Models | RecDiffusion:使用扩散模型进行图像拼接的矩形 | Tianhao Zhou, Haipeng Li, Ziyi Wang, Ao Luo, Chen-Lin Zhang, Jiajun Li, Bing Zeng, Shuaicheng Liu | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation | MoDiTalker:用于生成高保真头部说话的运动解缠扩散模型 | Seyeon Kim, Siyoon Jin, Jihye Park, Kihong Kim, Jiyoung Kim, Jisu Nam, Seungryong Kim | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | QNCD: Quantization Noise Correction for Diffusion Models | QNCD:扩散模型的量化噪声校正 | Huanpeng Chu, Wei Wu, Chengjie Zang, Kun Yuan | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Synthetic Medical Imaging Generation with Generative Adversarial Networks For Plain Radiographs | 使用生成对抗网络生成普通放射线照片的合成医学成像 | John R. McNulty, Lee Kho, Alexandria L. Case, Charlie Fornaca, Drew Johnston, David Slater, Joshua M. Abzug, Sybil A. Russell | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation | 用于个性化文本到图像生成的自动化黑盒提示工程 | Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, Joshua Williams, George J. Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov, J. Zico Kolter | arxiv.org/pdf/2403.19… | null |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-28 | MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | MagicLens:具有开放式指令的自监督图像检索 | Kai Zhang, Yi Luan, Hexiang Hu, Kenton Lee, Siyuan Qiao, Wenhu Chen, Yu Su, Ming-Wei Chang | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Img2Loc: Revisiting Image Geolocalization using Multi-modality Foundation Models and Image-based Retrieval-Augmented Generation | Img2Loc:使用多模态基础模型和基于图像的检索增强生成重新审视图像地理定位 | Zhongliang Zhou, Jielu Zhang, Zihan Guan, Mengxuan Hu, Ni Lao, Lan Mu, Sheng Li, Gengchen Mai | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation | OV-Uni3DETR:通过循环模态传播实现统一开放词汇 3D 对象检测 | Zhenyu Wang, Yali Li, Taichi Liu, Hengshuang Zhao, Shengjin Wang | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Locate, Assign, Refine: Taming Customized Image Inpainting with Text-Subject Guidance | 定位、分配、优化:使用文本主题指导驯服定制图像修复 | Yulin Pan, Chaojie Mao, Zeyinzi Jiang, Zhen Han, Jingfeng Zhang | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method | RELI11D:综合多模态人体运动数据集和方法 | Ming Yan, Yan Zhang, Shuqiang Cai, Shuqi Fan, Xincheng Lin, Yudi Dai, Siqi Shen, Chenglu Wen, Lan Xu, Yuexin Ma, et al. | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models | 多模态大语言模型推理的即插即用基础 | Jiaxing Chen, Yuxuan Liu, Dehu Li, Xiang An, Ziyong Feng, Yongle Zhao, Yin Xie | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality | 迈向对缺失模态具有鲁棒性的多模态视频段落字幕模型 | Sishuo Chen, Lei Li, Shuhuai Ren, Rundong Gao, Yuanxin Liu, Xiaohan Bi, Xu Sun, Lu Hou | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Single-Shared Network with Prior-Inspired Loss for Parameter-Efficient Multi-Modal Imaging Skin Lesion Classification | 具有先验启发损失的单共享网络,用于参数高效的多模态成像皮肤病变分类 | Peng Tang, Tobias Lasser | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models | MMCert:针对多模态模型的对抗性攻击的可证明防御 | Yanting Wang, Hongye Fu, Wei Zou, Jinyuan Jia | arxiv.org/pdf/2403.19… | null |
NeRF
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-28 | SAID-NeRF: Segmentation-AIDed NeRF for Depth Completion of Transparent Objects | SAID-NeRF:用于透明对象深度补全的分割辅助 NeRF | Avinash Ummadisingu, Jongkeum Choi, Koki Yamane, Shimpei Masuda, Naoki Fukaya, Kuniyuki Takahashi | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians | CoherentGS:使用 Coherent 3D 高斯进行稀疏新颖视图合成 | Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, Nima Khademi Kalantari | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation | Mesh2NeRF:用于神经辐射场表示和生成的直接网格监督 | Yujin Chen, Yinyu Nie, Benjamin Ummenhofer, Reiner Birkl, Michael Paulitsch, Matthias Müller, Matthias Nießner | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Sine Activated Low-Rank Matrices for Parameter Efficient Learning | 用于参数高效学习的正弦激活低秩矩阵 | Yiping Ji, Hemanth Saratchandran, Cameron Gordon, Zeyu Zhang, Simon Lucey | arxiv.org/pdf/2403.19… | null |
3DGS
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-28 | GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond | GauStudio:用于 3D 高斯泼溅及其他功能的模块化框架 | Chongjie Ye, Yinyu Nie, Jiahao Chang, Yuantao Chen, Yihao Zhi, Xiaoguang Han | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing | SA-GS:用于免训练抗锯齿的尺度自适应高斯泼溅 | Xiaowei Song, Jv Zheng, Shiran Yuan, Huan-ang Gao, Jingwei Zhao, Xiang He, Weihao Gu, Hao Zhao | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | TOGS: Gaussian Splatting with Temporal Opacity Offset for Real-Time 4D DSA Rendering | TOGS:具有时间不透明度偏移的高斯泼溅,用于实时 4D DSA 渲染 | Shuai Zhang, Huangxuan Zhao, Zhenghong Zhou, Guanjun Wu, Chuansheng Zheng, Xinggang Wang, Wenyu Liu | arxiv.org/pdf/2403.19… | null |
Model Compression/Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-28 | De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts | 用于处理分布变化的去混杂无数据知识蒸馏 | Yuzheng Wang, Dingkang Yang, Zhaoyu Chen, Yang Liu, Siao Liu, Wenqiang Zhang, Lihua Zhang, Lizhe Qi | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment | 通过可学习代理指导和对齐联合训练和修剪 CNN | Alireza Ganjdanesh, Shangqian Gao, Heng Huang | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | AZ-NAS: Assembling Zero-Cost Proxies for Network Architecture Search | AZ-NAS:组装零成本代理进行网络架构搜索 | Junghyup Lee, Bumsub Ham | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence | 在动态环境中:具有姿势序列的惯性感知 3D 人体建模 | Yutong Chen, Yifan Zhan, Zhihang Zhong, Wei Wang, Xiao Sun, Yu Qiao, Yinqiang Zheng | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Uncertainty-Aware Deep Video Compression with Ensembles | 使用集成的不确定性感知深度视频压缩 | Wufei Ma, Jiahao Li, Bin Li, Yan Lu | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation | CRKD:通过跨模态知识蒸馏增强相机雷达目标检测 | Lingjun Zhao, Jingyu Song, Katherine A. Skinner | arxiv.org/pdf/2403.19… | null |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-28 | RSMamba: Remote Sensing Image Classification with State Space Model | RSMamba:利用状态空间模型进行遥感图像分类 | Keyan Chen, Bowen Chen, Chenyang Liu, Wenyuan Li, Zhengxia Zou, Zhenwei Shi | arxiv.org/pdf/2403.19… | link |
| 2024-03-28 | Change-Agent: Towards Interactive Comprehensive Change Interpretation and Analysis from Change Detection and Change Captioning | Change-Agent:从变更检测和变更字幕走向交互式综合变更解释和分析 | Chenyang Liu, Keyan Chen, Haotian Zhang, Zipeng Qi, Zhengxia Zou, Zhenwei Shi | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Siamese Vision Transformers are Scalable Audio-visual Learners | 连体视觉变压器是可扩展的视听学习器 | Yan-Bo Lin, Gedas Bertasius | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | ILPO-NET: Network for the invariant recognition of arbitrary volumetric patterns in 3D | ILPO-NET:用于 3D 中任意体积图案的不变识别的网络 | Dmitrii Zhemchuzhnikov, Sergei Grudinin | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | DenseNets 重装上阵:超越 ResNets 和 ViT 的范式转变 | Donghyun Kim, Byeongho Heo, Dongyoon Han | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | The Bad Batches: Enhancing Self-Supervised Learning in Image Classification Through Representative Batch Curation | 糟糕的批次:通过代表性批次管理增强图像分类中的自我监督学习 | Ozgu Goksu, Nicolas Pugeault | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Cross-Attention is Not Always Needed: Dynamic Cross-Attention for Audio-Visual Dimensional Emotion Recognition | 并不总是需要交叉注意力:视听维度情感识别的动态交叉注意力 | R. Gnana Praveen, Jahangir Alam | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation | 用于类别级 6D 物体姿态估计的实例自适应和几何感知关键点学习 | Xiao Lin, Wenfei Yang, Yuan Gao, Tianzhu Zhang | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Surface-based parcellation and vertex-wise analysis of ultra high-resolution ex vivo 7 tesla MRI in neurodegenerative diseases | 神经退行性疾病中超高分辨率离体 7 特斯拉 MRI 的基于表面的分割和顶点分析 | Pulkit Khandelwal, Michael Tran Duong, Constanza Fuentes, Amanda Denning, Winifred Trotman, Ranjit Ittyerah, Alejandra Bahena, Theresa Schuck, Marianna Gabrielyan, Karthik Prabhakaran, et al. | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Segmentation tool for images of cracks | 裂缝图像分割工具 | Andrii Kompanets, Remco Duits, Davide Leonetti, Nicky van den Berg, H. H. Snijder | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Transparent and Clinically Interpretable AI for Lung Cancer Detection in Chest X-Rays | 透明且临床可解释的人工智能用于胸部 X 射线肺癌检测 | Amy Rafferty, Rishi Ramaesh, Ajitha Rajan | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | A Robust Ensemble Algorithm for Ischemic Stroke Lesion Segmentation: Generalizability and Clinical Utility Beyond the ISLES Challenge | 用于缺血性中风病变分割的稳健集成算法:超越 ISLES 挑战的普遍性和临床实用性 | Ezequiel de la Rosa, Mauricio Reyes, Sook-Lei Liew, Alexandre Hutton, Roland Wiest, Johannes Kaesmacher, Uta Hanning, Arsany Hakim, Richard Zubal, Waldo Valenzuela, et al. | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Towards Temporally Consistent Referring Video Object Segmentation | 实现时间一致的参考视频对象分割 | Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Mubarak Shah, Ajmal Mian | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Infrared Small Target Detection with Scale and Location Sensitivity | 具有规模和位置敏感性的红外小目标检测 | Qiankun Liu, Rui Liu, Bolun Zheng, Hongkui Wang, Ying Fu | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Test-Time Domain Generalization for Face Anti-Spoofing | 人脸反欺骗的测试时域泛化 | Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Xuequan Lu, Shouhong Ding, Lizhuang Ma | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Hypergraph-based Multi-View Action Recognition using Event Cameras | 使用事件摄像机的基于超图的多视图动作识别 | Yue Gao, Jiaxuan Lu, Siqi Li, Yipeng Li, Shaoyi Du | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction | Total-Decom:以最少的交互进行分解的 3D 场景重建 | Xiaoyang Lyu, Chirui Chang, Peng Dai, Yang-tian Sun, Xiaojuan Qi | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Sparse Generation: Making Pseudo Labels Sparse for weakly supervision with points | 稀疏生成:使伪标签稀疏以实现带点的弱监督 | Tian Ma, Chuyang Shang, Wanzhu Ren, Yuancheng Li, Jiiayi Yang, Jiali Qian | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection | CAT:利用类间动力学进行域自适应对象检测 | Mikhail Kennerley, Jian-Gang Wang, Bharadwaj Veeravalli, Robby T. Tan | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment | 通过动作转换感知边界对齐实现高效且有效的弱监督动作分割 | Angchi Xu, Wei-Shi Zheng | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Learning Multiple Representations with Inconsistency-Guided Detail Regularization for Mask-Guided Matting | 通过不一致引导的细节正则化学习多重表示以实现掩模引导的抠图 | Weihao Jiang, Zhaozhi Xie, Yuxiang Lu, Longjie Qi, Jingyong Cai, Hiroyuki Uchiyama, Bin Chen, Yue Ding, Hongtao Lu | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Rethinking Information Loss in Medical Image Segmentation with Various-sized Targets | 重新思考不同大小目标的医学图像分割中的信息丢失 | Tianyi Liu, Zhaorui Tan, Kaizhu Huang, Haochuan Jiang | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Algorithmic Ways of Seeing: Using Object Detection to Facilitate Art Exploration | 观看的算法方式:使用对象检测促进艺术探索 | Louie Søs Meyer, Johanne Engel Aaen, Anitamalina Regitse Tranberg, Peter Kun, Matthias Freiberger, Sebastian Risi, Anders Sundnes Løvlie | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models | CLAP4CLIP:视觉语言模型的概率微调的持续学习 | Saurav Jha, Dong Gong, Lina Yao | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition | OmniParser:文本识别、关键信息提取和表格识别的统一框架 | Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | PoCo: A Self-Supervised Approach via Polar Transformation Based Progressive Contrastive Learning for Ophthalmic Disease Diagnosis | PoCo:基于极地变换的渐进对比学习的自监督方法用于眼科疾病诊断 | Jinhong Wang, Tingting Chen, Jintai Chen, Yixuan Wu, Yuyang Xu, Danny Chen, Haochao Ying, Jian Wu | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection | 用于视频异常检测的补丁时空关系预测 | Hao Shen, Lu Shi, Wanru Xu, Yigang Cen, Linna Zhang, Gaoyun An | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | A Real-Time Framework for Domain-Adaptive Underwater Object Detection with Image Enhancement | 具有图像增强功能的域自适应水下目标检测的实时框架 | Junjie Wen, Jinqiang Cui, Benyun Zhao, Bingxin Han, Xuchen Liu, Zhi Gao, Ben M. Chen | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach | 低阶重缩放视觉变压器微调:残差设计方法 | Wei Dong, Xing Zhang, Bihui Chen, Dawei Yan, Zhijun Lin, Qingsen Yan, Peng Wang, Yang Yang | arxiv.org/pdf/2403.19… | null |
GNN
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-28 | SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks | SG-PGM:具有语义几何融合的部分图匹配网络,用于 3D 场景图对齐及其下游任务 | Yaxu Xie, Alain Pagani, Didier Stricker | arxiv.org/pdf/2403.19… | null |
Image Understanding
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-28 | Model Stock: All we need is just a few fine-tuned models | 模型库存:我们需要的只是一些经过微调的模型 | Dong-Hwan Jang, Sangdoo Yun, Dongyoon Han | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | A Simple and Effective Point-based Network for Event Camera 6-DOFs Pose Relocalization | 用于事件相机 6 自由度姿势重定位的简单有效的基于点的网络 | Hongwei Ren, Jiadong Zhu, Yue Zhou, Haotian FU, Yulong Huang, Bojun Cheng | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | FlowDepth: Decoupling Optical Flow for Self-Supervised Monocular Depth Estimation | FlowDepth:用于自监督单目深度估计的解耦光流 | Yiyang Sun, Zhiyuan Xu, Xiaonian Wang, Jing Yao | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | AAPMT: AGI Assessment Through Prompt and Metric Transformer | AAPMT:通过 Prompt 和 Metric Transformer 进行 AGI 评估 | Benhao Huang | arxiv.org/pdf/2403.19… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-28 | BAMM: Bidirectional Autoregressive Motion Model | BAMM:双向自回归运动模型 | Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Pu Wang, Minwoo Lee, Srijan Das, Chen Chen | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | PointCloud-Text Matching: Benchmark Datasets and a Baseline | 点云-文本匹配:基准数据集和基线 | Yanglin Feng, Yang Qin, Dezhong Peng, Hongyuan Zhu, Xi Peng, Peng Hu | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | MedBN: Robust Test-Time Adaptation against Malicious Test Samples | MedBN:针对恶意测试样本的稳健测试时间适应 | Hyejin Park, Jeongyeon Hwang, Sunung Mun, Sangdon Park, Jungseul Ok | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | RTracker: Recoverable Tracking via PN Tree Structured Memory | RTracker:通过 PN 树结构内存进行可恢复跟踪 | Yuqing Huang, Xin Li, Zikun Zhou, Yaowei Wang, Zhenyu He, Ming-Hsuan Yang | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | GraphAD: Interaction Scene Graph for End-to-end Autonomous Driving | GraphAD:端到端自动驾驶的交互场景图 | Yunpeng Zhang, Deheng Qian, Ding Li, Yifeng Pan, Yong Chen, Zhenbao Liang, Zhiyao Zhang, Shurui Zhang, Hongxu Li, Maolei Fu, et al. | arxiv.org/pdf/2403.19… | null |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-28 | GraspXL: Generating Grasping Motions for Diverse Objects at Scale | GraspXL:大规模生成各种物体的抓取动作 | Hui Zhang, Sammy Christen, Zicong Fan, Otmar Hilliges, Jie Song | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Situation Awareness for Driver-Centric Driving Style Adaptation | 以驾驶员为中心的驾驶风格适应的态势感知 | Johann Haselberger, Bonifaz Stuhr, Bernhard Schick, Steffen Müller | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes | TOD3Cap:迈向户外场景中的 3D 密集字幕 | Bu Jin, Yupeng Zheng, Pengfei Li, Weize Li, Yuhang Zheng, Sujie Hu, Xinyu Liu, Jinwei Zhu, Zhijie Yan, Haiyang Sun, et al. | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM | GlORIE-SLAM:全局优化的仅 RGB 隐式编码点云 SLAM | Ganlin Zhang, Erik Sandström, Youmin Zhang, Manthan Patel, Luc Van Gool, Martin R. Oswald | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM | 实时 RGB-D SLAM 中隐式神经表示和几何渲染的基准测试 | Tongyan Hua, Lin Wang | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Brain-Shift: Unsupervised Pseudo-Healthy Brain Synthesis for Novel Biomarker Extraction in Chronic Subdural Hematoma | 脑转移:无监督的伪健康脑合成用于慢性硬膜下血肿的新型生物标志物提取 | Baris Imre, Elina Thibeau-Sutre, Jorieke Reimer, Kuan Kho, Jelmer M. Wolterink | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | NIGHT -- Non-Line-of-Sight Imaging from Indirect Time of Flight Data | 夜间——来自间接飞行时间数据的非视距成像 | Matteo Caligiuri, Adriano Simonetto, Gianluca Agresti, Pietro Zanuttigh | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Neural Fields for 3D Tracking of Anatomy and Surgical Instruments in Monocular Laparoscopic Video Clips | 用于单眼腹腔镜视频剪辑中解剖和手术器械 3D 跟踪的神经场 | Beerend G. A. Gerats, Jelmer M. Wolterink, Seb P. Mol, Ivo A. M. J. Broeders | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | GeoAuxNet: Towards Universal 3D Representation Learning for Multi-sensor Point Clouds | GeoAuxNet:迈向多传感器点云的通用 3D 表示学习 | Shengjun Zhang, Xin Fei, Yueqi Duan | arxiv.org/pdf/2403.19… | null |
Learning Paradigms
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-28 | IVLMap: Instance-Aware Visual Language Grounding for Consumer Robot Navigation | IVLMap:消费机器人导航的实例感知视觉语言基础 | Jiacui Huang, Hongtao Zhang, Mingbo Zhao, Zhou Wu | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Text Data-Centric Image Captioning with Interactive Prompts | 带有交互式提示的以文本数据为中心的图像说明 | Yiyu Wang, Hao Luo, Jungang Xu, Yingfei Sun, Fan Wang | arxiv.org/pdf/2403.19… | null |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-28 | RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents | RH20T-P:面向可组合泛化代理的原始级机器人数据集 | Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, et al. | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Nearest Neighbor Classification for Classical Image Upsampling | 经典图像上采样的最近邻分类 | Evan Matthews, Nicolas Prate | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Semantic Map-based Generation of Navigation Instructions | 基于语义图的导航指令生成 | Chengzu Li, Chao Zhang, Simone Teufel, Rama Sanand Doddipatla, Svetlana Stoyanchev | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | LocCa: Visual Pretraining with Location-aware Captioners | LocCa:使用位置感知字幕进行视觉预训练 | Bo Wan, Michael Tschannen, Yongqin Xian, Filip Pavetic, Ibrahim Alabdulmohsin, Xiao Wang, André Susano Pinto, Andreas Steiner, Lucas Beyer, Xiaohua Zhai | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | CDIMC-net: Cognitive Deep Incomplete Multi-view Clustering Network | CDIMC-net:认知深度不完全多视图聚类网络 | Jie Wen, Zheng Zhang, Yong Xu, Bob Zhang, Lunke Fei, Guo-Sen Xie | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Break-for-Make: Modular Low-Rank Adaptations for Composable Content-Style Customization | Break-for-Make:用于可组合内容样式定制的模块化低阶改编 | Yu Xu, Fan Tang, Juan Cao, Yuxin Zhang, Oliver Deussen, Weiming Dong, Jintao Li, Tong-Yee Lee | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Taming Lookup Tables for Efficient Image Retouching | 驯服查找表以实现高效图像修饰 | Sidi Yang, Binxiao Huang, Mingdeng Cao, Yatai Ji, Hanzhong Guo, Ngai Wong, Yujiu Yang | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation | DreamSalon:在可编辑面部生成中保留身份上下文的分阶段扩散框架 | Haonan Lin, Mengmeng Wang, Yan Chen, Wenbin An, Yuzhe Yao, Guang Dai, Qianying Wang, Yong Liu, Jingdong Wang | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | From Activation to Initialization: Scaling Insights for Optimizing Neural Fields | 从激活到初始化:扩展洞察以优化神经场 | Hemanth Saratchandran, Sameera Ramasinghe, Simon Lucey | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | D'OH: Decoder-Only random Hypernetworks for Implicit Neural Representations | D'OH:用于隐式神经表示的仅解码器随机超网络 | Cameron Gordon, Lachlan Ewen MacDonald, Hemanth Saratchandran, Simon Lucey | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Towards Understanding Dual BN In Hybrid Adversarial Training | 理解混合对抗训练中的双重 BN | Chenshuang Zhang, Chaoning Zhang, Kang Zhang, Axi Niu, Junmo Kim, In So Kweon | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | MVEB: Self-Supervised Learning with Multi-View Entropy Bottleneck | MVEB:具有多视图熵瓶颈的自监督学习 | Liangjian Wen, Xiasi Wang, Jianzhuang Liu, Zenglin Xu | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Tiny Machine Learning: Progress and Futures | 微型机器学习:进步与未来 | Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Song Han | arxiv.org/pdf/2403.19… | null |
| 2024-03-28 | Generative Quanta Color Imaging | 生成量子彩色成像 | Vishal Purohit, Junjie Luo, Yiheng Chi, Qi Guo, Stanley H. Chan, Qiang Qiu | arxiv.org/pdf/2403.19… | null |