[UPDATED!] 2024-03-20 (Publish Time)
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-20 | Editing Massive Concepts in Text-to-Image Diffusion Models | 编辑文本到图像扩散模型中的大量概念 | Tianwei Xiong, Yue Wu, Enze Xie, Yue Wu, Zhenguo Li, Xihui Liu | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | ZigMa: Zigzag Mamba Diffusion Model | ZigMa:之字形曼巴扩散模型 | Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, Bjorn Ommer | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | TimeRewind: Rewinding Time with Image-and-Events Video Diffusion | TimeRewind:通过图像和事件视频扩散来倒带时间 | Jingxi Chen, Brandon Y. Feng, Haoming Cai, Mingyang Xie, Christopher Metzler, Cornelia Fermuller, Yiannis Aloimonos | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | DepthFM: Fast Monocular Depth Estimation with Flow Matching | DepthFM:利用流匹配进行快速单目深度估计 | Ming Gui, Johannes S. Fischer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan Andreas Baumann, Vincent Tao Hu, Björn Ommer | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation | 成为你的外画师:通过特定于输入的适应掌握视频外画 | Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu, Hongsheng Li | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance | DanceCamera3D:音乐和舞蹈的 3D 摄像机运动合成 | Zixuan Wang, Jia Jia, Shikun Sun, Haozhe Wu, Rong Han, Zhenyu Li, Di Tang, Jiaqing Zhou, Jiebo Luo | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Multimodal Variational Autoencoder for Low-cost Cardiac Hemodynamics Instability Detection | 用于低成本心脏血流动力学不稳定性检测的多模态变分自动编码器 | Mohammod N. I. Suvon, Prasun C. Tripathi, Wenrui Fan, Shuo Zhou, Xianyuan Liu, Samer Alabed, Venet Osmani, Andrew J. Swift, Chen Chen, Haiping Lu | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | ZoDi: Zero-Shot Domain Adaptation with Diffusion-Based Image Transfer | ZoDi:基于扩散图像迁移的零样本域适应 | Hiroki Azuma, Yusuke Matsui, Atsuto Maki | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | ReGround: Improving Textual and Spatial Grounding at No Cost | ReGround:免费改善文本和空间基础 | Yuseung Lee, Minhyuk Sung | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing | Ground-A-Score:扩展用于多属性编辑的分数蒸馏 | Hangeol Chang, Jinho Chang, Jong Chul Ye | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Diversity-aware Channel Pruning for StyleGAN Compression | StyleGAN 压缩的多样性感知通道修剪 | Jiwoo Chung, Sangeek Hyun, Sang-Heon Shim, Jae-Pil Heo | arxiv.org/pdf/2403.13… | link |
| 2024-03-20 | Compress3D: a Compressed Latent Space for 3D Generation from a Single Image | Compress3D:用于从单个图像生成 3D 的压缩潜在空间 | Bowen Zhang, Tianyu Yang, Yu Li, Lei Zhang, Xi Zhao | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis | VSTAR:用于更长动态视频合成的生成时间护理 | Yumeng Li, William Beluch, Margret Keuper, Dan Zhang, Anna Khoreva | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion | 将扩散模型扩展到真实世界的 3D LiDAR 场景完成 | Lucas Nunes, Rodrigo Marcuzzi, Benedikt Mersch, Jens Behley, Cyrill Stachniss | arxiv.org/pdf/2403.13… | link |
| 2024-03-20 | Cell Tracking in C. elegans with Cell Position Heatmap-Based Alignment and Pairwise Detection | 使用基于细胞位置热图的对齐和成对检测进行线虫细胞跟踪 | Kaito Shiku, Hiromitsu Shirai, Takeshi Ishihara, Ryoma Bise | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | S2DM: Sector-Shaped Diffusion Models for Video Generation | S2DM:用于视频生成的扇形扩散模型 | Haoran Lang, Yuxuan Ge, Zheng Tian | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | IIDM: Image-to-Image Diffusion Model for Semantic Image Synthesis | IIDM:用于语义图像合成的图像到图像扩散模型 | Feng Liu, Xiaobin Chang | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Correlation Clustering of Organoid Images | 类器官图像的相关聚类 | Jannik Presberger, Rashmiparvathi Keshara, David Stein, Yung Hae Kim, Anne Grapin-Botton, Bjoern Andres | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation | AGFSync:利用人工智能生成的反馈来优化文本到图像生成中的偏好 | Jingkun An, Yinghao Zhu, Zongjian Li, Haoran Feng, Bohua Chen, Yemin Shi, Chengwei Pan | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | LaserHuman: Language-guided Scene-aware Human Motion Generation in Free Environment | LaserHuman:自由环境中语言引导的场景感知人体运动生成 | Peishan Cong, Ziyi Wang, Zhiyang Dou, Yiming Ren, Wei Yin, Kai Cheng, Yujing Sun, Xiaoxiao Long, Xinge Zhu, Yuexin Ma | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception | DetDiffusion:协同生成和感知模型以增强数据生成和感知 | Yibo Wang, Ruiyuan Gao, Kai Chen, Kaiqiang Zhou, Yingjie Cai, Lanqing Hong, Zhenguo Li, Lihui Jiang, Dit-Yan Yeung, Qiang Xu, et al. | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Building Optimal Neural Architectures using Interpretable Knowledge | 使用可解释的知识构建最佳神经架构 | Keith G. Mills, Fred X. Han, Mohammad Salameh, Shengyao Lu, Chunhua Zhou, Jiao He, Fengyu Sun, Di Niu | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Beyond Skeletons: Integrative Latent Mapping for Coherent 4D Sequence Generation | 超越骨骼:用于连贯 4D 序列生成的集成潜在映射 | Qitong Yang, Mingtao Feng, Zijie Wu, Shijie Sun, Weisheng Dong, Yaonan Wang, Ajmal Mian | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Nellie: Automated organelle segmentation, tracking, and hierarchical feature extraction in 2D/3D live-cell microscopy | Nellie:2D/3D 活细胞显微镜中的自动细胞器分割、跟踪和分层特征提取 | Austin E. Y. T. Lefebvre, Gabriel Sturm, Ting-Yu Lin, Emily Stoops, Magdalena Preciado Lopez, Benjamin Kaufmann-Malaga, Kayley Hake | arxiv.org/pdf/2403.13… | link |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-20 | RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition | RAR:检索和排序用于视觉识别的增强 MLLM | Ziyu Liu, Zeyi Sun, Yuhang Zang, Wei Li, Pan Zhang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Describe-and-Dissect: Interpreting Neurons in Vision Networks with Language Models | 描述和剖析:用语言模型解释视觉网络中的神经元 | Nicholas Bai, Rahul A. Iyer, Tuomas Oikarinen, Tsui-Wei Weng | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | AUD-TGN: Advancing Action Unit Detection with Temporal Convolution and GPT-2 in Wild Audiovisual Contexts | AUD-TGN:在野外视听环境中使用时间卷积和 GPT-2 推进动作单元检测 | Jun Yu, Zerui Zhang, Zhihong Wei, Gongpeng Zhao, Zhongpeng Cai, Yongqi Wang, Guochen Xie, Jichao Zhu, Wangyuan Zhu | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Recursive Cross-Modal Attention for Multimodal Fusion in Dimensional Emotion Recognition | 维度情感识别中多模态融合的递归跨模态注意 | R. Gnana Praveen, Jahangir Alam | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | VL-Mamba: Exploring State Space Models for Multimodal Learning | VL-Mamba:探索多模态学习的状态空间模型 | Yanyuan Qiao, Zheng Yu, Longteng Guo, Sihan Chen, Zijia Zhao, Mingzhen Sun, Qi Wu, Jing Liu | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | What if...?: Counterfactual Inception to Mitigate Hallucination Effects in Large Multimodal Models | 如果......会怎样?:减轻大型多模态模型中幻觉效应的反事实起始 | Junho Kim, Yeon Ju Kim, Yong Man Ro | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | FMM-Attack: A Flow-based Multi-modal Adversarial Attack on Video-based LLMs | FMM-Attack:对基于视频的 LLM 的基于流的多模态对抗攻击 | Jinmin Li, Kuofeng Gao, Yang Bai, Jingyun Zhang, Shu-tao Xia, Yisen Wang | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | A Unified Optimal Transport Framework for Cross-Modal Retrieval with Noisy Labels | 用于带噪声标签的跨模式检索的统一最优传输框架 | Haochen Han, Minnan Luo, Huan Liu, Fang Nan | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models | HyperLLaVA:针对多模态大型语言模型的动态视觉和语言专家调整 | Wenqiao Zhang, Tianwei Lin, Jiang Liu, Fangxun Shu, Haoyuan Li, Lei Zhang, He Wanggui, Hao Zhou, Zheqi Lv, Hao Jiang, et al. | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Unifying Local and Global Multimodal Features for Place Recognition in Aliased and Low-Texture Environments | 统一局部和全局多模态特征,以在别名和低纹理环境中进行地点识别 | Alberto García-Hernández, Riccardo Giubilato, Klaus H. Strobl, Javier Civera, Rudolph Triebel | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling | HyperFusion:用于预测建模的表格和医学成像数据多模态集成的超网络方法 | Daniel Duenias, Brennan Nichyporuk, Tal Arbel, Tammy Riklin Raviv | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns | PuzzleVQA:用抽象视觉模式诊断语言模型的多模态推理挑战 | Yew Ken Chia, Vernon Toh Yan Han, Deepanway Ghosal, Lidong Bing, Soujanya Poria | arxiv.org/pdf/2403.13… | link |
| 2024-03-20 | Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations | 具有空间和时间一致性正则化的自监督类无关运动预测 | Kewei Wang, Yizheng Wu, Jun Cen, Zhiyu Pan, Xingyi Li, Zhe Wang, Zhiguo Cao, Guosheng Lin | arxiv.org/pdf/2403.13… | link |
3DGS
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-20 | RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS | RadSplat:基于辐射场的高斯喷射,可实现 900+ FPS 的鲁棒实时渲染 | Michael Niemeyer, Fabian Manhardt, Marie-Julie Rakotosaona, Michael Oechsle, Daniel Duckworth, Rama Gosula, Keisuke Tateno, John Bates, Dominik Kaeser, Federico Tombari | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion | 移动中的高斯泼溅:自然相机运动的模糊和滚动快门补偿 | Otto Seiskari, Jerry Ylilammi, Valtteri Kaatrasalo, Pekka Rantalankila, Matias Turkulainen, Juho Kannala, Esa Rahtu, Arno Solin | arxiv.org/pdf/2403.13… | null |
Model Compression/Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-20 | REAL: Representation Enhanced Analytic Learning for Exemplar-free Class-incremental Learning | REAL:用于无范例类增量学习的表示增强分析学习 | Run He, Huiping Zhuang, Di Fang, Yizhu Chen, Kai Tong, Cen Chen | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Scale Decoupled Distillation | 规模解耦蒸馏 | Shicai Wei, Chunbo Luo, Yang Luo | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Progressive trajectory matching for medical dataset distillation | 用于医疗数据集蒸馏的渐进轨迹匹配 | Zhen Yu, Yang Liu, Qingchao Chen | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Diversified and Personalized Multi-rater Medical Image Segmentation | 多样化、个性化的多评估者医学图像分割 | Yicheng Wu, Xiangde Luo, Zhe Xu, Xiaoqing Guo, Lie Ju, Zongyuan Ge, Wenjun Liao, Jianfei Cai | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning | OrthCaps:具有稀疏注意力路由和剪枝的正交 CapsNet | Xinyu Geng, Jiaming Wang, Jiawei Gong, Yuerong Xue, Jun Xu, Fanglin Chen, Xiaolin Huang | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | DD-RobustBench: An Adversarial Robustness Benchmark for Dataset Distillation | DD-RobustBench:数据集蒸馏的对抗性鲁棒性基准 | Yifan Wu, Jiawei Du, Ping Liu, Yuewei Lin, Wenqing Cheng, Wei Xu | arxiv.org/pdf/2403.13… | null |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-20 | Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments | 边界框针对特征丢失的稳定性反映了跨环境的检测器泛化 | Yang Yang, Wenhai Wang, Zhe Chen, Jifeng Dai, Liang Zheng | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Hierarchical NeuroSymbolic Approach for Action Quality Assessment | 行动质量评估的分层神经符号方法 | Lauren Okamoto, Paritosh Parmar | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Bridge the Modality and Capacity Gaps in Vision-Language Model Selection | 弥合视觉语言模型选择中的模态和能力差距 | Chao Yi, De-Chuan Zhan, Han-Jia Ye | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Practical End-to-End Optical Music Recognition for Pianoform Music | 实用的钢琴音乐端到端光学音乐识别 | Jiří Mayer, Milan Straka, Jan Hajič jr., Pavel Pecina | arxiv.org/pdf/2403.13… | link |
| 2024-03-20 | When Cars meet Drones: Hyperbolic Federated Learning for Source-Free Domain Adaptation in Adverse Weather | 当汽车遇到无人机:恶劣天气下无源域适应的双曲联合学习 | Giulia Rizzoli, Matteo Caligiuri, Donald Shenaj, Francesco Barbato, Pietro Zanuttigh | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition | HierCode:用于零样本中文文本识别的轻量级分层密码本 | Yuyi Zhang, Yuanzhi Zhu, Dezhi Peng, Peirong Zhang, Zhenhua Yang, Zhibo Yang, Cong Yao, Lianwen Jin | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Enhancing Gait Video Analysis in Neurodegenerative Diseases by Knowledge Augmentation in Vision Language Model | 通过视觉语言模型中的知识增强增强神经退行性疾病的步态视频分析 | Diwei Wang, Kun Yuan, Candice Muller, Frédéric Blanc, Nicolas Padoy, Hyewon Seo | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Fostc3net: A Lightweight YOLOv5 Based On the Network Structure Optimization | Fostc3net:基于网络结构优化的轻量级 YOLOv5 | Danqing Ma, Shaojie Li, Bo Dang, Hengyi Zang, Xinqi Dong | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Insight Into the Collocation of Multi-Source Satellite Imagery for Multi-Scale Vessel Detection | 深入探讨多源卫星图像搭配用于多尺度船舶检测 | Tran-Vu La, Minh-Tan Pham, Marco Chini | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | MotorEase: Automated Detection of Motor Impairment Accessibility Issues in Mobile App UIs | MotorEase:自动检测移动应用程序 UI 中的运动障碍辅助功能问题 | Arun Krishnavajjala, SM Hasan Mansur, Justin Jose, Kevin Moran | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Step-Calibrated Diffusion for Biomedical Optical Image Restoration | 用于生物医学光学图像恢复的步进校准扩散 | Yiwei Lyu, Sung Jik Cha, Cheng Jiang, Asadur Chowdury, Xinhai Hou, Edward Harake, Akhil Kondepudi, Christian Freudiger, Honglak Lee, Todd C. Hollon | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | ProMamba: Prompt-Mamba for polyp segmentation | ProMamba:用于息肉分割的 Prompt-Mamba | Jianhao Xie, Ruofan Liao, Ziang Zhang, Sida Yi, Yuesheng Zhu, Guibo Luo | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | H-vmunet: High-order Vision Mamba UNet for Medical Image Segmentation | H-vmunet:用于医学图像分割的高阶视觉 Mamba UNet | Renkai Wu, Yinghao Liu, Pengchen Liang, Qing Chang | arxiv.org/pdf/2403.13… | link |
| 2024-03-20 | Leveraging feature communication in federated learning for remote sensing image classification | 利用联邦学习中的特征通信进行遥感图像分类 | Anh-Kiet Duong, Hoàng-Ân Lê, Minh-Tan Pham | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments | Find n' Propagate:城市环境中的开放词汇 3D 对象检测 | Djamahl Etchegaray, Zi Huang, Tatsuya Harada, Yadan Luo | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Next day fire prediction via semantic segmentation | 通过语义分割预测第二天火灾 | Konstantinos Alexis, Stella Girtsou, Alexis Apostolakis, Giorgos Giannopoulos, Charalampos Kontoes | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | High-confidence pseudo-labels for domain adaptation in COVID-19 detection | 用于 COVID-19 检测中域适应的高置信度伪标签 | Robert Turnbull, Simon Mutch | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Deepfake Detection without Deepfakes: Generalization via Synthetic Frequency Patterns Injection | 没有 Deepfakes 的 Deepfake 检测:通过合成频率模式注入进行泛化 | Davide Alessandro Coccomini, Roberto Caldelli, Claudio Gennaro, Giuseppe Fiameni, Giuseppe Amato, Fabrizio Falchi | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Fast-Poly: A Fast Polyhedral Framework For 3D Multi-Object Tracking | Fast-Poly:用于 3D 多对象跟踪的快速多面体框架 | Xiaoyu Li, Dedong Liu, Lijun Zhao, Yitao Wu, Xian Wu, Jinghan Gao | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Stochastic Geometry Models for Texture Synthesis of Machined Metallic Surfaces: Sandblasting and Milling | 用于机加工金属表面纹理合成的随机几何模型:喷砂和铣削 | Natascha Jeziorski, Claudia Redenbach | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | MTP:通过多任务预训练推进遥感基础模型 | Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, Haonan Guo, Bo Du, Dacheng Tao, et al. | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | DOR3D-Net: Dense Ordinal Regression Network for 3D Hand Pose Estimation | DOR3D-Net:用于 3D 手势估计的密集序数回归网络 | Yamin Mao, Zhihua Liu, Weiming Li, SoonYong Cho, Qiang Wang, Xiaoshuai Hao | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Robust image segmentation model based on binary level set | 基于二值水平集的鲁棒图像分割模型 | Wenqi Zhao | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Few-shot Oriented Object Detection with Memorable Contrastive Learning in Remote Sensing Images | 遥感图像中具有令人难忘的对比学习的面向少镜头的目标检测 | Jiawei Zhou, Wuzhou Li, Yi Cao, Hongtao Cai, Xiang Li | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Counting Network for Learning from Majority Label | 用于从多数标签学习的计数网络 | Kaito Shiku, Shinnosuke Matsuo, Daiki Suehiro, Ryoma Bise | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection | 用于统一异常检测的分层高斯混合归一化流建模 | Xincheng Yao, Ruoqi Li, Zefeng Qian, Lu Wang, Chongyang Zhang | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Efficient scene text image super-resolution with semantic guidance | 具有语义指导的高效场景文本图像超分辨率 | LeoWu TomyEnrique, Xiangcheng Du, Kangliang Liu, Han Yuan, Zhao Zhou, Cheng Jin | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Out-of-Distribution Detection Using Peer-Class Generated by Large Language Model | 使用大型语言模型生成的对等类进行分布外检测 | K Huang, G Song, Hanwen Su, Jiyan Wang | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Rotary Position Embedding for Vision Transformer | 视觉变压器的旋转位置嵌入 | Byeongho Heo, Song Park, Dongyoon Han, Sangdoo Yun | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | SAMCT: Segment Any CT Allowing Labor-Free Task-Indicator Prompts | SAMCT:分段任何允许无人工任务指示器提示的 CT | Xian Lin, Yangyang Xiang, Zhehao Wang, Kwang-Ting Cheng, Zengqiang Yan, Li Yu | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Self-Attention Based Semantic Decomposition in Vector Symbolic Architectures | 向量符号架构中基于自注意力的语义分解 | Calvin Yeung, Prathyush Poduval, Mohsen Imani | arxiv.org/pdf/2403.13… | null |
GNN
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-20 | Adaptive Critical Subgraph Mining for Cognitive Impairment Conversion Prediction with T1-MRI-based Brain Network | 基于 T1-MRI 的脑网络进行认知障碍转换预测的自适应关键子图挖掘 | Yilin Leng, Wenju Cui, Bai Chen, Xi Jiang, Shuangqing Chen, Jian Zheng | arxiv.org/pdf/2403.13… | null |
Image Understanding
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-20 | Learning Novel View Synthesis from Heterogeneous Low-light Captures | 从异构低光捕获中学习新颖的视图合成 | Quan Zheng, Hao Sun, Huiyao Xu, Fanjiang Xu | arxiv.org/pdf/2403.13… | null |
LLM
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-20 | Improved Baselines for Data-efficient Perceptual Augmentation of LLMs | 改进 LLM 数据高效感知增强的基线 | Théophane Vallaeys, Mustafa Shukor, Matthieu Cord, Jakob Verbeek | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics | ManiPose:机器人中姿势感知对象操纵的综合基准 | Qiaojun Yu, Ce Hao, Junbo Wang, Wenhai Liu, Liu Liu, Yao Mu, Yang You, Hengxu Yan, Cewu Lu | arxiv.org/pdf/2403.13… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-20 | Learning from Models and Data for Visual Grounding | 从模型和数据中学习视觉基础 | Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Retina Vision Transformer (RetinaViT): Introducing Scaled Patches into Vision Transformers | 视网膜视觉变压器 (RetinaViT):将缩放补丁引入视觉变压器 | Yuyang Shu, Michael E. Bain | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | T-Pixel2Mesh: Combining Global and Local Transformer for 3D Mesh Generation from a Single Image | T-Pixel2Mesh:结合全局和局部 Transformer 从单个图像生成 3D 网格 | Shijie Zhang, Boyan Jiang, Keke He, Junwei Zhu, Ying Tai, Chengjie Wang, Yinda Zhang, Yanwei Fu | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer | Portrait4D-v2:伪多视图数据创建更好的4D头部合成器 | Yu Deng, Duomin Wang, Baoyuan Wang | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | What explains the success of cross-modal fine-tuning with ORCA? | 如何解释 ORCA 跨模式微调的成功? | Paloma García-de-Herreros, Vagrant Gautam, Philipp Slusallek, Dietrich Klakow, Marius Mosbach | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | vid-TLDR: Training Free Token merging for Light-weight Video Transformer | vid-TLDR:轻量级视频变压器的免费训练令牌合并 | Joonmyung Choi, Sanghyeok Lee, Jaewon Chu, Minhyuk Choi, Hyunwoo J. Kim | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving | AMP:通过自动驾驶的下一个令牌预测重新审视自回归运动预测 | Xiaosong Jia, Shaoshuai Shi, Zijun Chen, Li Jiang, Wenlong Liao, Tao He, Junchi Yan | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Mora: Enabling Generalist Video Generation via A Multi-Agent Framework | Mora:通过多代理框架实现通用视频生成 | Zhengqing Yuan, Ruoxi Chen, Zhaoxu Li, Haolong Jia, Lifang He, Chi Wang, Lichao Sun | arxiv.org/pdf/2403.13… | link |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-20 | Towards Principled Representation Learning from Videos for Reinforcement Learning | 从强化学习视频中进行有原则的表示学习 | Dipendra Misra, Akanksha Saran, Tengyang Xie, Alex Lamb, John Langford | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses | DVMNet:超越假设计算看不见的物体的相对姿态 | Chen Zhao, Tong Zhang, Zheng Dang, Mathieu Salzmann | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Motion Generation from Fine-grained Textual Descriptions | 根据细粒度文本描述生成运动 | Kunhang Li, Yansong Feng | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | CLIPSwarm: Generating Drone Shows from Text Prompts with Vision-Language Models | CLIPSwarm:使用视觉语言模型根据文本提示生成无人机表演 | Pablo Pueyo, Eduardo Montijano, Ana C. Murillo, Mac Schwager | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Advancing 6D Pose Estimation in Augmented Reality -- Overcoming Projection Ambiguity with Uncontrolled Imagery | 推进增强现实中的 6D 姿态估计——克服不受控制的图像的投影模糊性 | Mayura Manawadu, Sieun Park, Soon-Yong Park | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Text-to-3D Shape Generation | 文本到 3D 形状生成 | Han-Hung Lee, Manolis Savva, Angel X. Chang | arxiv.org/pdf/2403.13… | null |
Learning Paradigms
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-20 | A Unified and General Framework for Continual Learning | 持续学习的统一通用框架 | Zhenyi Wang, Yan Li, Li Shen, Heng Huang | arxiv.org/pdf/2403.13… | null |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-20 | On Pretraining Data Diversity for Self-Supervised Learning | 关于自监督学习的预训练数据多样性 | Hasan Abed Al Kader Hammoud, Tuhin Das, Fabio Pizzati, Philip Torr, Adel Bibi, Bernard Ghanem | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Certified Human Trajectory Prediction | 经过认证的人体轨迹预测 | Mohammadhossein Bahari, Saeed Saadatnejad, Amirhossein Asgari Farsangi, Seyed-Mohsen Moosavi-Dezfooli, Alexandre Alahi | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Leveraging High-Resolution Features for Improved Deep Hashing-based Image Retrieval | 利用高分辨率功能改进基于深度哈希的图像检索 | Aymene Berriche, Mehdi Adjal Zakaria, Riyadh Baghdadi | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | DBA-Fusion: Tightly Integrating Deep Dense Visual Bundle Adjustment with Multiple Sensors for Large-Scale Localization and Mapping | DBA-Fusion:将深度密集视觉束调整与多个传感器紧密集成,以实现大规模定位和绘图 | Yuxuan Zhou, Xingxing Li, Shengyu Li, Xuanbin Wang, Shaoquan Feng, Yuxuan Tan | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning | SPTNet:具有空间提示调整的广义类别发现的有效替代框架 | Hongjun Wang, Sagar Vaze, Kai Han | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Learning User Embeddings from Human Gaze for Personalised Saliency Prediction | 从人类注视中学习用户嵌入以进行个性化显着性预测 | Florian Strohm, Mihai Bâce, Andreas Bulling | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | Meta-Point Learning and Refining for Category-Agnostic Pose Estimation | 用于类别无关姿势估计的元点学习和细化 | Junjie Chen, Jiebin Yan, Yuming Fang, Li Niu | arxiv.org/pdf/2403.13… | link |
| 2024-03-20 | IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models | IDAdapter:学习混合特征以实现文本到图像模型的免调整个性化 | Siying Cui, Jiankang Deng, Jia Guo, Xiang An, Yongle Zhao, Xinyu Wei, Ziyong Feng | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | An AI-Assisted Skincare Routine Recommendation System in XR | XR 中人工智能辅助的日常护肤推荐系统 | Gowravi Malalur Rajegowda, Yannis Spyridis, Barbara Villarini, Vasileios Argyriou | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | MedCycle: Unpaired Medical Report Generation via Cycle-Consistency | MedCycle:通过周期一致性生成不成对的医疗报告 | Elad Hirsch, Gefen Dawidowicz, Ayellet Tal | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation | TiBiX:利用时间信息进行双向 X 射线和报告生成 | Santosh Sanjeev, Fadillah Adamsyah Maani, Arsen Abzhanov, Vijay Ram Papineni, Ibrahim Almakky, Bartłomiej W. Papież, Mohammad Yaqub | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | FissionFusion: Fast Geometric Generation and Hierarchical Souping for Medical Image Analysis | FissionFusion:用于医学图像分析的快速几何生成和分层汤 | Santosh Sanjeev, Nuren Zhaksylyk, Ibrahim Almakky, Anees Ur Rehman Hashmi, Mohammad Areeb Qazi, Mohammad Yaqub | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | AdaViPro: Region-based Adaptive Visual Prompt for Large-Scale Models Adapting | AdaViPro:用于大规模模型自适应的基于区域的自适应视觉提示 | Mengyu Yang, Ye Tian, Lanshan Zhang, Xiao Liang, Xuming Ran, Wendong Wang | arxiv.org/pdf/2403.13… | null |
| 2024-03-20 | SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models | SC-Tune:在大视觉语言模型中释放自洽的指涉理解 | Tongtian Yue, Jie Cheng, Longteng Guo, Xingyuan Dai, Zijia Zhao, Xingjian He, Gang Xiong, Yisheng Lv, Jing Liu | arxiv.org/pdf/2403.13… | null |