[UPDATED!] 2024-03-25 (Publish Time)
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-25 | Multi-Scale Texture Loss for CT denoising with GANs | 使用 GAN 进行 CT 去噪的多尺度纹理损失 | Francesco Di Feola, Lorenzo Tronchin, Valerio Guarrasi, Paolo Soda | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions | SDXS:具有图像条件的实时一步潜扩散模型 | Yuda Song, Zehao Sun, Xuanwu Yin | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation | SatSynth:通过扩散模型扩充图像-掩码对以进行航空语义分割 | Aysim Toker, Marvin Eisenberger, Daniel Cremers, Laura Leal-Taixé | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models | 中间融合 ViT 可在扩散模型中实现高效的文本-图像对齐 | Zizhao Hu, Shaochong Jia, Mohammad Rostami | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Let Real Images be as a Judger, Spotting Fake Images Synthesized with Generative Models | 让真实图像作为评判者,发现生成模型合成的假图像 | Ziyou Liang, Run Wang, Weifeng Liu, Yuyang Zhang, Wenyuan Yang, Lina Wang, Xingkai Wang | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework | Make-Your-Anchor:基于扩散的 2D 头像生成框架 | Ziyao Huang, Fan Tang, Yong Zhang, Xiaodong Cun, Juan Cao, Jintao Li, Tong-Yee Lee | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Self-Supervised Learning for Medical Image Data with Anatomy-Oriented Imaging Planes | 具有面向解剖学成像平面的医学图像数据的自监督学习 | Tianwei Zhang, Dong Wei, Mengmeng Zhua, Shi Gu, Yefeng Zheng | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation | 细化文本到图像的生成:实现准确的免训练字形增强图像生成 | Sanyam Lakhanpal, Shivang Chopra, Vinija Jain, Aman Chadha, Man Luo | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Multi-attention Associate Prediction Network for Visual Tracking | 用于视觉跟踪的多注意关联预测网络 | Xinglong Sun, Haijiang Sun, Shan Jiang, Jiacheng Wang, Xilai Wei, Zhonghe Hu | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Residual Dense Swin Transformer for Continuous Depth-Independent Ultrasound Imaging | 用于连续深度无关超声成像的残差密集 Swin Transformer | Jintong Hu, Hui Che, Zishuo Li, Wenming Yang | arxiv.org/pdf/2403.16… | link |
| 2024-03-25 | FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models | FlashEval:快速准确地评估文本到图像的扩散生成模型 | Lin Zhao, Tianchen Zhao, Zinan Lin, Xuefei Ning, Guohao Dai, Huazhong Yang, Yu Wang | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | 3D-EffiViTCaps: 3D Efficient Vision Transformer with Capsule for Medical Image Segmentation | 3D-EffiViTCaps:用于医学图像分割的带胶囊的 3D 高效视觉转换器 | Dongwei Gan, Ming Chang, Juan Chen | arxiv.org/pdf/2403.16… | null |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-25 | Elysium: Exploring Object-level Perception in Videos via MLLM | Elysium:通过 MLLM 探索视频中的对象级感知 | Han Wang, Yanjie Wang, Yongjie Ye, Yuxiang Nie, Can Huang | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | CMViM: Contrastive Masked Vim Autoencoder for 3D Multi-modal Representation Learning for AD classification | CMViM:用于 AD 分类的 3D 多模态表示学习的对比掩码 Vim 自编码器 | Guangqian Yang, Kangrui Du, Zhihan Yang, Ye Du, Yongping Zheng, Shujun Wang | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | PathoTune: Adapting Visual Foundation Model to Pathological Specialists | PathoTune:使视觉基础模型适应病理专家 | Jiaxuan Lu, Fang Yan, Xiaofan Zhang, Yue Gao, Shaoting Zhang | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection | RCBEVDet:鸟瞰图中的雷达相机融合用于 3D 物体检测 | Zhiwei Lin, Zhe Liu, Zhongyu Xia, Xinhao Wang, Yongtao Wang, Shengxiang Qi, Yang Dong, Nan Dong, Le Zhang, Ce Zhu | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion | Text-IF:利用语义文本指导进行退化感知和交互式图像融合 | Xunpeng Yi, Han Xu, Hao Zhang, Linfeng Tang, Jiayi Ma | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA | 逐步综合:工具、模板和 LLM 作为基于推理的图表 VQA 的数据生成器 | Li Zhuowan, Jasani Bhavan, Tang Peng, Ghadar Shabnam | arxiv.org/pdf/2403.16… | null |
NeRF
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-25 | Spike-NeRF: Neural Radiance Field Based On Spike Camera | Spike-NeRF:基于 Spike 相机的神经辐射场 | Yijia Guo, Yuanxi Bai, Liwen Hu, Mianzhi Liu, Ziyi Guo, Lei Ma, Tiejun Huang | arxiv.org/pdf/2403.16… | null |
Model Compression/Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-25 | Distilling Semantic Priors from SAM to Efficient Image Restoration Models | 从 SAM 中提取语义先验,形成高效的图像恢复模型 | Quan Zhang, Xiaoyu Liu, Wei Li, Hanting Chen, Junchao Liu, Jie Hu, Zhiwei Xiong, Chun Yuan, Yunhe Wang | arxiv.org/pdf/2403.16… | null |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-25 | Assessing the Performance of Deep Learning for Automated Gleason Grading in Prostate Cancer | 评估深度学习在前列腺癌自动格里森分级中的表现 | Dominik Müller, Philip Meyer, Lukas Rentschler, Robin Manz, Daniel Hieber, Jonas Bäcker, Samantha Cramer, Christoph Wengenmayr, Bruno Märkl, Ralf Huss, et al. | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | DeepGleason: a System for Automated Gleason Grading of Prostate Cancer using Deep Neural Networks | DeepGleason:使用深度神经网络对前列腺癌进行自动格里森分级的系统 | Dominik Müller, Philip Meyer, Lukas Rentschler, Robin Manz, Jonas Bäcker, Samantha Cramer, Christoph Wengenmayr, Bruno Märkl, Ralf Huss, Iñaki Soto-Rey, et al. | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Domain Adaptive Detection of MAVs: A Benchmark and Noise Suppression Network | MAV 的域自适应检测:基准和噪声抑制网络 | Yin Zhang, Jinhong Deng, Peidong Liu, Wen Li, Shiyu Zhao | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Clustering Propagation for Universal Medical Image Segmentation | 通用医学图像分割的聚类传播 | Yuhang Ding, Liulei Li, Wenguan Wang, Yi Yang | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | AI-Generated Video Detection via Spatio-Temporal Anomaly Learning | 通过时空异常学习进行人工智能生成的视频检测 | Jianfa Bai, Man Lin, Gang Cao | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | EDUE: Expert Disagreement-Guided One-Pass Uncertainty Estimation for Medical Image Segmentation | EDUE:专家分歧引导的医学图像分割一次性不确定性估计 | Kudaibergen Abutalip, Numan Saeed, Ikboljon Sobirov, Vincent Andrearczyk, Adrien Depeursinge, Mohammad Yaqub | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | In the Search for Optimal Multi-view Learning Models for Crop Classification with Global Remote Sensing Data | 利用全球遥感数据寻找作物分类的最佳多视图学习模型 | Francisco Mena, Diego Arenas, Andreas Dengel | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | SegICL: A Universal In-context Learning Framework for Enhanced Segmentation in Medical Imaging | SegICL:用于增强医学成像分割的通用上下文学习框架 | Lingdong Shen, Fangxin Shang, Yehui Yang, Xiaoshuang Huang, Shining Xiang | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Open-Set Recognition in the Age of Vision-Language Models | 视觉语言模型时代的开放集识别 | Dimity Miller, Niko Sünderhauf, Alex Kenna, Keita Mason | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Visually Guided Generative Text-Layout Pre-training for Document Intelligence | 文档智能的视觉引导生成文本布局预训练 | Zhiming Mao, Haoli Bai, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu, Kam-Fai Wong | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | CT-Bound: Fast Boundary Estimation From Noisy Images Via Hybrid Convolution and Transformer Neural Networks | CT-Bound:通过混合卷积和 Transformer 神经网络从噪声图像中进行快速边界估计 | Wei Xu, Junjie Luo, Qi Guo | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Real-time Neuron Segmentation for Voltage Imaging | 用于电压成像的实时神经元分割 | Yosuke Bando, Ramdas Pillai, Atsushi Kajita, Farhan Abdul Hakeem, Yves Quemener, Hua-an Tseng, Kiryl D. Piatkevich, Changyang Linghu, Xue Han, Edward S. Boyden | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding | DOCTR:用于点场景理解的以对象为中心的解耦 Transformer | Xiaoxuan Yu, Hao Wang, Weiming Li, Qiang Wang, Soonyong Cho, Younghun Sung | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects | 以自我为中心的手部与物体交互的姿势估计的基准和挑战 | Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, et al. | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Enhancing Visual Place Recognition via Fast and Slow Adaptive Biasing in Event Cameras | 通过事件摄像机中的快速和慢速自适应偏置增强视觉位置识别 | Gokul B. Nair, Michael Milford, Tobias Fischer | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | A Survey on Long Video Generation: Challenges, Methods, and Prospects | 长视频生成综述:挑战、方法与前景 | Chengxuan Li, Di Huang, Zeyu Lu, Yang Xiao, Qingqi Pei, Lei Bai | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation | ASDF:通过集成 6D 姿态估计利用后期融合进行装配状态检测 | Hannah Schieber, Shiyu Li, Niklas Corell, Philipp Beckerle, Julian Kreimeier, Daniel Roth | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation | GoodSAM:通过 Segment Anything 模型弥合域和容量差距,实现失真感知全景语义分割 | Weiming Zhang, Yexin Liu, Xu Zheng, Lin Wang | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | ChebMixer: Efficient Graph Representation Learning with MLP Mixer | ChebMixer:使用 MLP 混合器进行高效图表示学习 | Xiaoyan Kui, Haonan Yan, Qinsong Li, Liming Chen, Beiji Zou | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Impact of Video Compression Artifacts on Fisheye Camera Visual Perception Tasks | 视频压缩伪影对鱼眼相机视觉感知任务的影响 | Madhumitha Sakthi, Louis Kerofsky, Varun Ravi Kumar, Senthil Yogamani | arxiv.org/pdf/2403.16… | null |
Image Understanding
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-25 | Enhancing Industrial Transfer Learning with Style Filter: Cost Reduction and Defect-Focus | 通过风格过滤器增强工业迁移学习:降低成本和聚焦缺陷 | Chen Li, Ruijie Ma, Xiang Qian, Xiaohao Wang, Xinghui Li | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Elite360D: Towards Efficient 360 Depth Estimation via Semantic- and Distance-Aware Bi-Projection Fusion | Elite360D:通过语义和距离感知双投影融合实现高效 360 度深度估计 | Hao Ai, Lin Wang | arxiv.org/pdf/2403.16… | null |
LLM
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-25 | DOrA: 3D Visual Grounding with Order-Aware Referring | DOrA:具有顺序感知指代的 3D 视觉定位 | Tung-Yu Wu, Sheng-Yu Huang, Yu-Chiang Frank Wang | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Dia-LLaMA: Towards Large Language Model-driven CT Report Generation | Dia-LLaMA:迈向大型语言模型驱动的 CT 报告生成 | Zhixuan Chen, Luyang Luo, Yequan Bie, Hao Chen | arxiv.org/pdf/2403.16… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-25 | QKFormer: Hierarchical Spiking Transformer using Q-K Attention | QKFormer:使用 Q-K 注意力的分层脉冲 Transformer | Chenlin Zhou, Han Zhang, Zhaokun Zhou, Liutao Yu, Liwei Huang, Xiaopeng Fan, Li Yuan, Zhengyu Ma, Huihui Zhou, Yonghong Tian | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting | VMRNN:集成 Vision Mamba 和 LSTM,实现高效准确的时空预测 | Yujin Tang, Peijie Dong, Zhenheng Tang, Xiaowen Chu, Junwei Liang | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | ModeTv2: GPU-accelerated Motion Decomposition Transformer for Pairwise Optimization in Medical Image Registration | ModeTv2:GPU 加速运动分解 Transformer,用于医学图像配准中的成对优化 | Haiqiao Wang, Zhuoyuan Wang, Dong Ni, Yi Wang | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Medical Image Registration and Its Application in Retinal Images: A Review | 医学图像配准及其在视网膜图像中的应用:综述 | Qiushi Nie, Xiaoqing Zhang, Yan Hu, Mingdao Gong, Jiang Liu | arxiv.org/pdf/2403.16… | null |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-25 | Creating a Digital Twin of Spinal Surgery: A Proof of Concept | 创建脊柱手术的数字孪生:概念证明 | Jonas Hein, Frederic Giraud, Lilian Calvet, Alexander Schwarz, Nicola Alessandro Cavalcanti, Sergey Prokudin, Mazda Farshad, Siyu Tang, Marc Pollefeys, Fabio Carrillo, et al. | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | V2X-PC: Vehicle-to-everything Collaborative Perception via Point Cluster | V2X-PC:通过点集群实现车对万物的协同感知 | Si Liu, Zihan Ding, Jiahui Fu, Hongyu Li, Siheng Chen, Shifeng Zhang, Xu Zhou | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | REFRAME: Reflective Surface Real-Time Rendering for Mobile Devices | REFRAME:移动设备的反射表面实时渲染 | Chaojie Ji, Yufeng Li, Yiyi Liao | arxiv.org/pdf/2403.16… | null |
Learning Paradigms
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-25 | Camera-aware Label Refinement for Unsupervised Person Re-identification | 用于无监督行人重识别的相机感知标签细化 | Pengna Li, Kangyi Wu, Wenli Huang, Sanping Zhou, Jinjun Wang | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Unsupervised Template-assisted Point Cloud Shape Correspondence Network | 无监督模板辅助点云形状对应网络 | Jiacheng Deng, Jiahao Lu, Tianzhu Zhang | arxiv.org/pdf/2403.16… | null |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-25 | DPStyler: Dynamic PromptStyler for Source-Free Domain Generalization | DPStyler:用于无源域泛化的动态 PromptStyler | Yunlong Tang, Yuxuan Wan, Lei Qi, Xin Geng | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Synapse: Learning Preferential Concepts from Visual Demonstrations | Synapse:从视觉演示中学习优先概念 | Sadanand Modak, Noah Patton, Isil Dillig, Joydeep Biswas | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | FOOL: Addressing the Downlink Bottleneck in Satellite Computing with Neural Feature Compression | FOOL:利用神经特征压缩解决卫星计算中的下行链路瓶颈 | Alireza Furutanpey, Qiyang Zhang, Philipp Raith, Tobias Pfandzelter, Shangguang Wang, Schahram Dustdar | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution | 自适应现实引导扩散,实现无伪影超分辨率 | Qingping Zheng, Ling Zheng, Yuanfan Guo, Ying Li, Songcen Xu, Jiankang Deng, Hang Xu | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Calibrating Bayesian UNet++ for Sub-Seasonal Forecasting | 校准贝叶斯 UNet++ 以进行次季节预测 | Busra Asan, Abdullah Akgul, Alper Unal, Melih Kandemir, Gozde Unal | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Revealing Vulnerabilities of Neural Networks in Parameter Learning and Defense Against Explanation-Aware Backdoors | 揭示神经网络在参数学习和防御解释感知后门方面的漏洞 | Md Abdul Kadir, GowthamKrishna Addluri, Daniel Sonntag | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions | 如果 CLIP 会说话:通过首选概念描述理解视觉语言模型表示 | Reza Esfandiarpoor, Cristina Menghini, Stephen H. Bach | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Producing and Leveraging Online Map Uncertainty in Trajectory Prediction | 轨迹预测中在线地图不确定性的产生和利用 | Xunjiang Gu, Guanyu Song, Igor Gilitschenski, Marco Pavone, Boris Ivanovic | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Ensemble Adversarial Defense via Integration of Multiple Dispersed Low Curvature Models | 通过整合多个分散的低曲率模型实现集成对抗防御 | Kaikang Zhao, Xi Chen, Wei Huang, Liuxin Ding, Xianglong Kong, Fan Zhang | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion | 通过引导扩散从零开始生成强效投毒样本和后门 | Hossein Souri, Arpit Bansal, Hamid Kazemi, Liam Fowl, Aniruddha Saha, Jonas Geiping, Andrew Gordon Wilson, Rama Chellappa, Tom Goldstein, Micah Goldblum | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | RSTAR: Rotational Streak Artifact Reduction in 4D CBCT using Separable and Circular Convolutions | RSTAR:使用可分离和循环卷积减少 4D CBCT 中的旋转条纹伪影 | Ziheng Deng, Hua Chen, Haibo Hu, Zhiyong Xu, Tianling Lyu, Yan Xi, Yang Chen, Jun Zhao | arxiv.org/pdf/2403.16… | null |
| 2024-03-25 | MEDDAP: Medical Dataset Enhancement via Diversified Augmentation Pipeline | MEDDAP:通过多样化的增强管道增强医疗数据集 | Yasamin Medghalchi, Niloufar Zakariaei, Arman Rahmim, Ilker Hacihaliloglu | arxiv.org/pdf/2403.16… | null |