Updated: 2024-02-05 (publish time)
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-05 | Do Diffusion Models Learn Semantically Meaningful and Efficient Representations? | 扩散模型是否能够学习语义上有意义且有效的表示? | Qiyao Liang, Ziming Liu, Ila Fiete | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models | GUARD:通过角色扮演生成自然语言越狱,以测试大型语言模型的准则遵守情况 | Haibo Jin, Ruoxi Chen, Andy Zhou, Jinyin Chen, Yang Zhang, Haohan Wang | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Zero-shot Object-Level OOD Detection with Context-Aware Inpainting | 具有上下文感知修复功能的零样本对象级 OOD 检测 | Quang-Huy Nguyen, Jin Peng Zhou, Zhenzhen Liu, Khanh-Huyen Bui, Kilian Q. Weinberger, Dung D. Le | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | InstanceDiffusion: Instance-level Control for Image Generation | InstanceDiffusion:图像生成的实例级控制 | Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | IGUANe: a 3D generalizable CycleGAN for multicenter harmonization of brain MR images | IGUANe:用于大脑 MR 图像多中心协调的 3D 通用 CycleGAN | Vincent Roca, Grégory Kuchcinski, Jean-Pierre Pruvo, Dorian Manouvriez, Renaud Lopes | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Organic or Diffused: Can We Distinguish Human Art from AI-generated Images? | 有机还是扩散:我们可以区分人类艺术和人工智能生成的图像吗? | Anna Yoo Jeong Ha, Josephine Passananti, Ronik Bhaskar, Shawn Shan, Reid Southen, Haitao Zheng, Ben Y. Zhao | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion | Direct-a-Video:通过用户控制的摄像机移动和对象运动生成定制视频 | Shiyuan Yang, Liang Hou, Haibin Huang, Chongyang Ma, Pengfei Wan, Di Zhang, Xiaodong Chen, Jing Liao | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Transcending Adversarial Perturbations: Manifold-Aided Adversarial Examples with Legitimate Semantics | 超越对抗性扰动:具有合法语义的多方面辅助对抗性示例 | Shuai Li, Xiaoyu Jiang, Xiaoguang Ma | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing | 视觉文本满足低级视觉:视觉文本处理的综合调查 | Yan Shu, Weichao Zeng, Zhenhang Li, Fangmin Zhao, Yu Zhou | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | PFDM: Parser-Free Virtual Try-on via Diffusion Model | PFDM:通过扩散模型进行无解析器虚拟试戴 | Yunfang Niu, Dong Yi, Lingxiang Wu, Zhiwei Liu, Pengxiang Cai, Jinqiao Wang | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions | InteractiveVideo:具有协同多模式指令的以用户为中心的可控视频生成 | Yiyuan Zhang, Yuhao Kang, Zhixin Zhang, Xiaohan Ding, Sanyuan Zhao, Xiangyu Yue | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Retrieval-Augmented Score Distillation for Text-to-3D Generation | 用于文本转 3D 生成的检索增强分数蒸馏 | Junyoung Seo, Susung Hong, Wooseok Jang, Inès Hyeonsu Kim, Minseop Kwak, Doyup Lee, Seungryong Kim | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Instance Segmentation XXL-CT Challenge of a Historic Airplane | 历史飞机的实例分割 XXL-CT 挑战 | Roland Gruber, Johann Christopher Engster, Markus Michen, Nele Blum, Maik Stille, Stefan Gerth, Thomas Wittenberg | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | ViewFusion: Learning Composable Diffusion Models for Novel View Synthesis | ViewFusion:学习用于新视图合成的可组合扩散模型 | Bernard Spiegl, Andrea Perin, Stéphane Deny, Alexander Ilin | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | SynthVision - Harnessing Minimal Input for Maximal Output in Computer Vision Models using Synthetic Image data | SynthVision - 使用合成图像数据在计算机视觉模型中利用最小输入获得最大输出 | Yudara Kularathne, Prathapa Janitha, Sithira Ambepitiya, Thanveer Ahamed, Dinuka Wijesundara, Prarththanan Sothyrajah | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Extreme Two-View Geometry From Object Poses with Diffusion Models | 具有扩散模型的物体姿势的极端二视图几何 | Yujing Sun, Caiyi Sun, Yuan Liu, Yuexin Ma, Siu Ming Yiu | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models | DisDet:探索扩散模型后门攻击的可检测性 | Yang Sui, Huy Phan, Jinqi Xiao, Tianfang Zhang, Zijie Tang, Cong Shi, Yan Wang, Yingying Chen, Bo Yuan | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | InVA: Integrative Variational Autoencoder for Harmonization of Multi-modal Neuroimaging Data | InVA:用于协调多模态神经影像数据的综合变分自动编码器 | Bowen Lei, Rajarshi Guhaniyogi, Krishnendu Chandra, Aaron Scheffler, Bani Mallick | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Fast and Accurate Cooperative Radio Map Estimation Enabled by GAN | GAN 支持快速准确的协作无线电地图估计 | Zezhong Zhang, Guangxu Zhu, Junting Chen, Shuguang Cui | arxiv.org/pdf/2402.02… | null |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-05 | AONeuS: A Neural Rendering Framework for Acoustic-Optical Sensor Fusion | AONeuS:声光传感器融合的神经渲染框架 | Mohamad Qadri, Kevin Zhang, Akshay Hinduja, Michael Kaess, Adithya Pediredla, Christopher A. Metzler | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | ActiveAnno3D - An Active Learning Framework for Multi-Modal 3D Object Detection | ActiveAnno3D - 用于多模态 3D 对象检测的主动学习框架 | Ahmed Ghita, Bjørk Antoniussen, Walter Zimmer, Ross Greer, Christian Creß, Andreas Møgelmose, Mohan M. Trivedi, Alois C. Knoll | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Multi: Multimodal Understanding Leaderboard with Text and Images | 多:带有文本和图像的多模式理解排行榜 | Zichen Zhu, Yang Xu, Lu Chen, Jingkai Yang, Yichuan Ma, Yiming Sun, Hailin Wen, Jiaqi Liu, Jinyu Cai, Yingzi Ma, et al. | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | Video-LaVIT:具有解耦视觉运动标记化的统一视频语言预训练 | Yang Jin, Zhicheng Sun, Kun Xu, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, et al. | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Text-Guided Image Clustering | 文本引导图像聚类 | Andreas Stephan, Lukas Miklautz, Kevin Sidak, Jan Philip Wahle, Bela Gipp, Claudia Plant, Benjamin Roth | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives | 深入研究道路场景理解的多模态多任务基础模型:从学习范式的角度 | Sheng Luo, Wei Chen, Wanxin Tian, Rui Liu, Luanxuan Hou, Xiubao Zhang, Haifeng Shen, Ruiqi Wu, Shuyi Geng, Yi Zhou, et al. | arxiv.org/pdf/2402.02… | null |
3DGS
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-05 | 4D Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes | 4D 高斯泼溅:实现动态场景的高效新颖视图合成 | Yuanxing Duan, Fangyin Wei, Qiyu Dai, Yuhang He, Wenzheng Chen, Baoquan Chen | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM | SGS-SLAM:神经密集 SLAM 的语义高斯泼溅 | Mingrui Li, Shuhong Liu, Heng Zhou | arxiv.org/pdf/2402.03… | null |
Model Compression/Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-05 | FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition | FROSTER:Frozen CLIP 是开放词汇动作识别的强大老师 | Xiaohu Huang, Hao Zhou, Kun Yao, Kai Han | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Good Teachers Explain: Explanation-Enhanced Knowledge Distillation | 好老师讲解:讲解增强知识蒸馏 | Amin Parchami-Araghi, Moritz Böhle, Sukrut Rao, Bernt Schiele | arxiv.org/pdf/2402.03… | link |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-05 | HASSOD: Hierarchical Adaptive Self-Supervised Object Detection | HASSOD:分层自适应自监督目标检测 | Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining | Swin-UMamba:基于 Mamba 的 UNet 和基于 ImageNet 的预训练 | Jiarun Liu, Hao Yang, Hong-Yu Zhou, Yan Xi, Lequan Yu, Yizhou Yu, Yong Liang, Guangming Shi, Shaoting Zhang, Hairong Zheng, et al. | arxiv.org/pdf/2402.03… | link |
| 2024-02-05 | CT-based Anatomical Segmentation for Thoracic Surgical Planning: A Benchmark Study for 3D U-shaped Deep Learning Models | 基于 CT 的胸部手术规划解剖分割:3D U 形深度学习模型的基准研究 | Arash Harirpoush, Amirhossein Rasoulian, Marta Kersten-Oertel, Yiming Xiao | arxiv.org/pdf/2402.03… | link |
| 2024-02-05 | Towards mitigating uncann(eye)ness in face swaps via gaze-centric loss terms | 通过以凝视为中心的损失项来减轻面部交换中的不可思议(眼睛) | Ethan Wilson, Frederick Shic, Sophie Jörg, Eakta Jain | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | RRWNet: Recursive Refinement Network for Effective Retinal Artery/Vein Segmentation and Classification | RRWNet:用于有效视网膜动脉/静脉分割和分类的递归细化网络 | José Morano, Guilherme Aresta, Hrvoje Bogunović | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Towards Eliminating Hard Label Constraints in Gradient Inversion Attacks | 消除梯度反转攻击中的硬标签约束 | Yanbo Wang, Jian Liang, Ran He | arxiv.org/pdf/2402.03… | link |
| 2024-02-05 | Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector | 通过增强型开放集对象检测器进行跨域少样本对象检测 | Yuqian Fu, Yu Wang, Yixuan Pan, Lian Huai, Xingyu Qiu, Zeyu Shangguan, Tong Liu, Lingjie Kong, Yanwei Fu, Luc Van Gool, et al. | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Taylor Videos for Action Recognition | 用于动作识别的泰勒视频 | Lei Wang, Xiuyuan Yuan, Tom Gedeon, Liang Zheng | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | [Citation needed] Data usage and citation practices in medical imaging conferences | [需要引用]医学影像会议中的数据使用和引用实践 | Théo Sourget, Ahmet Akkoç, Stinna Winther, Christine Lyngbye Galsgaard, Amelia Jiménez-Sánchez, Dovile Juodelyte, Caroline Petitjean, Veronika Cheplygina | arxiv.org/pdf/2402.03… | link |
| 2024-02-05 | A Safety-Adapted Loss for Pedestrian Detection in Automated Driving | 自动驾驶中行人检测的安全自适应损失 | Maria Lyssenko, Piyush Pimplikar, Maarten Bieshaar, Farzad Nozarian, Rudolph Triebel | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Unsupervised semantic segmentation of high-resolution UAV imagery for road scene parsing | 用于道路场景解析的高分辨率无人机图像的无监督语义分割 | Zihan Ma, Yongshang Li, Ronggui Ma, Chen Liang | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | One-class anomaly detection through color-to-thermal AI for building envelope inspection | 通过颜色到热人工智能进行一级异常检测,用于建筑围护结构检查 | Polina Kurtser, Kailun Feng, Thomas Olofsson, Aitor De Andres | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | HoughToRadon Transform: New Neural Network Layer for Features Improvement in Projection Space | HoughToRadon 变换:用于投影空间特征改进的新神经网络层 | Alexandra Zhabitskaya, Alexander Sheshkus, Vladimir L. Arlazarov | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Time-, Memory- and Parameter-Efficient Visual Adaptation | 时间、内存和参数高效的视觉适应 | Otniel-Bogdan Mercea, Alexey Gritsenko, Cordelia Schmid, Anurag Arnab | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Multi-scale fMRI time series analysis for understanding neurodegeneration in MCI | 多尺度功能磁共振成像时间序列分析用于了解 MCI 中的神经退行性变 | Ammu R., Debanjali Bhattacharya, Ameiy Acharya, Ninad Aithal, Neelam Sinha | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Joint Attention-Guided Feature Fusion Network for Saliency Detection of Surface Defects | 用于表面缺陷显着性检测的联合注意力引导特征融合网络 | Xiaoheng Jiang, Feng Yan, Yang Lu, Ke Wang, Shuai Guo, Tianzhu Zhang, Yanwei Pang, Jianwei Niu, Mingliang Xu | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Transmission Line Detection Based on Improved Hough Transform | 基于改进Hough变换的输电线路检测 | Wei Song, Pei Li, Man Wang | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Improving Robustness of LiDAR-Camera Fusion Model against Weather Corruption from Fusion Strategy Perspective | 从融合策略的角度提高激光雷达-相机融合模型对抗天气腐蚀的鲁棒性 | Yihao Huang, Kaiyuan Yu, Qing Guo, Felix Juefei-Xu, Xiaojun Jia, Tianlin Li, Geguang Pu, Yang Liu | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | FDNet: Frequency Domain Denoising Network For Cell Segmentation in Astrocytes Derived From Induced Pluripotent Stem Cells | FDNet:用于诱导多能干细胞衍生的星形胶质细胞分割的频域去噪网络 | Haoran Li, Jiahua Shi, Huaming Chen, Bo Du, Simon Maksour, Gabrielle Phillips, Mirella Dottori, Jun Shen | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Image-Caption Encoding for Improving Zero-Shot Generalization | 用于改进零样本泛化的图像标题编码 | Eric Yang Yu, Christopher Liao, Sathvik Ravi, Theodoros Tsiligkaridis, Brian Kulis | arxiv.org/pdf/2402.02… | link |
| 2024-02-05 | Learning with Mixture of Prototypes for Out-of-Distribution Detection | 混合原型学习以进行分布外检测 | Haodong Lu, Dong Gong, Shuo Wang, Jason Xue, Lina Yao, Kristen Moore | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Densely Decoded Networks with Adaptive Deep Supervision for Medical Image Segmentation | 用于医学图像分割的具有自适应深度监督的密集解码网络 | Suraj Mishra | arxiv.org/pdf/2402.02… | null |
Image Understanding
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-05 | CLIP Can Understand Depth | CLIP 可以理解深度 | Dunam Kim, Seokju Lee | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Perceptual Learned Image Compression via End-to-End JND-Based Optimization | 通过基于 JND 的端到端优化进行感知学习图像压缩 | Farhad Pakdaman, Sanaz Nami, Moncef Gabbouj | arxiv.org/pdf/2402.02… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-05 | Training-Free Consistent Text-to-Image Generation | 免训练一致的文本到图像生成 | Yoad Tewel, Omri Kaduri, Rinon Gal, Yoni Kasten, Lior Wolf, Gal Chechik, Yuval Atzmon | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | AdaTreeFormer: Few Shot Domain Adaptation for Tree Counting from a Single High-Resolution Image | AdaTreeFormer:从单个高分辨率图像进行树木计数的少量镜头域适应 | Hamed Amini Amirkolaee, Miaojing Shi, Lianghua He, Mark Mulligan | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Exploring the Synergies of Hybrid CNNs and ViTs Architectures for Computer Vision: A survey | 探索计算机视觉混合 CNN 和 ViT 架构的协同作用:一项调查 | Haruna Yunusa, Shiyin Qin, Abdulrahman Hamman Adama Chukkol, Abdulganiyu Abdu Yusuf, Isah Bello, Adamu Lawan | arxiv.org/pdf/2402.02… | null |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-05 | GPU-Accelerated 3D Polygon Visibility Volumes for Synergistic Perception and Navigation | GPU 加速的 3D 多边形可见体积可实现协同感知和导航 | Andrew Willis, Collin Hague, Artur Wolek, Kevin Brink | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | AI-Enhanced Virtual Reality in Medicine: A Comprehensive Survey | 人工智能增强虚拟现实在医学中的应用:综合调查 | Yixuan Wu, Kaiyuan Hu, Danny Z. Chen, Jian Wu | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Motion-Aware Video Frame Interpolation | 运动感知视频帧插值 | Pengfei Han, Fuhua Zhang, Bin Zhao, Xuelong Li | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | ToonAging: Face Re-Aging upon Artistic Portrait Style Transfer | ToonAging:艺术肖像风格迁移下的面部再老化 | Bumsoo Kim, Abdul Muqeet, Kyuchul Lee, Sanghyun Seo | arxiv.org/pdf/2402.02… | null |
Learning Paradigms
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-05 | Representation Surgery for Multi-Task Model Merging | 多任务模型合并的表示手术 | Enneng Yang, Li Shen, Zhenyi Wang, Guibing Guo, Xiaojun Chen, Xingwei Wang, Dacheng Tao | arxiv.org/pdf/2402.02… | link |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-05 | Test-Time Adaptation for Depth Completion | 深度完成的测试时间调整 | Hyoungseob Park, Anjali Gupta, Alex Wong | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | V-IRL: Grounding Virtual Intelligence in Real Life | V-IRL:将虚拟智能融入现实生活 | Jihan Yang, Runyu Ding, Ellis Brown, Xiaojuan Qi, Saining Xie | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Towards a Flexible Scale-out Framework for Efficient Visual Data Query Processing | 面向高效可视数据查询处理的灵活横向扩展框架 | Rohit Verma, Arun Raghunath | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Panoramic Image Inpainting With Gated Convolution And Contextual Reconstruction Loss | 使用门控卷积和上下文重建损失的全景图像修复 | Li Yu, Yanjun Gao, Farhad Pakdaman, Moncef Gabbouj | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Pixel-Wise Color Constancy via Smoothness Techniques in Multi-Illuminant Scenes | 通过多光源场景中的平滑技术实现逐像素颜色恒定 | Umut Cem Entok, Firas Laakom, Farhad Pakdaman, Moncef Gabbouj | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Exploring Federated Self-Supervised Learning for General Purpose Audio Understanding | 探索通用音频理解的联合自监督学习 | Yasar Abbas Ur Rehman, Kin Wai Lau, Yuyang Xie, Lan Ma, Jiajun Shen | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Time-Distributed Backdoor Attacks on Federated Spiking Learning | 对联邦尖峰学习的时间分布式后门攻击 | Gorka Abad, Stjepan Picek, Aitor Urbieta | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Enhancing Compositional Generalization via Compositional Feature Alignment | 通过组合特征对齐增强组合泛化 | Haoxiang Wang, Haozhe Si, Huajie Shao, Han Zhao | arxiv.org/pdf/2402.02… | link |
| 2024-02-05 | Using Motion Cues to Supervise Single-Frame Body Pose and Shape Estimation in Low Data Regimes | 使用运动提示来监督低数据状态下的单帧身体姿势和形状估计 | Andrey Davydov, Alexey Sidnev, Artsiom Sanakoyeu, Yuhua Chen, Mathieu Salzmann, Pascal Fua | arxiv.org/pdf/2402.02… | null |