[UPDATED!] 2024-02-21 (Publish Time)
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-21 | Geometry-Informed Neural Networks | 几何信息神经网络 | Arturs Berzins, Andreas Radler, Sebastian Sanokowski, Sepp Hochreiter, Johannes Brandstetter | arxiv.org/pdf/2402.14… | null |
| 2024-02-21 | Distinctive Image Captioning: Leveraging Ground Truth Captions in CLIP Guided Reinforcement Learning | 独特的图像字幕:在 CLIP 引导强化学习中利用真实字幕 | Antoine Chaffin, Ewa Kijak, Vincent Claveau | arxiv.org/pdf/2402.13… | link |
| 2024-02-21 | Tumor segmentation on whole slide images: training or prompting? | 全切片图像上的肿瘤分割:训练还是提示? | Huaqian Wu, Clara Brémond-Martin, Kévin Bouaou, Cédric Clouchoux | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | NeuralDiffuser: Controllable fMRI Reconstruction with Primary Visual Feature Guided Diffusion | NeuralDiffuser:具有主要视觉特征引导扩散的可控 fMRI 重建 | Haoyu Li, Hao Wu, Badong Chen | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Scalable Methods for Brick Kiln Detection and Compliance Monitoring from Satellite Imagery: A Deployment Case Study in India | 利用卫星图像进行砖窑检测和合规性监测的可扩展方法:印度的部署案例研究 | Rishabh Mondal, Zeel B Patel, Vannsh Jani, Nipun Batra | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Cas-DiffCom: Cascaded diffusion model for infant longitudinal super-resolution 3D medical image completion | Cas-DiffCom:用于婴儿纵向超分辨率 3D 医学图像补全的级联扩散模型 | Lianghu Guo, Tianli Tao, Xinyi Cai, Zihao Zhu, Jiawei Huang, Lixuan Zhu, Zhuoyang Gu, Haifeng Tang, Rui Zhou, Siyan Han, et al. | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | SRNDiff: Short-term Rainfall Nowcasting with Condition Diffusion Model | SRNDiff:使用条件扩散模型进行短期降雨临近预报 | Xudong Ling, Chaorong Li, Fengqing Qin, Peng Yang, Yuanyuan Huang | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation | 具有 2D 三平面和 3D 小波表示的混合视频扩散模型 | Kihong Kim, Haneol Lee, Jihye Park, Seyeon Kim, Kwanghee Lee, Seungryong Kim, Jaejun Yoo | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Flexible Physical Camouflage Generation Based on a Differential Approach | 基于差分方法的灵活物理伪装生成 | Yang Li, Wenyi Tan, Chenxing Zhao, Shuangju Zhou, Xinkai Liang, Quan Pan | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | ToDo: Token Downsampling for Efficient Generation of High-Resolution Images | ToDo:令牌下采样以高效生成高分辨率图像 | Ethan Smith, Nayan Saxena, Aninda Saha | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Models | 对比提示可改善文本到图像扩散模型中的解缠结 | Chen Wu, Fernando De la Torre | arxiv.org/pdf/2402.13… | null |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-21 | Scene Prior Filtering for Depth Map Super-Resolution | 用于深度图超分辨率的场景先验过滤 | Zhengxue Wang, Zhiqiang Yan, Ming-Hsuan Yang, Jinshan Pan, Jian Yang, Ying Tai, Guangwei Gao | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models | VL-Trojan:针对自回归视觉语言模型的多模态指令后门攻击 | Jiawei Liang, Siyuan Liang, Man Luo, Aishan Liu, Dongchen Han, Ee-Chien Chang, Xiaochun Cao | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models | CODIS:多模态大语言模型的上下文相关视觉理解基准测试 | Fuwen Luo, Chi Chen, Zihao Wan, Zhaolu Kang, Qidong Yan, Yingjie Li, Xiaolong Wang, Siyu Wang, Ziyue Wang, Xiaoyue Mi, et al. | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation | 用于电子商务产品描述生成的多模态上下文调整方法 | Yunxin Li, Baotian Hu, Wenhan Luo, Lin Ma, Yuxin Ding, Min Zhang | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement | 通过部分相关性增强改进视频语料库时刻检索 | Danyang Hou, Liang Pang, Huawei Shen, Xueqi Cheng | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment | 认知视觉语言映射器:通过增强的视觉知识对齐促进多模态理解 | Yunxin Li, Xinyu Chen, Baotian Hu, Haoyuan Shi, Min Zhang | arxiv.org/pdf/2402.13… | null |
NeRF
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-21 | Identifying Unnecessary 3D Gaussians using Clustering for Fast Rendering of 3D Gaussian Splatting | 使用聚类识别不必要的 3D 高斯以实现 3D 高斯泼溅的快速渲染 | Joongho Jo, Hyeongwon Kim, Jongsun Park | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | SealD-NeRF: Interactive Pixel-Level Editing for Dynamic Scenes by Neural Radiance Fields | SealD-NeRF:通过神经辐射场对动态场景进行交互式像素级编辑 | Zhentao Huang, Yukun Shi, Neil Bruce, Minglun Gong | arxiv.org/pdf/2402.13… | null |
Model Compression/Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-21 | SDXL-Lightning: Progressive Adversarial Diffusion Distillation | SDXL-Lightning:渐进式对抗扩散蒸馏 | Shanchuan Lin, Anran Wang, Xiao Yang | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | MSTAR: Multi-Scale Backbone Architecture Search for Timeseries Classification | MSTAR:用于时间序列分类的多尺度骨干架构搜索 | Tue M. Cao, Nhat H. Tran, Hieu H. Pham, Hung T. Nguyen, Le P. Nguyen | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Push Quantization-Aware Training Toward Full Precision Performances via Consistency Regularization | 通过一致性正则化将量化感知训练推向全精度性能 | Junbiao Pang, Tianyang Cai, Baochang Zhang, Jiaqi Wu, Ye Tao | arxiv.org/pdf/2402.13… | null |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-21 | BEE-NET: A deep neural network to identify in-the-wild Bodily Expression of Emotions | BEE-NET:一种深度神经网络,用于识别野外身体情绪表达 | Mohammad Mahdi Dehshibi, David Masip | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | BenchCloudVision: A Benchmark Analysis of Deep Learning Approaches for Cloud Detection and Segmentation in Remote Sensing Imagery | BenchCloudVision:遥感图像云检测和分割深度学习方法的基准分析 | Loddo Fabio, Dario Piga, Michelucci Umberto, El Ghazouali Safouane | arxiv.org/pdf/2402.13… | link |
| 2024-02-21 | Zero-BEV: Zero-shot Projection of Any First-Person Modality to BEV Maps | Zero-BEV:任意第一人称模态到 BEV 地图的零样本投影 | Gianluca Monaci, Leonid Antsfeld, Boris Chidlovskii, Christian Wolf | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Weakly supervised localisation of prostate cancer using reinforcement learning for bi-parametric MR images | 使用双参数 MR 图像的强化学习对前列腺癌进行弱监督定位 | Martynas Pocius, Wen Yan, Dean C. Barratt, Mark Emberton, Matthew J. Clarkson, Yipeng Hu, Shaheer U. Saeed | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Mask-up: Investigating Biases in Face Re-identification for Masked Faces | 蒙面:调查蒙面人脸重新识别中的偏差 | Siddharth D Jaiswal, Ankit Kr. Verma, Animesh Mukherjee | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Explainable Classification Techniques for Quantum Dot Device Measurements | 量子点器件测量的可解释分类技术 | Daniel Schug, Tyler J. Kovach, M. A. Wolfe, Jared Benson, Sanghyeok Park, J. P. Dodson, J. Corrigan, M. A. Eriksson, Justyna P. Zwolak | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Generalizable Semantic Vision Query Generation for Zero-shot Panoptic and Semantic Segmentation | 用于零样本全景和语义分割的可泛化语义视觉查询生成 | Jialei Chen, Daisuke Deguchi, Chenkai Zhang, Hiroshi Murase | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Robustness of Deep Neural Networks for Micro-Doppler Radar Classification | 微多普勒雷达分类深度神经网络的鲁棒性 | Mikolaj Czerkawski, Carmine Clemente, Craig Michie, Christos Tachtatzis | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition | 用于场景文本识别的类感知掩模引导特征细化 | Mingkun Yang, Biao Yang, Minghui Liao, Yingying Zhu, Xiang Bai | arxiv.org/pdf/2402.13… | link |
| 2024-02-21 | Delving into Dark Regions for Robust Shadow Detection | 深入研究黑暗区域以实现稳健的阴影检测 | Huankang Guan, Ke Xu, Rynson W. H. Lau | arxiv.org/pdf/2402.13… | link |
| 2024-02-21 | YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information | YOLOv9:使用可编程梯度信息学习您想学习的内容 | Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao | arxiv.org/pdf/2402.13… | link |
| 2024-02-21 | Learning Pixel-wise Continuous Depth Representation via Clustering for Depth Completion | 通过聚类学习逐像素连续深度表示以实现深度补全 | Chen Shenglun, Zhang Hong, Ma XinZhu, Wang Zhihui, Li Haojie | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | TransGOP: Transformer-Based Gaze Object Prediction | TransGOP:基于 Transformer 的注视对象预测 | Binglu Wang, Chenxi Guo, Yang Jin, Haisheng Xia, Nian Liu | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | A Two-Stage Dual-Path Framework for Text Tampering Detection and Recognition | 用于文本篡改检测和识别的两阶段双路径框架 | Guandong Li, Xian Yang, Wenpin Ma | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Unsupervised learning based object detection using Contrastive Learning | 使用对比学习的基于无监督学习的对象检测 | Chandan Kumar, Jansel Herrera-Gerena, John Just, Matthew Darr, Ali Jannesari | arxiv.org/pdf/2402.13… | null |
LLM
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-21 | Hybrid Reasoning Based on Large Language Models for Autonomous Car Driving | 基于大语言模型的自动驾驶混合推理 | Mehdi Azarafza, Mojtaba Nayyeri, Charles Steinmetz, Steffen Staab, Achim Rettberg | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs | 大语言模型遇见长视频:利用大语言模型中的交互式视觉适配器促进长视频理解 | Yunxin Li, Xinyu Chen, Baotain Hu, Min Zhang | arxiv.org/pdf/2402.13… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-21 | VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks | VOOM:使用分层地标的鲁棒视觉对象里程计和绘图 | Yutong Wang, Chaoyang Jiang, Xieyuanli Chen | arxiv.org/pdf/2402.13… | link |
| 2024-02-21 | Event-aware Video Corpus Moment Retrieval | 事件感知视频语料库时刻检索 | Danyang Hou, Liang Pang, Huawei Shen, Xueqi Cheng | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization | EffLoc:用于高效 6 自由度相机重定位的轻量级视觉 Transformer | Zhendong Xiao, Changhao Chen, Shan Yang, Wu Wei | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Multi-scale Spatio-temporal Transformer-based Imbalanced Longitudinal Learning for Glaucoma Forecasting from Irregular Time Series Images | 基于多尺度时空变换器的不平衡纵向学习用于不规则时间序列图像的青光眼预测 | Xikai Yang, Jian Wu, Xi Wang, Yuchen Yuan, Ning Li Wang, Pheng-Ann Heng | arxiv.org/pdf/2402.13… | null |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-21 | Real-time 3D-aware Portrait Editing from a Single Image | 从单个图像进行实时 3D 感知肖像编辑 | Qingyan Bai, Yinghao Xu, Zifan Shi, Hao Ouyang, Qiuyu Wang, Ceyuan Yang, Xuan Wang, Gordon Wetzstein, Yujun Shen, Qifeng Chen | arxiv.org/pdf/2402.14… | null |
| 2024-02-21 | A unified framework of non-local parametric methods for image denoising | 图像去噪非局部参数方法的统一框架 | Sébastien Herbreteau, Charles Kervrann | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters | 自带角色:自动生成自定义角色面部动画的整体解决方案 | Zechen Bai, Peng Chen, Xiaolan Peng, Lu Liu, Hui Chen, Mike Zheng Shou, Feng Tian | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning | SimPro:实现现实长尾半监督学习的简单概率框架 | Chaoqun Du, Yizeng Han, Gao Huang | arxiv.org/pdf/2402.13… | link |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-21 | Corrective Machine Unlearning | 纠正机器遗忘 | Shashwat Goel, Ameya Prabhu, Philip Torr, Ponnurangam Kumaraguru, Amartya Sanyal | arxiv.org/pdf/2402.14… | null |
| 2024-02-21 | High-throughput Visual Nano-drone to Nano-drone Relative Localization using Onboard Fully Convolutional Networks | 使用机载全卷积网络进行高通量视觉纳米无人机到纳米无人机的相对定位 | Luca Crupi, Alessandro Giusti, Daniele Palossi | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | A Unified Framework and Dataset for Assessing Gender Bias in Vision-Language Models | 用于评估视觉语言模型中性别偏见的统一框架和数据集 | Ashutosh Sathe, Prachi Jain, Sunayana Sitaram | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Adversarial Purification and Fine-tuning for Robust UDC Image Restoration | 用于鲁棒 UDC 图像恢复的对抗性净化和微调 | Zhenbo Song, Zhenyuan Zhang, Kaihao Zhang, Wenhan Luo, Zhaoxin Fan, Jianfeng Lu | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Exploring the Limits of Semantic Image Compression at Micro-bits per Pixel | 探索每像素微比特语义图像压缩的极限 | Jordan Dotzel, Bahaa Kotb, James Dotzel, Mohamed Abdelfattah, Zhiru Zhang | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | A Feature Matching Method Based on Multi-Level Refinement Strategy | 一种基于多级细化策略的特征匹配方法 | Shaojie Zhang, Yinghui Wang, Jiaxing Ma, Jinlong Yang, Tao Yan, Liangyi Huang, Mingfeng Wang | arxiv.org/pdf/2402.13… | null |