[UPDATED!] 2024-02-13 (Publish Time)
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-13 | Learning Continuous 3D Words for Text-to-Image Generation | 学习连续 3D 单词以生成文本到图像 | Ta-Ying Cheng, Matheus Gadelha, Thibault Groueix, Matthew Fisher, Radomir Mech, Andrew Markham, Niki Trigoni | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Denoising Diffusion Restoration Tackles Forward and Inverse Problems for the Laplace Operator | 去噪扩散恢复解决拉普拉斯算子的正向和逆向问题 | Amartya Mukherjee, Melissa M. Stadt, Lena Podina, Mohammad Kohandel, Jun Liu | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases | 面对扩散模型的奖励过度优化:归纳偏差和首要偏差的视角 | Ziyi Zhang, Sen Zhang, Yibing Zhan, Yong Luo, Yonggang Wen, Dacheng Tao | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Taking Training Seriously: Human Guidance and Management-Based Regulation of Artificial Intelligence | 认真对待培训:人工智能的人为指导和管理调控 | Cary Coglianese, Colton R. Crum | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | One-to-many Reconstruction of 3D Geometry of cultural Artifacts using a synthetically trained Generative Model | 使用综合训练的生成模型一对多重建文化文物的 3D 几何形状 | Thomas Pöllabauer, Julius Kühn, Jiayi Li, Arjan Kuijper | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | A Dense Reward View on Aligning Text-to-Image Diffusion with Preference | 关于将文本到图像扩散与偏好对齐的密集奖励视图 | Shentao Yang, Tianqi Chen, Mingyuan Zhou | arxiv.org/pdf/2402.08… | link |
| 2024-02-13 | Fine-Tuning Text-To-Image Diffusion Models for Class-Wise Spurious Feature Generation | 微调文本到图像的扩散模型以生成按类别的杂散特征 | AprilPyone MaungMaung, Huy H. Nguyen, Hitoshi Kiya, Isao Echizen | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Poisson flow consistency models for low-dose CT image denoising | 低剂量CT图像去噪的泊松流一致性模型 | Dennis Hein, Adam Wang, Ge Wang | arxiv.org/pdf/2402.08… | null |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-13 | PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs | PIN:位置插入解锁 VLM 中的对象定位能力 | Michael Dorkenwald, Nimrod Barazani, Cees G. M. Snoek, Yuki M. Asano | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Test-Time Backdoor Attacks on Multimodal Large Language Models | 对多模态大型语言模型的测试时后门攻击 | Dong Lu, Tianyu Pang, Chao Du, Qian Liu, Xianjun Yang, Min Lin | arxiv.org/pdf/2402.08… | link |
| 2024-02-13 | Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | 史密斯特工:单张图像可以以指数速度越狱一百万多模式 LLM 特工 | Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min Lin | arxiv.org/pdf/2402.08… | link |
| 2024-02-13 | Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks | 视觉问答教学:将多模态大语言模型解锁到特定领域的视觉多任务 | Jusung Lee, Sungguk Cha, Younghyun Lee, Cheoljong Yang | arxiv.org/pdf/2402.08… | null |
NeRF
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-13 | NeRF Analogies: Example-Based Visual Attribute Transfer for NeRFs | NeRF 类比:NeRF 基于示例的视觉属性传输 | Michael Fischer, Zhengqin Li, Thu Nguyen-Phuoc, Aljaz Bozic, Zhao Dong, Carl Marshall, Tobias Ritschel | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | H2O-SDF: Two-phase Learning for 3D Indoor Reconstruction using Object Surface Fields | H2O-SDF:使用物体表面场进行 3D 室内重建的两阶段学习 | Minyoung Park, Mirae Do, YeonJae Shin, Jaeseok Yoo, Jongkwang Hong, Joongrock Kim, Chul Lee | arxiv.org/pdf/2402.08… | null |
3DGS
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-13 | IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation | IM-3D:用于高质量 3D 生成的迭代多视图扩散和重建 | Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Natalia Neverova, Andrea Vedaldi, Oran Gafni, Filippos Kokkinos | arxiv.org/pdf/2402.08… | null |
Model Compression/Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-13 | BdSLW60: A Word-Level Bangla Sign Language Dataset | BdSLW60:单词级孟加拉手语数据集 | Husne Ara Rubaiyeat, Hasan Mahmud, Ahsan Habib, Md. Kamrul Hasan | arxiv.org/pdf/2402.08… | link |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-13 | Convolutional Neural Networks Towards Facial Skin Lesions Detection | 卷积神经网络用于面部皮肤病变检测 | Reza Sarshar, Mohammad Heydari, Elham Akhondzadeh Noughabi | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | FESS Loss: Feature-Enhanced Spatial Segmentation Loss for Optimizing Medical Image Analysis | FESS 损失:用于优化医学图像分析的特征增强空间分割损失 | Charulkumar Chodvadiya, Navyansh Mahla, Kinshuk Gaurav Singh, Kshitij Sharad Jadhav | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Glass Segmentation with Multi Scales and Primary Prediction Guiding | 多尺度玻璃分割和初步预测引导 | Zhiyu Xu, Qingliang Chen | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Approximately Piecewise E(3) Equivariant Point Networks | 近似分段 E(3) 等变点网络 | Matan Atzmon, Jiahui Huang, Francis Williams, Or Litany | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | P-Mamba: Marrying Perona Malik Diffusion with Mamba for Efficient Pediatric Echocardiographic Left Ventricular Segmentation | P-Mamba:将 Perona Malik Diffusion 与 Mamba 结合起来,实现高效的儿科超声心动图左心室分割 | Zi Ye, Tianxiang Chen | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Intriguing Differences Between Zero-Shot and Systematic Evaluations of Vision-Language Transformer Models | 视觉语言 Transformer 模型的零样本评估和系统评估之间的有趣差异 | Shaeke Salman, Md Montasir Bin Shams, Xiuwen Liu, Lingjiong Zhu | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Latent space configuration for improved generalization in supervised autoencoder neural networks | 用于改进监督自动编码器神经网络泛化的潜在空间配置 | Nikita Gabdullin | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Camera Calibration through Geometric Constraints from Rotation and Projection Matrices | 通过旋转和投影矩阵的几何约束进行相机校准 | Muhammad Waleed, Abdul Rauf, Murtaza Taj | arxiv.org/pdf/2402.08… | link |
| 2024-02-13 | Leveraging Self-Supervised Instance Contrastive Learning for Radar Object Detection | 利用自监督实例对比学习进行雷达目标检测 | Colin Decourt, Rufin VanRullen, Didier Salle, Thomas Oberlin | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Transferring Ultrahigh-Field Representations for Intensity-Guided Brain Segmentation of Low-Field Magnetic Resonance Imaging | 传输超高场表示以进行低场磁共振成像强度引导脑分割 | Kwanseok Oh, Jieun Lee, Da-Woon Heo, Dinggang Shen, Heung-Il Suk | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Adaptive Hierarchical Certification for Segmentation using Randomized Smoothing | 使用随机平滑进行分段的自适应分层认证 | Alaa Anani, Tobias Lorenz, Bernt Schiele, Mario Fritz | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Visually Dehallucinative Instruction Generation | 视觉幻觉指令生成 | Sungguk Cha, Jusung Lee, Younghyun Lee, Cheoljong Yang | arxiv.org/pdf/2402.08… | link |
| 2024-02-13 | Conditional Information Gain Trellis | 条件信息增益网格 | Ufuk Can Bicici, Tuna Han Salih Meral, Lale Akarun | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Scribble-based fast weak-supervision and interactive corrections for segmenting whole slide images | 基于涂鸦的快速弱监督和交互式校正,用于分割整个幻灯片图像 | Antoine Habis, Roy Rosman Nathanson, Vannary Meas-Yedid, Elsa D. Angelini, Jean-Christophe Olivo-Marin | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | The Paradox of Motion: Evidence for Spurious Correlations in Skeleton-based Gait Recognition Models | 运动悖论:基于骨骼的步态识别模型中虚假相关性的证据 | Andy Cătrună, Adrian Cosma, Emilian Rădoi | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Rethinking U-net Skip Connections for Biomedical Image Segmentation | 重新思考用于生物医学图像分割的 U-net Skip Connections | Frauke Wilm, Jonas Ammeling, Mathias Öttl, Rutger H. J. Fick, Marc Aubreville, Katharina Breininger | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Improving Image Coding for Machines through Optimizing Encoder via Auxiliary Loss | 通过辅助损失优化编码器来改进机器的图像编码 | Kei Iino, Shunsuke Akamatsu, Hiroshi Watanabe, Shohei Enomoto, Akira Sakamoto, Takeharu Eda | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Object Detection in Thermal Images Using Deep Learning for Unmanned Aerial Vehicles | 使用无人机深度学习进行热图像中的物体检测 | Minh Dang Tu, Kieu Trang Le, Manh Duong Phung | arxiv.org/pdf/2402.08… | null |
GNN
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-13 | Pix2Code: Learning to Compose Neural Visual Concepts as Programs | Pix2Code:学习将神经视觉概念编写为程序 | Antonia Wüst, Wolfgang Stammer, Quentin Delfosse, Devendra Singh Dhami, Kristian Kersting | arxiv.org/pdf/2402.08… | link |
LLM
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-13 | Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance | 通过无分类器指导减轻大视觉语言模型中的物体幻觉 | Linxi Zhao, Yihe Deng, Weitong Zhang, Quanquan Gu | arxiv.org/pdf/2402.08… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-13 | Are Semi-Dense Detector-Free Methods Good at Matching Local Features? | 半密集无检测器方法是否擅长匹配局部特征? | Matthieu Vilain, Rémi Giraud, Hugo Germain, Guillaume Bourmaud | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Peeking Behind the Curtains of Residual Learning | 窥视残差学习的幕后 | Tunhou Zhang, Feng Yan, Hai Li, Yiran Chen | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | CrossGaze: A Strong Method for 3D Gaze Estimation in the Wild | CrossGaze:野外 3D 视线估计的强大方法 | Andy Cătrună, Adrian Cosma, Emilian Rădoi | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | MetaTra: Meta-Learning for Generalized Trajectory Prediction in Unseen Domain | MetaTra:用于未知领域广义轨迹预测的元学习 | Xiaohe Li, Feilong Huang, Zide Fan, Fangli Mou, Yingyan Hou, Chen Qian, Lijie Wen | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach | 将图像转换为道路网络:一种非自回归序列到序列方法 | Jiachen Lu, Renyuan Peng, Xinyue Cai, Hang Xu, Hongyang Li, Feng Wen, Wei Zhang, Li Zhang | arxiv.org/pdf/2402.08… | link |
| 2024-02-13 | Optimized Information Flow for Transformer Tracking | 变压器跟踪的优化信息流 | Janani Kugarajeevan, Thanikasalam Kokul, Amirthalingam Ramanan, Subha Fernando | arxiv.org/pdf/2402.08… | link |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-13 | Learning to Produce Semi-dense Correspondences for Visual Localization | 学习为视觉定位生成半密集对应 | Khang Truong Giang, Soohwan Song, Sungho Jo | arxiv.org/pdf/2402.08… | link |
| 2024-02-13 | Color Image Denoising Using The Green Channel Prior | 使用绿色通道先验进行彩色图像去噪 | Zhaoming Kong, Xiaowei Yang | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Advancing Data-driven Weather Forecasting: Time-Sliding Data Augmentation of ERA5 | 推进数据驱动的天气预报:ERA5 的时间滑动数据增强 | Minjong Cheon, Daehyun Kang, Yo-Hwan Choi, Seon-Yu Kang | arxiv.org/pdf/2402.08… | null |
Learning Methods
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-13 | Pixel Sentence Representation Learning | 像素句子表示学习 | Chenghao Xiao, Zhuoxu Huang, Danlu Chen, G Thomas Hudson, Yizhi Li, Haoran Duan, Chenghua Lin, Jie Fu, Jungong Han, Noura Al Moubayed | arxiv.org/pdf/2402.08… | null |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-13 | Learned Image Compression with Text Quality Enhancement | 学习图像压缩和文本质量增强 | Chih-Yu Lai, Dung Tran, Kazuhito Koishida | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing | 具有时间步感知采样的潜在反转,用于免训练非刚性编辑 | Yunji Jung, Seokju Lee, Tair Djanibekov, Hyunjung Shim, Jongchul Ye | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | JeFaPaTo -- A joint toolbox for blinking analysis and facial features extraction | JeFaPaTo——眨眼分析和面部特征提取的联合工具箱 | Tim Büchner, Oliver Mothes, Orlando Guntinas-Lichius, Joachim Denzler | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | A Neural-network Enhanced Video Coding Framework beyond ECM | 超越 ECM 的神经网络增强视频编码框架 | Yanchen Zhao, Wenxuan He, Chuanmin Jia, Qizhe Wang, Junru Li, Yue Li, Chaoyi Lin, Kai Zhang, Li Zhang, Siwei Ma | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | An Order-Complexity Aesthetic Assessment Model for Aesthetic-aware Music Recommendation | 用于审美感知音乐推荐的顺序复杂度审美评估模型 | Xin Jin, Wu Zhou, Jingyu Wang, Duo Xu, Yongsen Zheng | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | Learning semantic image quality for fetal ultrasound from noisy ranking annotation | 从嘈杂的排名注释中学习胎儿超声的语义图像质量 | Manxi Lin, Jakob Ambsdorf, Emilie Pi Fogtmann Sejer, Zahra Bashir, Chun Kit Wong, Paraskevas Pegios, Alberto Raheli, Morten Bo Søndergaard Svendsen, Mads Nielsen, Martin Grønnebæk Tolsgaard, et al. | arxiv.org/pdf/2402.08… | null |
| 2024-02-13 | SepRep-Net: Multi-source Free Domain Adaptation via Model Separation And Reparameterization | SepRep-Net:通过模型分离和重新参数化进行多源自由域适应 | Ying Jin, Jiaqi Wang, Dahua Lin | arxiv.org/pdf/2402.08… | null |