[UPDATED!] 2024-02-20 (Publish Time)
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies | AnnoTheia:用于视听语音技术的半自动注释工具包 | José-M. Acosta-Triana, David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos | arxiv.org/pdf/2402.13… | link |
| 2024-02-20 | Neural Network Diffusion | 神经网络扩散 | Kai Wang, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, Yang You | arxiv.org/pdf/2402.13… | link |
| 2024-02-20 | VGMShield: Mitigating Misuse of Video Generative Models | VGMShield:减少视频生成模型的滥用 | Yan Pang, Yang Zhang, Tianhao Wang | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | Visual Style Prompting with Swapping Self-Attention | 通过交换自我注意力来提示视觉风格 | Jaeseok Jeong, Junho Kim, Yunjey Choi, Gayoung Lee, Youngjung Uh | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence | 模式分析与机器智能文献综述 | Penghai Zhao, Xin Zhang, Ming-Ming Cheng, Jian Yang, Xiang Li | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection | 消除欺骗:采用视觉语言模型进行通用 Deepfake 检测 | Sohail Ahmed Khan, Duc-Tien Dang-Nguyen | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models | RealCompo:现实主义和组合性之间的动态平衡改进了文本到图像的扩散模型 | Xinchen Zhang, Ling Yang, Yaqi Cai, Zhaochen Yu, Jiake Xie, Ye Tian, Minkai Xu, Yong Tang, Yujiu Yang, Bin Cui | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | A Geometric Algorithm for Tubular Shape Reconstruction from Skeletal Representation | 一种从骨骼表示重建管状形状的几何算法 | Guoqing Zhang, Songzi Cat, Juzi Cat | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | Two-stage Rainfall-Forecasting Diffusion Model | 两阶段降雨量预报扩散模型 | XuDong Ling, ChaoRong Li, FengQing Qin, LiHong Zhu, Yuanyuan Huang | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion | MuLan:用于渐进式多对象扩散的多模态 LLM 代理 | Sen Li, Ruochen Wang, Cho-Jui Hsieh, Minhao Cheng, Tianyi Zhou | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction | MVDiffusion++:用于单视图或稀疏视图 3D 对象重建的密集高分辨率多视图扩散模型 | Shitao Tang, Jiacheng Chen, Dilin Wang, Chengzhou Tang, Fuyang Zhang, Yuchen Fan, Vikas Chandra, Yasutaka Furukawa, Rakesh Ranjan | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation | DiffusionNOCS:管理 Sim2Real 多模态类别级姿势估计中的对称性和不确定性 | Takuya Ikeda, Sergey Zakharov, Tianyi Ko, Muhammad Zubair Irshad, Robert Lee, Katherine Liu, Rares Ambrus, Koichi Nishiwaki | arxiv.org/pdf/2402.12… | null |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples | CounterCurate:通过反事实示例增强物理和语义视觉语言组合推理 | Jianrui Zhang, Mu Cai, Tengyang Xie, Yong Jae Lee | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | A Touch, Vision, and Language Dataset for Multimodal Alignment | 用于多模式对齐的触摸、视觉和语言数据集 | Letian Fu, Gaurav Datta, Huang Huang, William Chung-Ho Panitch, Jaimyn Drake, Joseph Ortiz, Mustafa Mukadam, Mike Lambeta, Roberto Calandra, Ken Goldberg | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts | 欺骗你的多模态大语言模型有多容易?欺骗性提示的实证分析 | Yusu Qian, Haotian Zhang, Yinfei Yang, Zhe Gan | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog | OLViT:通过基于注意力的嵌入进行视频对话的多模态状态跟踪 | Adnen Abdessaied, Manuel von Hochmeister, Andreas Bulling | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | ConVQG: Contrastive Visual Question Generation with Multimodal Guidance | ConVQG:利用多模态指导生成对比视觉问题 | Li Mi, Syrielle Montariol, Javiera Castillo-Navarro, Xianjie Dai, Antoine Bosselut, Devis Tuia | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | Model Composition for Multimodal Large Language Models | 多模态大语言模型的模型组合 | Chi Chen, Yiyang Du, Zheng Fang, Ziyue Wang, Fuwen Luo, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Maosong Sun, et al. | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | Modality-Aware Integration with Large Language Models for Knowledge-based Visual Question Answering | 模态感知与大型语言模型的集成,用于基于知识的视觉问答 | Junnan Dong, Qinggang Zhang, Huachi Zhou, Daochen Zha, Pai Zheng, Xiao Huang | arxiv.org/pdf/2402.12… | null |
NeRF
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey | NeRF 和 3D 高斯泼溅如何重塑 SLAM:综述 | Fabio Tosi, Youmin Zhang, Ziren Gong, Erik Sandström, Stefano Mattoccia, Martin R. Oswald, Matteo Poggi | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | Improving Robustness for Joint Optimization of Camera Poses and Decomposed Low-Rank Tensorial Radiance Fields | 提高相机位姿和分解低阶张量辐射场联合优化的鲁棒性 | Bo-Yu Cheng, Wei-Chen Chiu, Yu-Lun Liu | arxiv.org/pdf/2402.13… | link |
| 2024-02-20 | OccFlowNet: Towards Self-supervised Occupancy Estimation via Differentiable Rendering and Occupancy Flow | OccFlowNet:通过可微渲染和占用流实现自监督占用估计 | Simon Boeder, Fabian Gigengack, Benjamin Risse | arxiv.org/pdf/2402.12… | null |
Model Compression/Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | FlashTex: Fast Relightable Mesh Texturing with LightControlNet | FlashTex:使用 LightControlNet 进行快速可重新照明网格纹理 | Kangle Deng, Timothy Omernick, Alexander Weiss, Deva Ramanan, Jun-Yan Zhu, Tinghui Zhou, Maneesh Agrawala | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | VideoPrism: A Foundational Visual Encoder for Video Understanding | VideoPrism:用于视频理解的基础视觉编码器 | Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, et al. | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | Cross-Domain Transfer Learning with CoRTe: Consistent and Reliable Transfer from Black-Box to Lightweight Segmentation Model | 使用 CoRTe 进行跨域迁移学习:从黑盒到轻量级分割模型的一致且可靠的迁移 | Claudia Cuttano, Antonio Tavera, Fabio Cermelli, Giuseppe Averta, Barbara Caputo | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | Improve Cross-Architecture Generalization on Dataset Distillation | 改进数据集蒸馏的跨架构泛化 | Binglin Zhou, Linhao Zhong, Wentao Chen | arxiv.org/pdf/2402.13… | link |
| 2024-02-20 | Efficient Parameter Mining and Freezing for Continual Object Detection | 用于持续目标检测的高效参数挖掘和冻结 | Angelo G. Menezes, Augusto J. Peterlevitz, Mateus A. Chinelatto, André C. P. L. F. de Carvalho | arxiv.org/pdf/2402.12… | null |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | Video ReCap: Recursive Captioning of Hour-Long Videos | 视频回顾:长达一小时的视频的递归字幕 | Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, Lorenzo Torresani, Gedas Bertasius | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | 3D Kinematics Estimation from Video with a Biomechanical Model and Synthetic Training Data | 使用生物力学模型和合成训练数据从视频进行 3D 运动学估计 | Zhi-Yi Lin, Bofan Lyu, Judith Cueto Fernandez, Eline van der Kruk, Ajay Seth, Xucong Zhang | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | Toward Fairness via Maximum Mean Discrepancy Regularization on Logits Space | 通过 Logits 空间上的最大均值差异正则化实现公平 | Hao-Wei Chung, Ching-Hao Chiu, Yu-Jen Chen, Yiyu Shi, Tsung-Yi Ho | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition | 用于连续视觉语音识别的传统混合解码器和 CTC/Attention 解码器的比较 | David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | MapTrack: Tracking in the Map | MapTrack:在地图中追踪 | Fei Wang, Ruohui Zhang, Chenglin Chen, Min Yang, Yun Bai | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | Cell Graph Transformer for Nuclei Classification | 用于细胞核分类的细胞图转换器 | Wei Lou, Guanbin Li, Xiang Wan, Haofeng Li | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | UniCell: Universal Cell Nucleus Classification via Prompt Learning | UniCell:通过提示学习进行通用细胞核分类 | Junjia Huang, Haofeng Li, Xiang Wan, Guanbin Li | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | Advancements in Point Cloud-Based 3D Defect Detection and Classification for Industrial Systems: A Comprehensive Survey | 工业系统基于点云的 3D 缺陷检测和分类的进展:全面调查 | Anju Rani, Daniel Ortiz-Arroyo, Petar Durdevic | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | SolarPanel Segmentation: Self-Supervised Learning for Imperfect Datasets | SolarPanel 分割:不完美数据集的自我监督学习 | Sankarshanaa Sagaram, Aditya Kasliwal, Krish Didwania, Laven Srivastava, Pallavi Kailas, Ujjwal Verma | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | Radar-Based Recognition of Static Hand Gestures in American Sign Language | 基于雷达的美国手语静态手势识别 | Christian Schuessler, Wenxuan Zhang, Johanna Bräunig, Marcel Hoffmann, Michael Stelzig, Martin Vossiek | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | GOOD: Towards Domain Generalized Orientated Object Detection | GOOD:迈向领域泛化的有向目标检测 | Qi Bi, Beichen Zhou, Jingjun Yi, Wei Ji, Haolan Zhan, Gui-Song Xia | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | BronchoTrack: Airway Lumen Tracking for Branch-Level Bronchoscopic Localization | BronchoTrack:用于分支级支气管镜定位的气道管腔跟踪 | Qingyao Tian, Huai Liao, Xinyan Huang, Bingyu Yang, Jinlin Wu, Jian Chen, Lujie Li, Hongbin Liu | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | Fingerprint Presentation Attack Detector Using Global-Local Model | 使用全局局部模型的指纹呈现攻击检测器 | Haozhe Liu, Wentian Zhang, Feng Liu, Haoqian Wu, Linlin Shen | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | CST: Calibration Side-Tuning for Parameter and Memory Efficient Transfer Learning | CST:参数和内存高效迁移学习的校准侧调 | Feng Chen | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | PAC-FNO: Parallel-Structured All-Component Fourier Neural Operators for Recognizing Low-Quality Images | PAC-FNO:用于识别低质量图像的并行结构全分量傅立叶神经算子 | Jinsung Jeon, Hyundong Jin, Jonghyun Choi, Sanghyun Hong, Dongeun Lee, Kookjin Lee, Noseong Park | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | Learning Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition | 学习用于少样本动作识别的域不变时间动力学 | Yuke Li, Guangyi Chen, Ben Abramowitz, Stefano Anzellotti, Donglai Wei | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | wmh_seg: Transformer based U-Net for Robust and Automatic White Matter Hyperintensity Segmentation across 1.5T, 3T and 7T | wmh_seg:基于 Transformer 的 U-Net,用于 1.5T、3T 和 7T 的鲁棒和自动白质高信号分割 | Jinghang Li, Tales Santini, Yuanzhe Huang, Joseph M. Mettenburg, Tamer S. Ibrahim, Howard J. Aizenstein, Minjie Wu | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | TorchCP: A Library for Conformal Prediction based on PyTorch | TorchCP:基于 PyTorch 的共形预测库 | Hongxin Wei, Jianguo Huang | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | Object-level Geometric Structure Preserving for Natural Image Stitching | 自然图像拼接的对象级几何结构保留 | Wenxiao Cai, Wankou Yang | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | Neuromorphic Synergy for Video Binarization | 视频二值化的神经形态协同 | Shijie Lin, Xiang Zhang, Lei Yang, Lei Yu, Bin Zhou, Xiaowei Luo, Wenping Wang, Jia Pan | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | YOLO-Ant: A Lightweight Detector via Depthwise Separable Convolutional and Large Kernel Design for Antenna Interference Source Detection | YOLO-Ant:采用深度可分离卷积和大内核设计的轻量级检测器,用于天线干扰源检测 | Xiaoyu Tang, Xingming Chen, Jintao Cheng, Jin Wu, Rui Fan, Chengxi Zhang, Zebo Zhou | arxiv.org/pdf/2402.12… | link |
GNN
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative Cognition Approach | 以对象为中心的深度神经网络中的视觉推理:一种比较认知方法 | Guillermo Puebla, Jeffrey S. Bowers | arxiv.org/pdf/2402.12… | link |
LLM
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | Slot-VLM: SlowFast Slots for Video-Language Modeling | Slot-VLM:用于视频语言建模的 SlowFast 插槽 | Jiaqi Xu, Cuiling Lan, Wenxuan Xie, Xuejin Chen, Yan Lu | arxiv.org/pdf/2402.13… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing | UniEdit:用于视频运动和外观编辑的统一免调优框架 | Jianhong Bai, Tianyu He, Yuchi Wang, Junliang Guo, Haoji Hu, Zuozhu Liu, Jiang Bian | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | PIP-Net: Pedestrian Intention Prediction in the Wild | PIP-Net:野外行人意图预测 | Mohsen Azarmi, Mahdi Rezaei, He Wang, Sebastien Glaser | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | RhythmFormer: Extracting rPPG Signals Based on Hierarchical Temporal Periodic Transformer | RhythmFormer:基于分层时间周期变换器提取 rPPG 信号 | Bochao Zou, Zizheng Guo, Jiansheng Chen, Huimin Ma | arxiv.org/pdf/2402.12… | link |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning | VADv2:通过概率规划实现端到端矢量化自动驾驶 | Shaoyu Chen, Bo Jiang, Hao Gao, Bencheng Liao, Qing Xu, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | Design and Flight Demonstration of a Quadrotor for Urban Mapping and Target Tracking Research | 用于城市测绘和目标跟踪研究的四旋翼飞行器的设计和飞行演示 | Collin Hague, Nick Kakavitsas, Jincheng Zhang, Chris Beam, Andrew Willis, Artur Wolek | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | exploreCOSMOS: Interactive Exploration of Conditional Statistical Shape Models in the Web-Browser | exploreCOSMOS:网络浏览器中条件统计形状模型的交互式探索 | Maximilian Hahn, Bernhard Egger | arxiv.org/pdf/2402.13… | link |
| 2024-02-20 | Mind the Exit Pupil Gap: Revisiting the Intrinsics of a Standard Plenoptic Camera | 注意出瞳间隙:重新审视标准全光相机的本质 | Tim Michels, Daniel Mäckelmann, Reinhard Koch | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | ICON: Improving Inter-Report Consistency of Radiology Report Generation via Lesion-aware Mix-up Augmentation | ICON:通过病变感知混合增强提高放射学报告生成的报告间一致性 | Wenjun Hou, Yi Cheng, Kaishuai Xu, Yan Hu, Wenjie Li, Jiang Liu | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis | 用于在文本到图像合成中生成模型首选提示的用户友好框架 | Nailei Hei, Qianyu Guo, Zihao Wang, Yan Wang, Haofen Wang, Wenqiang Zhang | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | Denoising OCT Images Using Steered Mixture of Experts with Multi-Model Inference | 使用多模型推理专家的引导组合对 OCT 图像进行去噪 | Aytaç Özkan, Elena Stoykova, Thomas Sikora, Violeta Madjarova | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | Advancing Monocular Video-Based Gait Analysis Using Motion Imitation with Physics-Based Simulation | 使用运动模仿和基于物理的模拟推进基于单目视频的步态分析 | Nikolaos Smyrnakis, Tasos Karakostas, R. James Cotton | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | A Comprehensive Review of Machine Learning Advances on Data Change: A Cross-Field Perspective | 机器学习在数据变化方面的进展的全面回顾:跨领域视角 | Jeng-Lin Li, Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen | arxiv.org/pdf/2402.12… | null |