[UPDATED!] 2024-03-07 (Publish Time)
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-07 | SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM | SnapNTell:通过检索增强多模态法学硕士增强以实体为中心的视觉问答 | Jielin Qiu, Andrea Madotto, Zhaojiang Lin, Paul A. Crook, Yifan Ethan Xu, Xin Luna Dong, Christos Faloutsos, Lei Li, Babak Damavandi, Seungwhan Moon | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes | ObjectCompose:评估基于视觉的模型对对象到背景成分变化的弹性 | Hashmat Shadab Malik, Muhammad Huzaifa, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Delving into the Trajectory Long-tail Distribution for Muti-object Tracking | 深入研究多目标跟踪的轨迹长尾分布 | Sijia Chen, En Yu, Jinyang Li, Wenbing Tao | arxiv.org/pdf/2403.04… | link |
| 2024-03-07 | PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | PixArt-Σ:用于 4K 文本到图像生成的扩散变压器的弱到强训练 | Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Pix2Gif: Motion-Guided Diffusion for GIF Generation | Pix2Gif:用于 GIF 生成的运动引导扩散 | Hitesh Kandala, Jianfeng Gao, Jianwei Yang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | A Domain Translation Framework with an Adversarial Denoising Diffusion Model to Generate Synthetic Datasets of Echocardiography Images | 具有对抗性去噪扩散模型的域翻译框架,用于生成超声心动图图像的合成数据集 | Cristiana Tiago, Sten Roar Snare, Jurica Sprem, Kristin McLeod | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser | 具有分层空间和时间降噪器的基于解缠扩散的 3D 人体姿势估计 | Qingyuan Cai, Xuecai Hu, Saihui Hou, Li Yao, Yongzhen Huang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | StableDrag: Stable Dragging for Point-based Image Editing | StableDrag:基于点的图像编辑的稳定拖动 | Yutao Cui, Xiaotong Zhao, Guozhen Zhang, Shengming Cao, Kai Ma, Limin Wang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Video-Driven Animation of Neural Head Avatars | 神经头头像的视频驱动动画 | Wolfgang Paier, Paul Hinzer, Anna Hilsmann, Peter Eisert | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Discriminative Probing and Tuning for Text-to-Image Generation | 文本到图像生成的判别性探测和调整 | Leigang Qu, Wenjie Wang, Yongqi Li, Hanwang Zhang, Liqiang Nie, Tat-Seng Chua | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant | MedM2G:通过交叉引导扩散与视觉不变性统一医学多模态生成 | Chenlu Zhan, Yu Lin, Gaoang Wang, Hongwei Wang, Jian Wu | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Controllable Generation with Text-to-Image Diffusion Models: A Survey | 使用文本到图像扩散模型的可控生成:调查 | Pu Cao, Feng Zhou, Qing Song, Lu Yang | arxiv.org/pdf/2403.04… | link |
| 2024-03-07 | 3DTextureTransformer: Geometry Aware Texture Generation for Arbitrary Mesh Topology | 3DTextureTransformer:任意网格拓扑的几何感知纹理生成 | Dharma KC, Clayton T. Morrison | arxiv.org/pdf/2403.04… | null |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-07 | CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios | CAT:增强多模态大语言模型以回答动态视听场景中的问题 | Qilang Ye, Zitong Yu, Rui Shao, Xinyu Xie, Philip Torr, Xiaochun Cao | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder | MedFLIP:使用屏蔽自动编码器进行医学视觉和语言自监督快速预训练 | Lei Li, Tianfang Zhang, Xinglin Zhang, Jiaqi Liu, Bingqi Ma, Yan Luo, Tao Chen | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document | TextMonkey:用于理解文档的无 OCR 大型多模态模型 | Yuliang Liu, Biao Yang, Qiang Liu, Zhang Li, Zhiyin Ma, Shuo Zhang, Xiang Bai | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Effectiveness Assessment of Recent Large Vision-Language Models | 最新大型视觉语言模型的有效性评估 | Yao Jiang, Xinyu Yan, Ge-Peng Ji, Keren Fu, Meijun Sun, Huan Xiong, Deng-Ping Fan, Fahad Shahbaz Khan | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition | 视听语音识别中丢失视频帧鲁棒性的丢失引起的模态偏差研究 | Yusheng Dai, Hang Chen, Jun Du, Ruoyu Wang, Shihao Chen, Jiefeng Ma, Haotian Wang, Chin-Hui Lee | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | DNAct: Diffusion Guided Multi-Task 3D Policy Learning | DNAct:扩散引导的多任务 3D 策略学习 | Ge Yan, Yueh-Hua Wu, Xiaolong Wang | arxiv.org/pdf/2403.04… | null |
NeRF
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-07 | Finding Waldo: Towards Efficient Exploration of NeRF Scene Space | 寻找 Waldo:迈向 NeRF 场景空间的高效探索 | Evangelos Skartados, Mehmet Kerim Yucel, Bruno Manganelli, Anastasios Drosou, Albert Saà-Garriga | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis | 用于高效 X 射线新颖视图合成的辐射高斯溅射 | Yuanhao Cai, Yixun Liang, Jiahao Wang, Angtian Wang, Yulun Zhang, Xiaokang Yang, Zongwei Zhou, Alan Yuille | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Closing the Visual Sim-to-Real Gap with Object-Composable NeRFs | 使用对象可组合 NeRF 缩小视觉模拟与真实的差距 | Nikhil Mishra, Maximilian Sieb, Pieter Abbeel, Xi Chen | arxiv.org/pdf/2403.04… | null |
Model Compression/Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-07 | SWAP-NAS: Sample-Wise Activation Patterns For Ultra-Fast NAS | SWAP-NAS:超高速 NAS 的采样激活模式 | Yameng Peng, Andy Song, Haytham M. Fayek, Vic Ciesielski, Xiaojun Chang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection | MAP:无源模型知识产权保护的掩模修剪 | Boyang Peng, Sanqing Qu, Yong Wu, Tianpei Zou, Lianghua He, Alois Knoll, Guang Chen, Changjun Jiang | arxiv.org/pdf/2403.04… | null |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-07 | That's My Point: Compact Object-centric LiDAR Pose Estimation for Large-scale Outdoor Localisation | 这就是我的观点:用于大规模室外定位的紧凑型以对象为中心的 LiDAR 姿态估计 | Georgi Pramatarov, Matthew Gadd, Paul Newman, Daniele De Martini | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors | AUFormer:视觉变压器是参数高效的面部动作单元检测器 | Kaishen Yuan, Zitong Yu, Xin Liu, Weicheng Xie, Huanjing Yue, Jingyu Yang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Out of the Room: Generalizing Event-Based Dynamic Motion Segmentation for Complex Scenes | 走出房间:针对复杂场景推广基于事件的动态运动分割 | Stamatios Georgoulis, Weining Ren, Alfredo Bochicchio, Daniel Eckert, Yuanyou Li, Abel Gawel | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Reducing self-supervised learning complexity improves weakly-supervised classification performance in computational pathology | 降低自监督学习复杂性可提高计算病理学中的弱监督分类性能 | Tim Lenz, Omar S. M. El Nahhas, Marta Ligero, Jakob Nikolas Kather | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Explainable Face Verification via Feature-Guided Gradient Backpropagation | 通过特征引导梯度反向传播进行可解释的人脸验证 | Yuhang Lu, Zewei Xu, Touradj Ebrahimi | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers | T-TAME:用于解释卷积网络和视觉变压器的可训练注意力机制 | Mariano V. Ntrougkas, Nikolaos Gkalelis, Vasileios Mezaris | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning | 用于跨域少样本学习的判别性样本引导和参数高效的特征空间适应 | Rashindrie Perera, Saman Halgamuge | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Source Matters: Source Dataset Impact on Model Robustness in Medical Imaging | 源很重要:源数据集对医学成像模型稳健性的影响 | Dovile Juodelyte, Yucheng Lu, Amelia Jiménez-Sánchez, Sabrina Bottazzi, Enzo Ferrante, Veronika Cheplygina | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Improved Focus on Hard Samples for Lung Nodule Detection | 提高对肺结节检测中硬样本的关注 | Yujiang Chen, Mei Xie | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | FriendNet: Detection-Friendly Dehazing Network | FriendNet:检测友好的去雾网络 | Yihua Fan, Yongzhen Wang, Mingqiang Wei, Fu Lee Wang, Haoran Xie | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Impacts of Color and Texture Distortions on Earth Observation Data in Deep Learning | 深度学习中颜色和纹理失真对地球观测数据的影响 | Martin Willbo, Aleksis Pirinen, John Martinsson, Edvin Listo Zec, Olof Mogren, Mikael Nilsson | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Learning to Remove Wrinkled Transparent Film with Polarized Prior | 学习用偏光先验去除起皱的透明薄膜 | Jiaqi Tang, Ruizheng Wu, Xiaogang Xu, Sixing Hu, Ying-Cong Chen | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Spatiotemporal Pooling on Appropriate Topological Maps Represented as Two-Dimensional Images for EEG Classification | 以二维图像表示的适当拓扑图上的时空池用于脑电图分类 | Takuto Fukushima, Ryusuke Miyamoto | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | AO-DETR: Anti-Overlapping DETR for X-Ray Prohibited Items Detection | AO-DETR:用于 X 射线违禁物品检测的防重叠 DETR | Mingyuan Li, Tong Jia, Hao Wang, Bowen Ma, Shuyang Lin, Da Cai, Dongyue Chen | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | A³lign-DFER: Pioneering Comprehensive Dynamic Affective Alignment for Dynamic Facial Expression Recognition with CLIP | A³lign-DFER:开创性的全面动态情感对齐,用于使用 CLIP 进行动态面部表情识别 | Zeng Tao, Yan Wang, Junxiong Lin, Haoran Wang, Xinji Mai, Jiawen Yu, Xuan Tong, Ziheng Zhou, Shaoqi Yan, Qing Zhao, et al. | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Active Generalized Category Discovery | 主动广义类别发现 | Shijie Ma, Fei Zhu, Zhun Zhong, Xu-Yao Zhang, Cheng-Lin Liu | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Depth-aware Test-Time Training for Zero-shot Video Object Segmentation | 零镜头视频对象分割的深度感知测试时间训练 | Weihuang Liu, Xi Shen, Haolun Li, Xiuli Bi, Bo Liu, Chi-Man Pun, Xiaodong Cun | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | ACC-ViT: Atrous Convolution's Comeback in Vision Transformers | ACC-ViT:Atrous Convolution 在 Vision Transformers 中回归 | Nabil Ibtehaz, Ning Yan, Masood Mortazavi, Daisuke Kihara | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images | CN-RMA:结合网络与光线行进聚合进行多视图图像的 3D 室内物体检测 | Guanlin Shen, Jingwei Huang, Zhihua Hu, Bin Wang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | SAM-PD: How Far Can SAM Take Us in Tracking and Segmenting Anything in Videos by Prompt Denoising | SAM-PD:SAM 在通过提示去噪跟踪和分割视频中的任何内容方面可以帮助我们走多远 | Tao Zhou, Wenhan Luo, Qi Ye, Zhiguo Shi, Jiming Chen | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Image Coding for Machines with Edge Information Learning Using Segment Anything | 使用 Segment Anything 进行边缘信息学习的机器图像编码 | Takahiro Shindo, Kein Yamada, Taiju Watanabe, Hiroshi Watanabe | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | SDPL: Shifting-Dense Partition Learning for UAV-View Geo-Localization | SDPL:无人机视图地理定位的平移密集分区学习 | Quan Chen, Tingyu Wang, Zihao Yang, Haoran Li, Rongfeng Lu, Yaoqi Sun, Bolun Zheng, Chenggang Yan | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | ProMISe: Promptable Medical Image Segmentation using SAM | ProMISe:使用 SAM 进行快速医学图像分割 | Jinfeng Wang, Sifan Song, Xinkun Wang, Yiyi Wang, Yiyi Miao, Jionglong Su, S. Kevin Zhou | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Dual-path Frequency Discriminators for Few-shot Anomaly Detection | 用于小样本异常检测的双路径鉴频器 | Yuhu Bai, Jiangning Zhang, Yuhang Dong, Guanzhong Tian, Yunkang Cao, Yabiao Wang, Chengjie Wang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | An Explainable AI Framework for Artificial Intelligence of Medical Things | 医疗人工智能的可解释人工智能框架 | Al Amin, Kamrul Hasan, Saleh Zein-Sabatto, Deo Chimba, Imtiaz Ahmed, Tariqul Islam | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Scalable and Robust Transformer Decoders for Interpretable Image Classification with Foundation Models | 可扩展且鲁棒的 Transformer 解码器,用于使用基础模型进行可解释的图像分类 | Evelyn Mannix, Howard Bondell | arxiv.org/pdf/2403.04… | null |
LLM
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-07 | How Far Are We from Intelligent Visual Deductive Reasoning? | 我们离智能视觉演绎推理还有多远? | Yizhe Zhang, He Bai, Ruixiang Zhang, Jiatao Gu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly | arxiv.org/pdf/2403.04… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-07 | Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed | 高效的 LoFTR:具有稀疏速度的半密集局部特征匹配 | Yifan Wang, Xingyi He, Sida Peng, Dongli Tan, Xiaowei Zhou | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Masked Capsule Autoencoders | 屏蔽胶囊自动编码器 | Miles Everett, Mingjun Zhong, Georgios Leontidis | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level | 更快的邻域注意力:在线程块级别降低自注意力的 O(n^2) 成本 | Ali Hassani, Wen-Mei Hwu, Humphrey Shi | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Dynamic Cross Attention for Audio-Visual Person Verification | 用于视听人员验证的动态交叉注意力 | R. Gnana Praveen, Jahangir Alam | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention | 基于联合交叉注意力递归融合的视听行人验证 | R. Gnana Praveen, Jahangir Alam | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Multi-step Temporal Modeling for UAV Tracking | 无人机跟踪的多步时间建模 | Xiaoying Yuan, Tingfa Xu, Xincong Liu, Ying Wang, Haolin Qin, Yuqiang Fang, Jianan Li | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking | LORS:用于参数高效网络堆叠的低秩残差结构 | Jialin Li, Qiang Nie, Weifu Fu, Yuhuan Lin, Guangpin Tao, Yong Liu, Chengjie Wang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | A data-centric approach to class-specific bias in image data augmentation | 一种以数据为中心的方法来解决图像数据增强中特定类别的偏差 | Athanasios Angelakis, Andrey Rass | arxiv.org/pdf/2403.04… | null |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-07 | Unbiased Estimator for Distorted Conics in Camera Calibration | 相机校准中畸变二次曲线的无偏估计器 | Chaehyeon Song, Jaeho Shin, Myung-Hwan Jeon, Jongwoo Lim, Ayoung Kim | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation | 用于自我中心 3D 手势估计的单视图到双视图自适应 | Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang, Yoichi Sato | arxiv.org/pdf/2403.04… | null |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-07 | I Can't Believe It's Not Scene Flow! | 我不敢相信这不是场景流! | Ishan Khatri, Kyle Vedder, Neehar Peri, Deva Ramanan, James Hays | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Embodied Understanding of Driving Scenarios | 对驾驶场景的具体理解 | Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Hyperspectral unmixing for Raman spectroscopy via physics-constrained autoencoders | 通过物理约束自动编码器对拉曼光谱进行高光谱分解 | Dimitar Georgiev, Álvaro Fernández-Galiana, Simon Vilms Pedersen, Georgios Papadopoulos, Ruoxiao Xie, Molly M. Stevens, Mauricio Barahona | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment | MAGR:用于持续行动质量评估的流形对齐图正则化 | Kanglei Zhou, Liyuan Wang, Xingxing Zhang, Hubert P. H. Shum, Frederick W. B. Li, Jianguo Li, Xiaohui Liang | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Single-Image HDR Reconstruction Assisted Ghost Suppression and Detail Preservation Network for Multi-Exposure HDR Imaging | 用于多重曝光 HDR 成像的单图像 HDR 重建辅助重影抑制和细节保留网络 | Huafeng Li, Zhenmei Yang, Yafei Zhang, Dapeng Tao, Zhengtao Yu | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | YYDS: Visible-Infrared Person Re-Identification with Coarse Descriptions | YYDS:具有粗略描述的可见红外人员重新识别 | Yunhao Du, Zhicheng Zhao, Fei Su | arxiv.org/pdf/2403.04… | null |
| 2024-03-07 | Towards learning-based planning: The nuPlan benchmark for real-world autonomous driving | 迈向基于学习的规划:现实世界自动驾驶的 nuPlan 基准 | Napat Karnchanachari, Dimitris Geromichalos, Kok Seang Tan, Nanxiang Li, Christopher Eriksen, Shakiba Yaghoubi, Noushin Mehdipour, Gianmarco Bernasconi, Whye Kit Fong, Yiluan Guo, et al. | arxiv.org/pdf/2403.04… | null |