[UPDATED!] 2024-02-06 (Publish Time)
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-06 | EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters | EVA-CLIP-18B:将 CLIP 扩展到 180 亿个参数 | Quan Sun, Jinsheng Wang, Qiying Yu, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Xinlong Wang | arxiv.org/pdf/2402.04… | null |
| 2024-02-06 | SHIELD : An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models | SHIELD:使用多模态大语言模型进行人脸欺骗和伪造检测的评估基准 | Yichen Shi, Yuhao Gao, Yingxin Lai, Hongyang Wang, Jun Feng, Lei He, Jun Wan, Changsheng Chen, Zitong Yu, Xiaochun Cao | arxiv.org/pdf/2402.04… | null |
| 2024-02-06 | A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation | 无需培训、基于 CLIP 的适应的难以超越的基线 | Zhengbo Wang, Jian Liang, Lijun Sheng, Ran He, Zilei Wang, Tieniu Tan | arxiv.org/pdf/2402.04… | null |
| 2024-02-06 | Multi-class Road Defect Detection and Segmentation using Spatial and Channel-wise Attention for Autonomous Road Repairing | 使用空间和通道注意进行多类道路缺陷检测和分割以进行自主道路修复 | Jongmin Yu, Chen Bene Chi, Sebastiano Fichera, Paolo Paoletti, Devansh Mehta, Shan Luo | arxiv.org/pdf/2402.04… | null |
| 2024-02-06 | Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models | 连接点:黑盒视觉语言模型的协作微调 | Zhengbo Wang, Jian Liang, Ran He, Zilei Wang, Tieniu Tan | arxiv.org/pdf/2402.04… | null |
| 2024-02-06 | Polyp-DDPM: Diffusion-Based Semantic Polyp Synthesis for Enhanced Segmentation | Polyp-DDPM:基于扩散的语义息肉合成以增强分割 | Zolnamar Dorjsembe, Hsing-Kuo Pao, Furen Xiao | arxiv.org/pdf/2402.04… | null |
| 2024-02-06 | YOLOPoint Joint Keypoint and Object Detection | YOLOPoint 联合关键点和物体检测 | Anton Backhaus, Thorsten Luettel, Hans-Joachim Wuensche | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Humans Beat Deep Networks at Recognizing Objects in Unusual Poses, Given Enough Time | 只要有足够的时间,人类就能在识别异常姿势的物体方面击败深度网络 | Netta Ollikka, Amro Abbas, Andrea Perin, Markku Kilpeläinen, Stéphane Deny | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping | 通过变形约束翘曲提高跨模型属的对抗性可迁移性 | Qinliang Lin, Cheng Luo, Zenghao Niu, Xilin He, Weicheng Xie, Yuanbo Hou, Linlin Shen, Siyang Song | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | A new method for optical steel rope non-destructive damage detection | 一种光学钢丝绳无损损伤检测新方法 | Yunqing Bao, Bin Hu | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | An SVD-free Approach to Nonlinear Dictionary Learning based on RVFL | 基于RVFL的无SVD非线性字典学习方法 | G. Madhuri, Atul Negi | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Face Detection: Present State and Research Directions | 人脸检测:现状和研究方向 | Purnendu Prabhat, Himanshu Gupta, Ajeet Kumar Vishwakarma | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Energy-based Domain-Adaptive Segmentation with Depth Guidance | 具有深度引导的基于能量的域自适应分割 | Jinjing Zhu, Zhedong Hu, Tae-Kyun Kim, Lin Wang | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Exploring Low-Resource Medical Image Classification with Weakly Supervised Prompt Learning | 通过弱监督即时学习探索低资源医学图像分类 | Fudan Zheng, Jindong Cao, Weijiang Yu, Zhiguang Chen, Nong Xiao, Yutong Lu | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | AttackNet: Enhancing Biometric Security via Tailored Convolutional Neural Network Architectures for Liveness Detection | AttackNet:通过定制的活体检测卷积神经网络架构增强生物识别安全性 | Oleksandr Kuznetsov, Dmytro Zakharov, Emanuele Frontoni, Andrea Maranesi | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | MoD-SLAM: Monocular Dense Mapping for Unbounded 3D Scene Reconstruction | MoD-SLAM:用于无界 3D 场景重建的单目密集建图 | Heng Zhou, Zhetao Guo, Shuhong Liu, Lechen Zhang, Qihao Wang, Yuxiang Ren, Mingrui Li | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Virtual Classification: Modulating Domain-Specific Knowledge for Multidomain Crowd Counting | 虚拟分类:调整多域人群计数的特定领域知识 | Mingyue Guo, Binghui Chen, Zhaoyi Yan, Yaowei Wang, Qixiang Ye | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | SISP: A Benchmark Dataset for Fine-grained Ship Instance Segmentation in Panchromatic Satellite Images | SISP:全色卫星图像中细粒度船舶实例分割的基准数据集 | Pengming Feng, Mingjie Xie, Hongning Liu, Xuanjia Zhao, Guangjun He, Xueliang Zhang, Jian Guan | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | MMAUD: A Comprehensive Multi-Modal Anti-UAV Dataset for Modern Miniature Drone Threats | MMAUD:针对现代微型无人机威胁的综合多模式反无人机数据集 | Shenghai Yuan, Yizhuo Yang, Thien Hoang Nguyen, Thien-Minh Nguyen, Jianfei Yang, Fen Liu, Jianping Li, Han Wang, Lihua Xie | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | SHMC-Net: A Mask-guided Feature Fusion Network for Sperm Head Morphology Classification | SHMC-Net:用于精子头部形态分类的掩模引导特征融合网络 | Nishchal Sapkota, Yejia Zhang, Sirui Li, Peixian Liang, Zhuo Zhao, Danny Z Chen | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | ConUNETR: A Conditional Transformer Network for 3D Micro-CT Embryonic Cartilage Segmentation | ConUNETR:用于 3D Micro-CT 胚胎软骨分割的条件变压器网络 | Nishchal Sapkota, Yejia Zhang, Susan M. Motch Perrine, Yuhan Hsi, Sirui Li, Meng Wu, Greg Holmes, Abdul R. Abdulai, Ethylin W. Jabs, Joan T. Richtsmeier, et al. | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | BEAM: Beta Distribution Ray Denoising for Multi-view 3D Object Detection | BEAM:用于多视图 3D 物体检测的 Beta 分布射线去噪 | Feng Liu, Tengteng Huang, Qianjing Zhang, Haotian Yao, Chi Zhang, Fang Wan, Qixiang Ye, Yanzhao Zhou | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | CAT-SAM: Conditional Tuning Network for Few-Shot Adaptation of Segmentation Anything Model | CAT-SAM:用于分段任意模型的少样本自适应的条件调整网络 | Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Ruijie Ren, Xiaoqin Zhang, Shijian Lu | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Improving Contextual Congruence Across Modalities for Effective Multimodal Marketing using Knowledge-infused Learning | 利用注入知识的学习提高跨模式的上下文一致性,以实现有效的多模式营销 | Trilok Padhi, Ugur Kursuncu, Yaman Kumar, Valerie L. Shalin, Lane Peterson Fronczek | arxiv.org/pdf/2402.03… | null |
Image Understanding
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-06 | VRMM: A Volumetric Relightable Morphable Head Model | VRMM:体积可重复照明可变形头部模型 | Haotian Yang, Mingwu Zheng, Chongyang Ma, Yu-Kun Lai, Pengfei Wan, Haibin Huang | arxiv.org/pdf/2402.04… | null |
LLM
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-06 | HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal | HarmBench:自动化红队和强力拒绝的标准化评估框架 | Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, et al. | arxiv.org/pdf/2402.04… | null |
| 2024-02-06 | The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs | 本能偏见:虚假图像导致 MLLM 产生幻觉 | Tianyang Han, Qing Lian, Rui Pan, Renjie Pi, Jipeng Zhang, Shizhe Diao, Yong Lin, Tong Zhang | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback | 使用 AI 反馈的强化学习来调整视频的大型多模态模型 | Daechul Ahn, Yura Choi, Youngjae Yu, Dongyeop Kang, Jonghyun Choi | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Automatic Robotic Development through Collaborative Framework by Large Language Models | 通过大型语言模型的协作框架进行自动机器人开发 | Zhirong Luan, Yujun Lai | arxiv.org/pdf/2402.03… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-06 | U-shaped Vision Mamba for Single Image Dehazing | 用于单图像去雾的 U 形 Vision Mamba | Zhuoran Zheng, Chen Wu | arxiv.org/pdf/2402.04… | null |
| 2024-02-06 | Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning | 用于参数高效微调的低阶注意力侧调 | Ningyuan Tang, Minghao Fu, Ke Zhu, Jianxin Wu | arxiv.org/pdf/2402.04… | null |
| 2024-02-06 | Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting | 基于扩散的运动行为预测的可控多样化采样 | Yiming Xu, Hao Cheng, Monika Sester | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Intensive Vision-guided Network for Radiology Report Generation | 用于生成放射学报告的强化视觉引导网络 | Fudan Zheng, Mengfei Li, Ying Wang, Weijiang Yu, Ruixuan Wang, Zhiguang Chen, Nong Xiao, Yutong Lu | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Pre-training of Lightweight Vision Transformers on Small Datasets with Minimally Scaled Images | 在具有最小缩放图像的小数据集上预训练轻量级视觉变压器 | Jen Hong Tan | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Attention-based Shape and Gait Representations Learning for Video-based Cloth-Changing Person Re-Identification | 基于注意力的形状和步态表示学习,用于基于视频的换衣人员重新识别 | Vuong D. Nguyen, Samiha Mirza, Pranav Mantini, Shishir K. Shah | arxiv.org/pdf/2402.03… | null |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-06 | Instance by Instance: An Iterative Framework for Multi-instance 3D Registration | 逐个实例:多实例 3D 配准的迭代框架 | Xinyue Cao, Xiyu Zhang, Yuxin Cheng, Zhaoshuai Qi, Yanning Zhang, Jiaqi Yang | arxiv.org/pdf/2402.04… | null |
| 2024-02-06 | 3D Volumetric Super-Resolution in Radiology Using 3D RRDB-GAN | 使用 3D RRDB-GAN 实现放射学中的 3D 体积超分辨率 | Juhyung Ha, Nian Wang, Surendra Maharjan, Xuhong Zhang | arxiv.org/pdf/2402.04… | null |
| 2024-02-06 | EscherNet: A Generative Model for Scalable View Synthesis | EscherNet:可扩展视图合成的生成模型 | Xin Kong, Shikun Liu, Xiaoyang Lyu, Marwan Taher, Xiaojuan Qi, Andrew J. Davison | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Belief Scene Graphs: Expanding Partial Scenes with Objects through Computation of Expectation | 信念场景图:通过期望计算用对象扩展部分场景 | Mario A. V. Saucedo, Akash Patel, Akshit Saradagi, Christoforos Kanellakis, George Nikolakopoulos | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving | OASim:基于神经渲染的自动驾驶开放自适应模拟器 | Guohang Yan, Jiahao Pi, Jianfei Guo, Zhaotong Luo, Min Dou, Nianchen Deng, Qiusheng Huang, Daocheng Fu, Licheng Wen, Pinlong Cai, et al. | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos | Rig3DGS:从休闲单目视频创建可控肖像 | Alfredo Rivero, ShahRukh Athar, Zhixin Shu, Dimitris Samaras | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | 3Doodle: Compact Abstraction of Objects with 3D Strokes | 3Doodle:使用 3D 笔画对对象进行紧凑抽象 | Changwoon Choi, Jaeah Lee, Jaesik Park, Young Min Kim | arxiv.org/pdf/2402.03… | null |
Learning Paradigms
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-06 | OVOR: OnePrompt with Virtual Outlier Regularization for Rehearsal-Free Class-Incremental Learning | OVOR:OnePrompt 具有虚拟离群值正则化功能,可实现免排练的课堂增量学习 | Wei-Cheng Huang, Chun-Fu Chen, Hsiang Hsu | arxiv.org/pdf/2402.04… | null |
| 2024-02-06 | Elastic Feature Consolidation for Cold Start Exemplar-free Incremental Learning | 用于冷启动无范例增量学习的弹性特征整合 | Simone Magistri, Tomaso Trinci, Albin Soutif-Cormerais, Joost van de Weijer, Andrew D. Bagdanov | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Deep MSFOP: Multiple Spectral filter Operators Preservation in Deep Functional Maps for Unsupervised Shape Matching | Deep MSFOP:在深度函数图中保留多个光谱滤波器算子以实现无监督形状匹配 | Feifan Luo, Qingsong Li, Ling Hu, Xinru Liu, Haojun Xu, Haibo Wang, Ting Li, Shengjun Liu | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Convincing Rationales for Visual Question Answering Reasoning | 视觉问答推理的令人信服的理由 | Kun Li, George Vosselman, Michael Ying Yang | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models | 视觉超对齐:视觉基础模型的弱到强泛化 | Jianyuan Guo, Hanting Chen, Chengcheng Wang, Kai Han, Chang Xu, Yunhe Wang | arxiv.org/pdf/2402.03… | null |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-06 | CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations | CogCoM:训练大型视觉语言模型,通过操作链深入细节 | Ji Qi, Ming Ding, Weihan Wang, Yushi Bai, Qingsong Lv, Wenyi Hong, Bin Xu, Lei Hou, Juanzi Li, Yuxiao Dong, et al. | arxiv.org/pdf/2402.04… | null |
| 2024-02-06 | Informed Reinforcement Learning for Situation-Aware Traffic Rule Exceptions | 针对情境感知交通规则异常的知情强化学习 | Daniel Bogdoll, Jing Qin, Moritz Nekolla, Ahmed Abouelazm, Tim Joseph, J. Marius Zöllner | arxiv.org/pdf/2402.04… | null |
| 2024-02-06 | Analysis of Deep Image Prior and Exploiting Self-Guidance for Image Reconstruction | 深度图像先验分析和利用自引导进行图像重建 | Shijun Liang, Evan Bell, Qing Qu, Rongrong Wang, Saiprasad Ravishankar | arxiv.org/pdf/2402.04… | null |
| 2024-02-06 | Privacy Leakage on DNNs: A Survey of Model Inversion Attacks and Defenses | DNN 上的隐私泄露:模型反转攻击和防御的调查 | Hao Fang, Yixiang Qiu, Hongyao Yu, Wenbo Yu, Jiawei Kong, Baoli Chong, Bin Chen, Xuan Wang, Shu-Tao Xia | arxiv.org/pdf/2402.04… | null |
| 2024-02-06 | MobileVLM V2: Faster and Stronger Baseline for Vision Language Model | MobileVLM V2:更快更强的视觉语言模型基线 | Xiangxiang Chu, Limeng Qiao, Xinyu Zhang, Shuang Xu, Fei Wei, Yang Yang, Xiaofei Sun, Yiming Hu, Xinyang Lin, Bo Zhang, et al. | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | AoSRNet: All-in-One Scene Recovery Networks via Multi-knowledge Integration | AoSRNet:通过多知识集成的多合一场景恢复网络 | Yuxu Lu, Dong Yang, Yuan Gao, Ryan Wen Liu, Jun Liu, Yu Guo | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | FoolSDEdit: Deceptively Steering Your Edits Towards Targeted Attribute-aware Distribution | FoolSDEdit:欺骗性地将您的编辑引向有针对性的属性感知分发 | Qi Zhou, Dongxia Wang, Tianlin Li, Zhihong Xu, Yang Liu, Kui Ren, Wenhai Wang, Qing Guo | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning | QuEST:通过高效选择性微调进行低位扩散模型量化 | Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Yan Yan | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | Reviewing FID and SID Metrics on Generative Adversarial Networks | 审查生成对抗网络的 FID 和 SID 指标 | Ricardo de Deijn, Aishwarya Batra, Brandon Koch, Naseef Mansoor, Hema Makkena | arxiv.org/pdf/2402.03… | null |
| 2024-02-06 | GRASP: GRAph-Structured Pyramidal Whole Slide Image Representation | GRASP:图结构金字塔整体幻灯片图像表示 | Ali Khajegili Mirabadi, Graham Archibald, Amirali Darbandsari, Alberto Contreras-Sanz, Ramin Ebrahim Nakhli, Maryam Asadi, Allen Zhang, C. Blake Gilks, Peter Black, Gang Wang, et al. | arxiv.org/pdf/2402.03… | null |