[Share][Daily Update][2024.02.05][CV_arxiv_papers]


[UPDATED!] 2024-02-05 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-05 | Do Diffusion Models Learn Semantically Meaningful and Efficient Representations? | 扩散模型是否能够学习语义上有意义且有效的表示? | Qiyao Liang, Ziming Liu, Ila Fiete | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models | GUARD:通过角色扮演生成自然语言越狱,以测试大型语言模型的准则遵守情况 | Haibo Jin, Ruoxi Chen, Andy Zhou, Jinyin Chen, Yang Zhang, Haohan Wang | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Zero-shot Object-Level OOD Detection with Context-Aware Inpainting | 具有上下文感知修复功能的零样本对象级 OOD 检测 | Quang-Huy Nguyen, Jin Peng Zhou, Zhenzhen Liu, Khanh-Huyen Bui, Kilian Q. Weinberger, Dung D. Le | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | InstanceDiffusion: Instance-level Control for Image Generation | InstanceDiffusion:图像生成的实例级控制 | Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | IGUANe: a 3D generalizable CycleGAN for multicenter harmonization of brain MR images | IGUANe:用于大脑 MR 图像多中心协调的 3D 通用 CycleGAN | Vincent Roca, Grégory Kuchcinski, Jean-Pierre Pruvo, Dorian Manouvriez, Renaud Lopes | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Organic or Diffused: Can We Distinguish Human Art from AI-generated Images? | 有机还是扩散:我们可以区分人类艺术和人工智能生成的图像吗? | Anna Yoo Jeong Ha, Josephine Passananti, Ronik Bhaskar, Shawn Shan, Reid Southen, Haitao Zheng, Ben Y. Zhao | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion | Direct-a-Video:通过用户控制的摄像机移动和对象运动生成定制视频 | Shiyuan Yang, Liang Hou, Haibin Huang, Chongyang Ma, Pengfei Wan, Di Zhang, Xiaodong Chen, Jing Liao | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Transcending Adversarial Perturbations: Manifold-Aided Adversarial Examples with Legitimate Semantics | 超越对抗性扰动:具有合法语义的多方面辅助对抗性示例 | Shuai Li, Xiaoyu Jiang, Xiaoguang Ma | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing | 视觉文本满足低级视觉:视觉文本处理的综合调查 | Yan Shu, Weichao Zeng, Zhenhang Li, Fangmin Zhao, Yu Zhou | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | PFDM: Parser-Free Virtual Try-on via Diffusion Model | PFDM:通过扩散模型进行无解析器虚拟试戴 | Yunfang Niu, Dong Yi, Lingxiang Wu, Zhiwei Liu, Pengxiang Cai, Jinqiao Wang | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions | InteractiveVideo:具有协同多模式指令的以用户为中心的可控视频生成 | Yiyuan Zhang, Yuhao Kang, Zhixin Zhang, Xiaohan Ding, Sanyuan Zhao, Xiangyu Yue | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Retrieval-Augmented Score Distillation for Text-to-3D Generation | 用于文本转 3D 生成的检索增强分数蒸馏 | Junyoung Seo, Susung Hong, Wooseok Jang, Inès Hyeonsu Kim, Minseop Kwak, Doyup Lee, Seungryong Kim | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Instance Segmentation XXL-CT Challenge of a Historic Airplane | 历史飞机的实例分割 XXL-CT 挑战 | Roland Gruber, Johann Christopher Engster, Markus Michen, Nele Blum, Maik Stille, Stefan Gerth, Thomas Wittenberg | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | ViewFusion: Learning Composable Diffusion Models for Novel View Synthesis | ViewFusion:学习用于新视图合成的可组合扩散模型 | Bernard Spiegl, Andrea Perin, Stéphane Deny, Alexander Ilin | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | SynthVision - Harnessing Minimal Input for Maximal Output in Computer Vision Models using Synthetic Image data | SynthVision - 使用合成图像数据在计算机视觉模型中利用最小输入获得最大输出 | Yudara Kularathne, Prathapa Janitha, Sithira Ambepitiya, Thanveer Ahamed, Dinuka Wijesundara, Prarththanan Sothyrajah | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Extreme Two-View Geometry From Object Poses with Diffusion Models | 具有扩散模型的物体姿势的极端二视图几何 | Yujing Sun, Caiyi Sun, Yuan Liu, Yuexin Ma, Siu Ming Yiu | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models | DisDet:探索扩散模型后门攻击的可检测性 | Yang Sui, Huy Phan, Jinqi Xiao, Tianfang Zhang, Zijie Tang, Cong Shi, Yan Wang, Yingying Chen, Bo Yuan | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | InVA: Integrative Variational Autoencoder for Harmonization of Multi-modal Neuroimaging Data | InVA:用于协调多模态神经影像数据的综合变分自动编码器 | Bowen Lei, Rajarshi Guhaniyogi, Krishnendu Chandra, Aaron Scheffler, Bani Mallick | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Fast and Accurate Cooperative Radio Map Estimation Enabled by GAN | GAN 支持快速准确的协作无线电地图估计 | Zezhong Zhang, Guangxu Zhu, Junting Chen, Shuguang Cui | arxiv.org/pdf/2402.02… | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-05 | AONeuS: A Neural Rendering Framework for Acoustic-Optical Sensor Fusion | AONeuS:声光传感器融合的神经渲染框架 | Mohamad Qadri, Kevin Zhang, Akshay Hinduja, Michael Kaess, Adithya Pediredla, Christopher A. Metzler | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | ActiveAnno3D - An Active Learning Framework for Multi-Modal 3D Object Detection | ActiveAnno3D - 用于多模态 3D 对象检测的主动学习框架 | Ahmed Ghita, Bjørk Antoniussen, Walter Zimmer, Ross Greer, Christian Creß, Andreas Møgelmose, Mohan M. Trivedi, Alois C. Knoll | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Multi: Multimodal Understanding Leaderboard with Text and Images | 多:带有文本和图像的多模式理解排行榜 | Zichen Zhu, Yang Xu, Lu Chen, Jingkai Yang, Yichuan Ma, Yiming Sun, Hailin Wen, Jiaqi Liu, Jinyu Cai, Yingzi Ma, et al. | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | Video-LaVIT:具有解耦视觉运动标记化的统一视频语言预训练 | Yang Jin, Zhicheng Sun, Kun Xu, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, et al. | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Text-Guided Image Clustering | 文本引导图像聚类 | Andreas Stephan, Lukas Miklautz, Kevin Sidak, Jan Philip Wahle, Bela Gipp, Claudia Plant, Benjamin Roth | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives | 深入研究道路场景理解的多模态多任务基础模型:从学习范式的角度 | Sheng Luo, Wei Chen, Wanxin Tian, Rui Liu, Luanxuan Hou, Xiubao Zhang, Haifeng Shen, Ruiqi Wu, Shuyi Geng, Yi Zhou, et al. | arxiv.org/pdf/2402.02… | null |

3DGS

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-05 | 4D Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes | 4D 高斯泼溅:实现动态场景的高效新颖视图合成 | Yuanxing Duan, Fangyin Wei, Qiyu Dai, Yuhang He, Wenzheng Chen, Baoquan Chen | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM | SGS-SLAM:神经密集 SLAM 的语义高斯泼溅 | Mingrui Li, Shuhong Liu, Heng Zhou | arxiv.org/pdf/2402.03… | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-05 | FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition | FROSTER:Frozen CLIP 是开放词汇动作识别的强大老师 | Xiaohu Huang, Hao Zhou, Kun Yao, Kai Han | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Good Teachers Explain: Explanation-Enhanced Knowledge Distillation | 好老师讲解:讲解增强知识蒸馏 | Amin Parchami-Araghi, Moritz Böhle, Sukrut Rao, Bernt Schiele | arxiv.org/pdf/2402.03… | link |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-05 | HASSOD: Hierarchical Adaptive Self-Supervised Object Detection | HASSOD:分层自适应自监督目标检测 | Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining | Swin-UMamba:基于 Mamba 的 UNet 和基于 ImageNet 的预训练 | Jiarun Liu, Hao Yang, Hong-Yu Zhou, Yan Xi, Lequan Yu, Yizhou Yu, Yong Liang, Guangming Shi, Shaoting Zhang, Hairong Zheng, et al. | arxiv.org/pdf/2402.03… | link |
| 2024-02-05 | CT-based Anatomical Segmentation for Thoracic Surgical Planning: A Benchmark Study for 3D U-shaped Deep Learning Models | 基于 CT 的胸部手术规划解剖分割:3D U 形深度学习模型的基准研究 | Arash Harirpoush, Amirhossein Rasoulian, Marta Kersten-Oertel, Yiming Xiao | arxiv.org/pdf/2402.03… | link |
| 2024-02-05 | Towards mitigating uncann(eye)ness in face swaps via gaze-centric loss terms | 通过以凝视为中心的损失项来减轻面部交换中的不可思议(眼睛) | Ethan Wilson, Frederick Shic, Sophie Jörg, Eakta Jain | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | RRWNet: Recursive Refinement Network for Effective Retinal Artery/Vein Segmentation and Classification | RRWNet:用于有效视网膜动脉/静脉分割和分类的递归细化网络 | José Morano, Guilherme Aresta, Hrvoje Bogunović | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Towards Eliminating Hard Label Constraints in Gradient Inversion Attacks | 消除梯度反转攻击中的硬标签约束 | Yanbo Wang, Jian Liang, Ran He | arxiv.org/pdf/2402.03… | link |
| 2024-02-05 | Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector | 通过增强型开放集对象检测器进行跨域少样本对象检测 | Yuqian Fu, Yu Wang, Yixuan Pan, Lian Huai, Xingyu Qiu, Zeyu Shangguan, Tong Liu, Lingjie Kong, Yanwei Fu, Luc Van Gool, et al. | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Taylor Videos for Action Recognition | 用于动作识别的泰勒视频 | Lei Wang, Xiuyuan Yuan, Tom Gedeon, Liang Zheng | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | [Citation needed] Data usage and citation practices in medical imaging conferences | [需要引用]医学影像会议中的数据使用和引用实践 | Théo Sourget, Ahmet Akkoç, Stinna Winther, Christine Lyngbye Galsgaard, Amelia Jiménez-Sánchez, Dovile Juodelyte, Caroline Petitjean, Veronika Cheplygina | arxiv.org/pdf/2402.03… | link |
| 2024-02-05 | A Safety-Adapted Loss for Pedestrian Detection in Automated Driving | 自动驾驶中行人检测的安全自适应损失 | Maria Lyssenko, Piyush Pimplikar, Maarten Bieshaar, Farzad Nozarian, Rudolph Triebel | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Unsupervised semantic segmentation of high-resolution UAV imagery for road scene parsing | 用于道路场景解析的高分辨率无人机图像的无监督语义分割 | Zihan Ma, Yongshang Li, Ronggui Ma, Chen Liang | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | One-class anomaly detection through color-to-thermal AI for building envelope inspection | 通过颜色到热人工智能进行一级异常检测,用于建筑围护结构检查 | Polina Kurtser, Kailun Feng, Thomas Olofsson, Aitor De Andres | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | HoughToRadon Transform: New Neural Network Layer for Features Improvement in Projection Space | HoughToRadon 变换:用于投影空间特征改进的新神经网络层 | Alexandra Zhabitskaya, Alexander Sheshkus, Vladimir L. Arlazarov | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Time-, Memory- and Parameter-Efficient Visual Adaptation | 时间、内存和参数高效的视觉适应 | Otniel-Bogdan Mercea, Alexey Gritsenko, Cordelia Schmid, Anurag Arnab | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Multi-scale fMRI time series analysis for understanding neurodegeneration in MCI | 多尺度功能磁共振成像时间序列分析用于了解 MCI 中的神经退行性变 | Ammu R., Debanjali Bhattacharya, Ameiy Acharya, Ninad Aithal, Neelam Sinha | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Joint Attention-Guided Feature Fusion Network for Saliency Detection of Surface Defects | 用于表面缺陷显着性检测的联合注意力引导特征融合网络 | Xiaoheng Jiang, Feng Yan, Yang Lu, Ke Wang, Shuai Guo, Tianzhu Zhang, Yanwei Pang, Jianwei Niu, Mingliang Xu | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Transmission Line Detection Based on Improved Hough Transform | 基于改进Hough变换的输电线路检测 | Wei Song, Pei Li, Man Wang | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Improving Robustness of LiDAR-Camera Fusion Model against Weather Corruption from Fusion Strategy Perspective | 从融合策略的角度提高激光雷达-相机融合模型对抗天气腐蚀的鲁棒性 | Yihao Huang, Kaiyuan Yu, Qing Guo, Felix Juefei-Xu, Xiaojun Jia, Tianlin Li, Geguang Pu, Yang Liu | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | FDNet: Frequency Domain Denoising Network For Cell Segmentation in Astrocytes Derived From Induced Pluripotent Stem Cells | FDNet:用于诱导多能干细胞衍生的星形胶质细胞分割的频域去噪网络 | Haoran Li, Jiahua Shi, Huaming Chen, Bo Du, Simon Maksour, Gabrielle Phillips, Mirella Dottori, Jun Shen | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Image-Caption Encoding for Improving Zero-Shot Generalization | 用于改进零样本泛化的图像标题编码 | Eric Yang Yu, Christopher Liao, Sathvik Ravi, Theodoros Tsiligkaridis, Brian Kulis | arxiv.org/pdf/2402.02… | link |
| 2024-02-05 | Learning with Mixture of Prototypes for Out-of-Distribution Detection | 混合原型学习以进行分布外检测 | Haodong Lu, Dong Gong, Shuo Wang, Jason Xue, Lina Yao, Kristen Moore | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Densely Decoded Networks with Adaptive Deep Supervision for Medical Image Segmentation | 用于医学图像分割的具有自适应深度监督的密集解码网络 | Suraj Mishra | arxiv.org/pdf/2402.02… | null |

Image Understanding

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-05 | CLIP Can Understand Depth | CLIP 可以理解深度 | Dunam Kim, Seokju Lee | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Perceptual Learned Image Compression via End-to-End JND-Based Optimization | 通过基于 JND 的端到端优化进行感知学习图像压缩 | Farhad Pakdaman, Sanaz Nami, Moncef Gabbouj | arxiv.org/pdf/2402.02… | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-05 | Training-Free Consistent Text-to-Image Generation | 免训练一致的文本到图像生成 | Yoad Tewel, Omri Kaduri, Rinon Gal, Yoni Kasten, Lior Wolf, Gal Chechik, Yuval Atzmon | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | AdaTreeFormer: Few Shot Domain Adaptation for Tree Counting from a Single High-Resolution Image | AdaTreeFormer:从单个高分辨率图像进行树木计数的少量镜头域适应 | Hamed Amini Amirkolaee, Miaojing Shi, Lianghua He, Mark Mulligan | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Exploring the Synergies of Hybrid CNNs and ViTs Architectures for Computer Vision: A survey | 探索计算机视觉混合 CNN 和 ViT 架构的协同作用:一项调查 | Haruna Yunusa, Shiyin Qin, Abdulrahman Hamman Adama Chukkol, Abdulganiyu Abdu Yusuf, Isah Bello, Adamu Lawan | arxiv.org/pdf/2402.02… | null |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-05 | GPU-Accelerated 3D Polygon Visibility Volumes for Synergistic Perception and Navigation | GPU 加速的 3D 多边形可见体积可实现协同感知和导航 | Andrew Willis, Collin Hague, Artur Wolek, Kevin Brink | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | AI-Enhanced Virtual Reality in Medicine: A Comprehensive Survey | 人工智能增强虚拟现实在医学中的应用:综合调查 | Yixuan Wu, Kaiyuan Hu, Danny Z. Chen, Jian Wu | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Motion-Aware Video Frame Interpolation | 运动感知视频帧插值 | Pengfei Han, Fuhua Zhang, Bin Zhao, Xuelong Li | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | ToonAging: Face Re-Aging upon Artistic Portrait Style Transfer | ToonAging:艺术肖像风格迁移下的面部再老化 | Bumsoo Kim, Abdul Muqeet, Kyuchul Lee, Sanghyun Seo | arxiv.org/pdf/2402.02… | null |

Learning Paradigms

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-05 | Representation Surgery for Multi-Task Model Merging | 多任务模型合并的表示手术 | Enneng Yang, Li Shen, Zhenyi Wang, Guibing Guo, Xiaojun Chen, Xingwei Wang, Dacheng Tao | arxiv.org/pdf/2402.02… | link |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-05 | Test-Time Adaptation for Depth Completion | 深度完成的测试时间调整 | Hyoungseob Park, Anjali Gupta, Alex Wong | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | V-IRL: Grounding Virtual Intelligence in Real Life | V-IRL:将虚拟智能融入现实生活 | Jihan Yang, Runyu Ding, Ellis Brown, Xiaojuan Qi, Saining Xie | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Towards a Flexible Scale-out Framework for Efficient Visual Data Query Processing | 面向高效可视数据查询处理的灵活横向扩展框架 | Rohit Verma, Arun Raghunath | arxiv.org/pdf/2402.03… | null |
| 2024-02-05 | Panoramic Image Inpainting With Gated Convolution And Contextual Reconstruction Loss | 使用门控卷积和上下文重建损失的全景图像修复 | Li Yu, Yanjun Gao, Farhad Pakdaman, Moncef Gabbouj | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Pixel-Wise Color Constancy via Smoothness Techniques in Multi-Illuminant Scenes | 通过多光源场景中的平滑技术实现逐像素颜色恒定 | Umut Cem Entok, Firas Laakom, Farhad Pakdaman, Moncef Gabbouj | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Exploring Federated Self-Supervised Learning for General Purpose Audio Understanding | 探索通用音频理解的联合自监督学习 | Yasar Abbas Ur Rehman, Kin Wai Lau, Yuyang Xie, Lan Ma, Jiajun Shen | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Time-Distributed Backdoor Attacks on Federated Spiking Learning | 对联邦尖峰学习的时间分布式后门攻击 | Gorka Abad, Stjepan Picek, Aitor Urbieta | arxiv.org/pdf/2402.02… | null |
| 2024-02-05 | Enhancing Compositional Generalization via Compositional Feature Alignment | 通过组合特征对齐增强组合泛化 | Haoxiang Wang, Haozhe Si, Huajie Shao, Han Zhao | arxiv.org/pdf/2402.02… | link |
| 2024-02-05 | Using Motion Cues to Supervise Single-Frame Body Pose and Shape Estimation in Low Data Regimes | 使用运动提示来监督低数据状态下的单帧身体姿势和形状估计 | Andrey Davydov, Alexey Sidnev, Artsiom Sanakoyeu, Yuhua Chen, Mathieu Salzmann, Pascal Fua | arxiv.org/pdf/2402.02… | null |