[Share][Daily Update][2024.02.21][CV_arxiv_papers]


[UPDATED!] 2024-02-21 (Publish Time)

Generative Models

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-02-21 | Geometry-Informed Neural Networks | Arturs Berzins, Andreas Radler, Sebastian Sanokowski, Sepp Hochreiter, Johannes Brandstetter | arxiv.org/pdf/2402.14… | null |
| 2024-02-21 | Distinctive Image Captioning: Leveraging Ground Truth Captions in CLIP Guided Reinforcement Learning | Antoine Chaffin, Ewa Kijak, Vincent Claveau | arxiv.org/pdf/2402.13… | link |
| 2024-02-21 | Tumor segmentation on whole slide images: training or prompting? | Huaqian Wu, Clara Brémond-Martin, Kévin Bouaou, Cédric Clouchoux | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | NeuralDiffuser: Controllable fMRI Reconstruction with Primary Visual Feature Guided Diffusion | Haoyu Li, Hao Wu, Badong Chen | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Scalable Methods for Brick Kiln Detection and Compliance Monitoring from Satellite Imagery: A Deployment Case Study in India | Rishabh Mondal, Zeel B Patel, Vannsh Jani, Nipun Batra | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Cas-DiffCom: Cascaded diffusion model for infant longitudinal super-resolution 3D medical image completion | Lianghu Guo, Tianli Tao, Xinyi Cai, Zihao Zhu, Jiawei Huang, Lixuan Zhu, Zhuoyang Gu, Haifeng Tang, Rui Zhou, Siyan Han, et al. | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | SRNDiff: Short-term Rainfall Nowcasting with Condition Diffusion Model | Xudong Ling, Chaorong Li, Fengqing Qin, Peng Yang, Yuanyuan Huang | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation | Kihong Kim, Haneol Lee, Jihye Park, Seyeon Kim, Kwanghee Lee, Seungryong Kim, Jaejun Yoo | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Flexible Physical Camouflage Generation Based on a Differential Approach | Yang Li, Wenyi Tan, Chenxing Zhao, Shuangju Zhou, Xinkai Liang, Quan Pan | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | ToDo: Token Downsampling for Efficient Generation of High-Resolution Images | Ethan Smith, Nayan Saxena, Aninda Saha | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Models | Chen Wu, Fernando De la Torre | arxiv.org/pdf/2402.13… | null |

Multimodal

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-02-21 | Scene Prior Filtering for Depth Map Super-Resolution | Zhengxue Wang, Zhiqiang Yan, Ming-Hsuan Yang, Jinshan Pan, Jian Yang, Ying Tai, Guangwei Gao | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models | Jiawei Liang, Siyuan Liang, Man Luo, Aishan Liu, Dongchen Han, Ee-Chien Chang, Xiaochun Cao | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models | Fuwen Luo, Chi Chen, Zihao Wan, Zhaolu Kang, Qidong Yan, Yingjie Li, Xiaolong Wang, Siyu Wang, Ziyue Wang, Xiaoyue Mi, et al. | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation | Yunxin Li, Baotian Hu, Wenhan Luo, Lin Ma, Yuxin Ding, Min Zhang | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement | Danyang Hou, Liang Pang, Huawei Shen, Xueqi Cheng | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment | Yunxin Li, Xinyu Chen, Baotian Hu, Haoyuan Shi, Min Zhang | arxiv.org/pdf/2402.13… | null |

NeRF

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-02-21 | Identifying Unnecessary 3D Gaussians using Clustering for Fast Rendering of 3D Gaussian Splatting | Joongho Jo, Hyeongwon Kim, Jongsun Park | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | SealD-NeRF: Interactive Pixel-Level Editing for Dynamic Scenes by Neural Radiance Fields | Zhentao Huang, Yukun Shi, Neil Bruce, Minglun Gong | arxiv.org/pdf/2402.13… | null |

Model Compression/Optimization

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-02-21 | SDXL-Lightning: Progressive Adversarial Diffusion Distillation | Shanchuan Lin, Anran Wang, Xiao Yang | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | MSTAR: Multi-Scale Backbone Architecture Search for Timeseries Classification | Tue M. Cao, Nhat H. Tran, Hieu H. Pham, Hung T. Nguyen, Le P. Nguyen | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Push Quantization-Aware Training Toward Full Precision Performances via Consistency Regularization | Junbiao Pang, Tianyang Cai, Baochang Zhang, Jiaqi Wu, Ye Tao | arxiv.org/pdf/2402.13… | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-02-21 | BEE-NET: A deep neural network to identify in-the-wild Bodily Expression of Emotions | Mohammad Mahdi Dehshibi, David Masip | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | BenchCloudVision: A Benchmark Analysis of Deep Learning Approaches for Cloud Detection and Segmentation in Remote Sensing Imagery | Loddo Fabio, Dario Piga, Michelucci Umberto, El Ghazouali Safouane | arxiv.org/pdf/2402.13… | link |
| 2024-02-21 | Zero-BEV: Zero-shot Projection of Any First-Person Modality to BEV Maps | Gianluca Monaci, Leonid Antsfeld, Boris Chidlovskii, Christian Wolf | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Weakly supervised localisation of prostate cancer using reinforcement learning for bi-parametric MR images | Martynas Pocius, Wen Yan, Dean C. Barratt, Mark Emberton, Matthew J. Clarkson, Yipeng Hu, Shaheer U. Saeed | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Mask-up: Investigating Biases in Face Re-identification for Masked Faces | Siddharth D Jaiswal, Ankit Kr. Verma, Animesh Mukherjee | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Explainable Classification Techniques for Quantum Dot Device Measurements | Daniel Schug, Tyler J. Kovach, M. A. Wolfe, Jared Benson, Sanghyeok Park, J. P. Dodson, J. Corrigan, M. A. Eriksson, Justyna P. Zwolak | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Generalizable Semantic Vision Query Generation for Zero-shot Panoptic and Semantic Segmentation | Jialei Chen, Daisuke Deguchi, Chenkai Zhang, Hiroshi Murase | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Robustness of Deep Neural Networks for Micro-Doppler Radar Classification | Mikolaj Czerkawski, Carmine Clemente, Craig Michie, Christos Tachtatzis | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition | Mingkun Yang, Biao Yang, Minghui Liao, Yingying Zhu, Xiang Bai | arxiv.org/pdf/2402.13… | link |
| 2024-02-21 | Delving into Dark Regions for Robust Shadow Detection | Huankang Guan, Ke Xu, Rynson W. H. Lau | arxiv.org/pdf/2402.13… | link |
| 2024-02-21 | YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information | Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao | arxiv.org/pdf/2402.13… | link |
| 2024-02-21 | Learning Pixel-wise Continuous Depth Representation via Clustering for Depth Completion | Chen Shenglun, Zhang Hong, Ma XinZhu, Wang Zhihui, Li Haojie | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | TransGOP: Transformer-Based Gaze Object Prediction | Binglu Wang, Chenxi Guo, Yang Jin, Haisheng Xia, Nian Liu | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | A Two-Stage Dual-Path Framework for Text Tampering Detection and Recognition | Guandong Li, Xian Yang, Wenpin Ma | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Unsupervised learning based object detection using Contrastive Learning | Chandan Kumar, Jansel Herrera-Gerena, John Just, Matthew Darr, Ali Jannesari | arxiv.org/pdf/2402.13… | null |

LLM

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-02-21 | Hybrid Reasoning Based on Large Language Models for Autonomous Car Driving | Mehdi Azarafza, Mojtaba Nayyeri, Charles Steinmetz, Steffen Staab, Achim Rettberg | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs | Yunxin Li, Xinyu Chen, Baotian Hu, Min Zhang | arxiv.org/pdf/2402.13… | null |

Transformer

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-02-21 | VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks | Yutong Wang, Chaoyang Jiang, Xieyuanli Chen | arxiv.org/pdf/2402.13… | link |
| 2024-02-21 | Event-aware Video Corpus Moment Retrieval | Danyang Hou, Liang Pang, Huawei Shen, Xueqi Cheng | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization | Zhendong Xiao, Changhao Chen, Shan Yang, Wu Wei | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Multi-scale Spatio-temporal Transformer-based Imbalanced Longitudinal Learning for Glaucoma Forecasting from Irregular Time Series Images | Xikai Yang, Jian Wu, Xi Wang, Yuchen Yuan, Ning Li Wang, Pheng-Ann Heng | arxiv.org/pdf/2402.13… | null |

3D/CG

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-02-21 | Real-time 3D-aware Portrait Editing from a Single Image | Qingyan Bai, Yinghao Xu, Zifan Shi, Hao Ouyang, Qiuyu Wang, Ceyuan Yang, Xuan Wang, Gordon Wetzstein, Yujun Shen, Qifeng Chen | arxiv.org/pdf/2402.14… | null |
| 2024-02-21 | A unified framework of non-local parametric methods for image denoising | Sébastien Herbreteau, Charles Kervrann | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters | Zechen Bai, Peng Chen, Xiaolan Peng, Lu Liu, Hui Chen, Mike Zheng Shou, Feng Tian | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning | Chaoqun Du, Yizeng Han, Gao Huang | arxiv.org/pdf/2402.13… | link |

Other

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-02-21 | Corrective Machine Unlearning | Shashwat Goel, Ameya Prabhu, Philip Torr, Ponnurangam Kumaraguru, Amartya Sanyal | arxiv.org/pdf/2402.14… | null |
| 2024-02-21 | High-throughput Visual Nano-drone to Nano-drone Relative Localization using Onboard Fully Convolutional Networks | Luca Crupi, Alessandro Giusti, Daniele Palossi | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | A Unified Framework and Dataset for Assessing Gender Bias in Vision-Language Models | Ashutosh Sathe, Prachi Jain, Sunayana Sitaram | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Adversarial Purification and Fine-tuning for Robust UDC Image Restoration | Zhenbo Song, Zhenyuan Zhang, Kaihao Zhang, Wenhan Luo, Zhaoxin Fan, Jianfeng Lu | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | Exploring the Limits of Semantic Image Compression at Micro-bits per Pixel | Jordan Dotzel, Bahaa Kotb, James Dotzel, Mohamed Abdelfattah, Zhiru Zhang | arxiv.org/pdf/2402.13… | null |
| 2024-02-21 | A Feature Matching Method Based on Multi-Level Refinement Strategy | Shaojie Zhang, Yinghui Wang, Jiaxing Ma, Jinlong Yang, Tao Yan, Liangyi Huang, Mingfeng Wang | arxiv.org/pdf/2402.13… | null |