[Share][Daily Update][2024.02.22][CV_arxiv_papers]


[UPDATED!] 2024-02-22 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-22 | Cameras as Rays: Pose Estimation via Ray Diffusion | 相机作为光线:通过光线扩散进行姿势估计 | Jason Y. Zhang, Amy Lin, Moneish Kumar, Tzu-Hsuan Yang, Deva Ramanan, Shubham Tulsiani | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models | 定制视频:文本到视频扩散模型的一次性运动定制 | Yixuan Ren, Yang Zhou, Jimei Yang, Jing Shi, Difan Liu, Feng Liu, Mingi Kwon, Abhinav Shrivastava | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Zero-Shot Pediatric Tuberculosis Detection in Chest X-Rays using Self-Supervised Learning | 使用自我监督学习在胸部 X 射线中进行零次小儿结核病检测 | Daniel Capellán-Martín, Abhijeet Parida, Juan J. Gómez-Valverde, Ramon Sanchez-Jacob, Pooneh Roshanitabrizi, Marius G. Linguraru, María J. Ledesma-Carbayo, Syed M. Anwar | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Visual Hallucinations of Multi-modal Large Language Models | 多模态大语言模型的视觉幻觉 | Wen Huang, Hongbin Liu, Minxin Guo, Neil Zhenqiang Gong | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Debiasing Text-to-Image Diffusion Models | 消除文本到图像扩散模型的偏差 | Ruifei He, Chuhui Xue, Haoru Tan, Wenqing Zhang, Yingchen Yu, Song Bai, Xiaojuan Qi | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Large-Scale Actionless Video Pre-Training via Discrete Diffusion for Efficient Policy Learning | 通过离散扩散进行大规模无动作视频预训练,以实现高效的策略学习 | Haoran He, Chenjia Bai, Ling Pan, Weinan Zhang, Bin Zhao, Xuelong Li | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment | 基于扩散模型的视觉补偿引导和视觉差异分析用于无参考图像质量评估 | Zhaoyang Wang, Bo Hu, Mingyang Zhang, Jie Li, Leida Li, Maoguo Gong, Xinbo Gao | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing | 渐进残差对齐:GAN 反演和图像属性编辑的双流框架 | Hao Li, Mengqi Huang, Lei Zhang, Bo Hu, Yi Liu, Zhendong Mao | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Uncertainty-driven and Adversarial Calibration Learning for Epicardial Adipose Tissue Segmentation | 心外膜脂肪组织分割的不确定性驱动和对抗性校准学习 | Kai Zhao, Zhiming Liu, Jiaqi Liu, Jingbiao Zhou, Bihong Liao, Huifang Tang, Qiuyu Wang, Chunquan Li | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Typographic Text Generation with Off-the-Shelf Diffusion Model | 使用现成的扩散模型生成印刷文本 | KhayTze Peong, Seiichi Uchida, Daichi Haraguchi | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Font Style Interpolation with Diffusion Models | 使用扩散模型进行字体样式插值 | Tetta Kondo, Shumpei Takezaki, Daichi Haraguchi, Seiichi Uchida | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | A Simple Framework Uniting Visual In-context Learning with Masked Image Modeling to Improve Ultrasound Segmentation | 将视觉上下文学习与掩模图像建模相结合以改进超声分割的简单框架 | Yuyue Zhou, Banafshe Felfeliyan, Shrimanti Ghosh, Jessica Knight, Fatima Alves-Pereira, Christopher Keen, Jessica Küpper, Abhilash Rakkunedeth Hareendranathan, Jacob L. Jaremko | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | MVD^2: Efficient Multiview 3D Reconstruction for Multiview Diffusion | MVD^2:用于多视图扩散的高效多视图 3D 重建 | Xin-Yang Zheng, Hao Pan, Yu-Xiao Guo, Xin Tong, Yang Liu | arxiv.org/pdf/2402.14… | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-22 | PALO: A Polyglot Large Multimodal Model for 5B People | PALO:面向 5B 人群的多语言大型多模式模型 | Muhammad Maaz, Hanoona Rasheed, Abdelrahman Shaker, Salman Khan, Hisham Cholakal, Rao M. Anwer, Tim Baldwin, Michael Felsberg, Fahad S. Khan | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset | 使用 MATH-Vision 数据集测量多模态数学推理 | Ke Wang, Junting Pan, Weikang Shi, Zimu Lu, Mingjie Zhan, Hongsheng Li | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models | DualFocus:在多模态大语言模型中整合宏观和微观视角 | Yuhang Cao, Pan Zhang, Xiaoyi Dong, Dahua Lin, Jiaqi Wang | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation | RoboScript:跨真实和模拟的自由形式操作任务的代码生成 | Junting Chen, Yao Mu, Qiaojun Yu, Tianming Wei, Silang Wu, Zhecheng Yuan, Zhixuan Liang, Chao Yang, Kaipeng Zhang, Wenqi Shao, et al. | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective | 少即是多:从 EOS 决策角度减轻多模态幻觉 | Zihao Yue, Liang Zhang, Qin Jin | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Uncertainty-Aware Evaluation for Vision-Language Models | 视觉语言模型的不确定性评估 | Vasily Kostumov, Bulat Nutfullin, Oleg Pilipenko, Eugene Ilyushin | arxiv.org/pdf/2402.14… | null |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-22 | Consolidating Attention Features for Multi-view Image Editing | 整合多视图图像编辑的注意力特征 | Or Patashnik, Rinon Gal, Daniel Cohen-Or, Jun-Yan Zhu, Fernando De la Torre | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | FrameNeRF: A Simple and Efficient Framework for Few-shot Novel View Synthesis | FrameNeRF:一种简单高效的小样本新颖视图合成框架 | Yan Xing, Pan Wang, Ligang Liu, Daolun Li, Li Zhang | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection | NeRF-Det++:将语义提示和透视感知深度监督结合起来进行室内多视图 3D 检测 | Chenxi Huang, Yuenan Hou, Weicai Ye, Di Huang, Xiaoshui Huang, Binbin Lin, Deng Cai, Wanli Ouyang | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | TaylorGrid: Towards Fast and High-Quality Implicit Field Learning via Direct Taylor-based Grid Optimization | TaylorGrid:通过直接基于泰勒的网格优化实现快速、高质量的隐式场学习 | Renyi Mao, Qingshan Xu, Peng Zheng, Ye Wang, Tieru Wu, Rui Ma | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Mip-Grid: Anti-aliased Grid Representations for Neural Radiance Fields | Mip-Grid:神经辐射场的抗锯齿网格表示 | Seungtae Nam, Daniel Rho, Jong Hwan Ko, Eunbyung Park | arxiv.org/pdf/2402.14… | null |

3DGS

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-22 | GaussianPro: 3D Gaussian Splatting with Progressive Propagation | GaussianPro:具有渐进传播的 3D 高斯泼溅 | Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, Xuejin Chen | arxiv.org/pdf/2402.14… | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-22 | TIE-KD: Teacher-Independent and Explainable Knowledge Distillation for Monocular Depth Estimation | TIE-KD:用于单目深度估计的独立于教师且可解释的知识蒸馏 | Sangwon Choi, Daejune Choi, Duksu Kim | arxiv.org/pdf/2402.14… | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-22 | WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition | WeakSAM:分割任何东西满足弱监督实例级识别 | Lianghui Zhu, Junwei Zhou, Yan Liu, Xin Hao, Wenyu Liu, Xinggang Wang | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | A Transformer Model for Boundary Detection in Continuous Sign Language | 连续手语边界检测的 Transformer 模型 | Razieh Rastgoo, Kourosh Kiani, Sergio Escalera | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Two-stage Cytopathological Image Synthesis for Augmenting Cervical Abnormality Screening | 用于增强宫颈异常筛查的两阶段细胞病理学图像合成 | Zhenrong Shen, Manman Fei, Xin Wang, Jiangdong Cai, Sheng Wang, Lichi Zhang, Qian Wang | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | QIS : Interactive Segmentation via Quasi-Conformal Mappings | QIS:通过准共形映射进行交互式分割 | Han Zhang, Daoping Zhang, Lok Ming Lui | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Quadruplet Loss For Improving the Robustness to Face Morphing Attacks | 四联体损失提高面对变形攻击的鲁棒性 | Iurii Medvedev, Nuno Gonçalves | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Overcoming Dimensional Collapse in Self-supervised Contrastive Learning for Medical Image Segmentation | 克服医学图像分割自监督对比学习中的维度崩溃 | Jamshid Hassanpour, Vinkle Srivastav, Didier Mutter, Nicolas Padoy | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | High-Speed Detector For Low-Powered Devices In Aerial Grasping | 适用于空中抓取低功耗设备的高速探测器 | Ashish Kumar, Laxmidhar Behera | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Deep vessel segmentation based on a new combination of vesselness filters | 基于血管过滤器新组合的深层血管分割 | Guillaume Garret, Antoine Vacavant, Carole Frindel | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition | 实现视觉地点识别预训练模型的无缝适应 | Feng Lu, Lijun Zhang, Xiangyuan Lan, Shuting Dong, Yaowei Wang, Chun Yuan | arxiv.org/pdf/2402.14… | link |
| 2024-02-22 | Reimagining Anomalies: What If Anomalies Were Normal? | 重新想象异常:如果异常是正常的怎么办? | Philipp Liznerski, Saurabh Varshneya, Ece Calikus, Sophie Fellenz, Marius Kloft | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | S^2Former-OR: Single-Stage Bimodal Transformer for Scene Graph Generation in OR | S^2Former-OR:用于 OR 中场景图生成的单级双峰变压器 | Jialun Pei, Diandian Guo, Jingyang Zhang, Manxi Lin, Yueming Jin, Pheng-Ann Heng | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Modeling 3D Infant Kinetics Using Adaptive Graph Convolutional Networks | 使用自适应图卷积网络建模 3D 婴儿动力学 | Daniel Holmberg, Manu Airaksinen, Viviana Marchi, Andrea Guzzetta, Anna Kivi, Leena Haataja, Sampsa Vanhatalo, Teemu Roos | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Semantic Image Synthesis with Unconditional Generator | 使用无条件生成器进行语义图像合成 | Jungwoo Chae, Hyunin Cho, Sooyeon Go, Kyungmook Choi, Youngjung Uh | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Reading Relevant Feature from Global Representation Memory for Visual Object Tracking | 从全局表示存储器中读取相关特征以进行视觉对象跟踪 | Xinyu Zhou, Pinxue Guo, Lingyi Hong, Jinglun Li, Wei Zhang, Weifeng Ge, Wenqiang Zhang | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | GAM-Depth: Self-Supervised Indoor Depth Estimation Leveraging a Gradient-Aware Mask and Semantic Constraints | GAM-Depth:利用梯度感知掩模和语义约束的自监督室内深度估计 | Anqi Cheng, Zhiyuan Yang, Haiyue Zhu, Kezhi Mao | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Subobject-level Image Tokenization | 子对象级图像标记化 | Delong Chen, Samuel Cahyawijaya, Jianfeng Liu, Baoyuan Wang, Pascale Fung | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | YOLO-TLA: An Efficient and Lightweight Small Object Detection Model based on YOLOv5 | YOLO-TLA:基于YOLOv5的高效轻量小物体检测模型 | Peng Gao, Chun-Lin Ji, Tao Yu, Ru-Yue Yuan | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Secure Navigation using Landmark-based Localization in a GPS-denied Environment | 在 GPS 拒绝的环境中使用基于地标的定位进行安全导航 | Ganesh Sapkota, Sanjay Madria | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | A Self-supervised Pressure Map human keypoint Detection Approch: Optimizing Generalization and Computational Efficiency Across Datasets | 自监督压力图人体关键点检测方法:优化跨数据集的泛化和计算效率 | Chengzhang Yu, Xianjun Yang, Wenxia Bao, Shaonan Wang, Zhiming Yao | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Compression Robust Synthetic Speech Detection Using Patched Spectrogram Transformer | 使用修补频谱图变压器的压缩鲁棒合成语音检测 | Amit Kumar Singh Yadav, Ziyue Xiang, Kratika Bhagtani, Paolo Bestagini, Stefano Tubaro, Edward J. Delp | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | HINT: High-quality INPainting Transformer with Mask-Aware Encoding and Enhanced Attention | HINT:具有掩模感知编码和增强注意力的高质量 INPainting Transformer | Shuang Chen, Amir Atapour-Abarghouei, Hubert P. H. Shum | arxiv.org/pdf/2402.14… | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-22 | Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis | Snap Video:用于文本到视频合成的缩放时空转换器 | Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Ekaterina Deyneka, Tsai-Shien Chen, Anil Kag, Yuwei Fang, Aleksei Stoliar, Elisa Ricci, Jian Ren, et al. | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot | Multi-HMR:单次多人全身人体网格恢复 | Fabien Baradel, Matthieu Armando, Salma Galaaoui, Romain Brégier, Philippe Weinzaepfel, Grégory Rogez, Thomas Lucas | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | CCPA: Long-term Person Re-Identification via Contrastive Clothing and Pose Augmentation | CCPA:通过对比服装和姿势增强进行长期人员重新识别 | Vuong D. Nguyen, Shishir K. Shah | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Learning to Kern -- Set-wise Estimation of Optimal Letter Space | 学习紧缩——最佳字母空间的集合估计 | Kei Nakatsuru, Seiichi Uchida | arxiv.org/pdf/2402.14… | null |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-22 | Distributed Radiance Fields for Edge Video Compression and Metaverse Integration in Autonomous Driving | 用于自动驾驶中边缘视频压缩和元宇宙集成的分布式辐射场 | Eugen Šlapak, Matúš Dopiriak, Mohammad Abdullah Al Faruque, Juraj Gazda, Marco Levorato | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Place Anything into Any Video | 将任何内容放入任何视频中 | Ziling Liu, Jinyu Yang, Mingqi Gao, Feng Zheng | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding | Swin3D++:用于 3D 室内场景理解的有效多源预训练 | Yu-Qi Yang, Yu-Xiao Guo, Yang Liu | arxiv.org/pdf/2402.14… | null |

Learning Paradigms

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-22 | Self-supervised Visualisation of Medical Image Datasets | 医学图像数据集的自监督可视化 | Ifeoma Veronica Nwabufo, Jan Niklas Böhm, Philipp Berens, Dmitry Kobak | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion | CLCE:一种改进交叉熵和对比学习以实现优化学习融合的方法 | Zijun Long, George Killick, Lipeng Zhuang, Gerardo Aragon-Camarasa, Zaiqiao Meng, Richard Mccreadie | arxiv.org/pdf/2402.14… | null |

Others

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-02-22 | Demographic Bias of Expert-Level Vision-Language Foundation Models in Medical Imaging | 医学影像中专家级视觉语言基础模型的人口统计学偏差 | Yuzhe Yang, Yujia Liu, Xin Liu, Avanti Gulhane, Domenico Mastrodicasa, Wei Wu, Edward J Wang, Dushyant W Sahani, Shwetak Patel | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | GeneOH Diffusion: Towards Generalizable Hand-Object Interaction Denoising via Denoising Diffusion | GeneOH Diffusion:通过去噪扩散实现可推广的手-物体交互去噪 | Xueyi Liu, Li Yi | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation | CyberDemo:增强模拟人体演示以实现现实世界的灵巧操作 | Jun Wang, Yuzhe Qin, Kaiming Kuang, Yigit Korkmaz, Akhilan Gurumoorthy, Hao Su, Xiaolong Wang | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | A Class of Topological Pseudodistances for Fast Comparison of Persistence Diagrams | 一类用于快速比较持久图的拓扑伪距离 | Rolando Kindelan Nuñez, Mircea Petrache, Mauricio Cerda, Nancy Hitschfeld | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | VLPose: Bridging the Domain Gap in Pose Estimation with Language-Vision Tuning | VLPose:通过语言视觉调整弥合姿势估计中的领域差距 | Jingyao Li, Pengguang Chen, Xuan Ju, Hong Xu, Jiaya Jia | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | HR-APR: APR-agnostic Framework with Uncertainty Estimation and Hierarchical Refinement for Camera Relocalisation | HR-APR:具有不确定性估计和相机重定位分层细化的 APR 不可知框架 | Changkun Liu, Shuai Chen, Yukun Zhao, Huajian Huang, Victor Prisacariu, Tristan Braud | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | An Error-Matching Exclusion Method for Accelerating Visual SLAM | 一种加速视觉SLAM的错误匹配排除方法 | Shaojie Zhang, Yinghui Wang, Jiaxing Ma, Jinlong Yang, Tao Yan, Liangyi Huang, Mingfeng Wang | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Vision-Language Navigation with Embodied Intelligence: A Survey | 具身智能的视觉语言导航:一项调查 | Peng Gao, Peng Wang, Feng Gao, Fei Wang, Ruyue Yuan | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | A Landmark-Aware Visual Navigation Dataset | 地标感知视觉导航数据集 | Faith Johnson, Bryan Bo Cao, Kristin Dana, Shubham Jain, Ashwin Ashok | arxiv.org/pdf/2402.14… | null |
| 2024-02-22 | Reconstruction-Based Anomaly Localization via Knowledge-Informed Self-Training | 通过基于知识的自我训练进行基于重建的异常定位 | Cheng Qian, Xiaoxian Lao, Chunguang Li | arxiv.org/pdf/2402.14… | null |