[Share][Daily Update][2024.02.20][CV_arxiv_papers]


[UPDATED!] 2024-02-20 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies | AnnoTheia:用于视听语音技术的半自动标注工具包 | José-M. Acosta-Triana, David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos | arxiv.org/pdf/2402.13… | link |
| 2024-02-20 | Neural Network Diffusion | 神经网络扩散 | Kai Wang, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, Yang You | arxiv.org/pdf/2402.13… | link |
| 2024-02-20 | VGMShield: Mitigating Misuse of Video Generative Models | VGMShield:减少视频生成模型的滥用 | Yan Pang, Yang Zhang, Tianhao Wang | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | Visual Style Prompting with Swapping Self-Attention | 基于交换自注意力的视觉风格提示 | Jaeseok Jeong, Junho Kim, Yunjey Choi, Gayoung Lee, Youngjung Uh | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence | 模式分析与机器智能领域文献综述的文献综述 | Penghai Zhao, Xin Zhang, Ming-Ming Cheng, Jian Yang, Xiang Li | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection | 消除欺骗:适配视觉语言模型进行通用 Deepfake 检测 | Sohail Ahmed Khan, Duc-Tien Dang-Nguyen | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models | RealCompo:真实感与组合性之间的动态平衡改进文本到图像扩散模型 | Xinchen Zhang, Ling Yang, Yaqi Cai, Zhaochen Yu, Jiake Xie, Ye Tian, Minkai Xu, Yong Tang, Yujiu Yang, Bin Cui | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | A Geometric Algorithm for Tubular Shape Reconstruction from Skeletal Representation | 一种从骨架表示重建管状形状的几何算法 | Guoqing Zhang, Songzi Cat, Juzi Cat | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | Two-stage Rainfall-Forecasting Diffusion Model | 两阶段降雨量预报扩散模型 | XuDong Ling, ChaoRong Li, FengQing Qin, LiHong Zhu, Yuanyuan Huang | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion | MuLan:用于渐进式多对象扩散的多模态 LLM 智能体 | Sen Li, Ruochen Wang, Cho-Jui Hsieh, Minhao Cheng, Tianyi Zhou | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction | MVDiffusion++:用于单视图或稀疏视图 3D 对象重建的密集高分辨率多视图扩散模型 | Shitao Tang, Jiacheng Chen, Dilin Wang, Chengzhou Tang, Fuyang Zhang, Yuchen Fan, Vikas Chandra, Yasutaka Furukawa, Rakesh Ranjan | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation | DiffusionNOCS:管理 Sim2Real 多模态类别级位姿估计中的对称性和不确定性 | Takuya Ikeda, Sergey Zakharov, Tianyi Ko, Muhammad Zubair Irshad, Robert Lee, Katherine Liu, Rares Ambrus, Koichi Nishiwaki | arxiv.org/pdf/2402.12… | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples | CounterCurate:通过反事实示例增强物理和语义视觉语言组合推理 | Jianrui Zhang, Mu Cai, Tengyang Xie, Yong Jae Lee | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | A Touch, Vision, and Language Dataset for Multimodal Alignment | 用于多模态对齐的触觉、视觉和语言数据集 | Letian Fu, Gaurav Datta, Huang Huang, William Chung-Ho Panitch, Jaimyn Drake, Joseph Ortiz, Mustafa Mukadam, Mike Lambeta, Roberto Calandra, Ken Goldberg | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts | 欺骗你的多模态大语言模型有多容易?欺骗性提示的实证分析 | Yusu Qian, Haotian Zhang, Yinfei Yang, Zhe Gan | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog | OLViT:通过基于注意力的嵌入进行视频对话的多模态状态跟踪 | Adnen Abdessaied, Manuel von Hochmeister, Andreas Bulling | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | ConVQG: Contrastive Visual Question Generation with Multimodal Guidance | ConVQG:基于多模态引导的对比式视觉问题生成 | Li Mi, Syrielle Montariol, Javiera Castillo-Navarro, Xianjie Dai, Antoine Bosselut, Devis Tuia | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | Model Composition for Multimodal Large Language Models | 多模态大语言模型的模型组合 | Chi Chen, Yiyang Du, Zheng Fang, Ziyue Wang, Fuwen Luo, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Maosong Sun, et al. | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | Modality-Aware Integration with Large Language Models for Knowledge-based Visual Question Answering | 面向基于知识的视觉问答的模态感知大语言模型集成 | Junnan Dong, Qinggang Zhang, Huachi Zhou, Daochen Zha, Pai Zheng, Xiao Huang | arxiv.org/pdf/2402.12… | null |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey | NeRF 和 3D 高斯泼溅如何重塑 SLAM:综述 | Fabio Tosi, Youmin Zhang, Ziren Gong, Erik Sandström, Stefano Mattoccia, Martin R. Oswald, Matteo Poggi | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | Improving Robustness for Joint Optimization of Camera Poses and Decomposed Low-Rank Tensorial Radiance Fields | 提高相机位姿与分解低秩张量辐射场联合优化的鲁棒性 | Bo-Yu Cheng, Wei-Chen Chiu, Yu-Lun Liu | arxiv.org/pdf/2402.13… | link |
| 2024-02-20 | OccFlowNet: Towards Self-supervised Occupancy Estimation via Differentiable Rendering and Occupancy Flow | OccFlowNet:通过可微渲染和占用流实现自监督占用估计 | Simon Boeder, Fabian Gigengack, Benjamin Risse | arxiv.org/pdf/2402.12… | null |

Model Compression / Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | FlashTex: Fast Relightable Mesh Texturing with LightControlNet | FlashTex:使用 LightControlNet 快速生成可重光照的网格纹理 | Kangle Deng, Timothy Omernick, Alexander Weiss, Deva Ramanan, Jun-Yan Zhu, Tinghui Zhou, Maneesh Agrawala | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | VideoPrism: A Foundational Visual Encoder for Video Understanding | VideoPrism:用于视频理解的基础视觉编码器 | Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, et al. | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | Cross-Domain Transfer Learning with CoRTe: Consistent and Reliable Transfer from Black-Box to Lightweight Segmentation Model | 使用 CoRTe 进行跨域迁移学习:从黑盒到轻量级分割模型的一致且可靠的迁移 | Claudia Cuttano, Antonio Tavera, Fabio Cermelli, Giuseppe Averta, Barbara Caputo | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | Improve Cross-Architecture Generalization on Dataset Distillation | 改进数据集蒸馏的跨架构泛化 | Binglin Zhou, Linhao Zhong, Wentao Chen | arxiv.org/pdf/2402.13… | link |
| 2024-02-20 | Efficient Parameter Mining and Freezing for Continual Object Detection | 用于持续目标检测的高效参数挖掘和冻结 | Angelo G. Menezes, Augusto J. Peterlevitz, Mateus A. Chinelatto, André C. P. L. F. de Carvalho | arxiv.org/pdf/2402.12… | null |

Classification / Detection / Recognition / Segmentation / ...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | Video ReCap: Recursive Captioning of Hour-Long Videos | Video ReCap:长达一小时视频的递归字幕生成 | Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, Lorenzo Torresani, Gedas Bertasius | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | 3D Kinematics Estimation from Video with a Biomechanical Model and Synthetic Training Data | 使用生物力学模型和合成训练数据从视频进行 3D 运动学估计 | Zhi-Yi Lin, Bofan Lyu, Judith Cueto Fernandez, Eline van der Kruk, Ajay Seth, Xucong Zhang | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | Toward Fairness via Maximum Mean Discrepancy Regularization on Logits Space | 通过 Logits 空间上的最大均值差异正则化实现公平 | Hao-Wei Chung, Ching-Hao Chiu, Yu-Jen Chen, Yiyu Shi, Tsung-Yi Ho | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition | 用于连续视觉语音识别的传统混合解码器和 CTC/Attention 解码器的比较 | David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | MapTrack: Tracking in the Map | MapTrack:在地图中追踪 | Fei Wang, Ruohui Zhang, Chenglin Chen, Min Yang, Yun Bai | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | Cell Graph Transformer for Nuclei Classification | 用于细胞核分类的细胞图 Transformer | Wei Lou, Guanbin Li, Xiang Wan, Haofeng Li | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | UniCell: Universal Cell Nucleus Classification via Prompt Learning | UniCell:通过提示学习进行通用细胞核分类 | Junjia Huang, Haofeng Li, Xiang Wan, Guanbin Li | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | Advancements in Point Cloud-Based 3D Defect Detection and Classification for Industrial Systems: A Comprehensive Survey | 工业系统基于点云的 3D 缺陷检测和分类的进展:全面综述 | Anju Rani, Daniel Ortiz-Arroyo, Petar Durdevic | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | SolarPanel Segmentation :Self-Supervised Learning for Imperfect Datasets | SolarPanel 分割:面向不完美数据集的自监督学习 | Sankarshanaa Sagaram, Aditya Kasliwal, Krish Didwania, Laven Srivastava, Pallavi Kailas, Ujjwal Verma | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | Radar-Based Recognition of Static Hand Gestures in American Sign Language | 基于雷达的美国手语静态手势识别 | Christian Schuessler, Wenxuan Zhang, Johanna Bräunig, Marcel Hoffmann, Michael Stelzig, Martin Vossiek | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | GOOD: Towards Domain Generalized Orientated Object Detection | GOOD:迈向领域泛化的有向目标检测 | Qi Bi, Beichen Zhou, Jingjun Yi, Wei Ji, Haolan Zhan, Gui-Song Xia | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | BronchoTrack: Airway Lumen Tracking for Branch-Level Bronchoscopic Localization | BronchoTrack:用于分支级支气管镜定位的气道管腔跟踪 | Qingyao Tian, Huai Liao, Xinyan Huang, Bingyu Yang, Jinlin Wu, Jian Chen, Lujie Li, Hongbin Liu | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | Fingerprint Presentation Attack Detector Using Global-Local Model | 使用全局局部模型的指纹呈现攻击检测器 | Haozhe Liu, Wentian Zhang, Feng Liu, Haoqian Wu, Linlin Shen | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | CST: Calibration Side-Tuning for Parameter and Memory Efficient Transfer Learning | CST:面向参数和内存高效迁移学习的校准侧调 | Feng Chen | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | PAC-FNO: Parallel-Structured All-Component Fourier Neural Operators for Recognizing Low-Quality Images | PAC-FNO:用于识别低质量图像的并行结构全分量傅立叶神经算子 | Jinsung Jeon, Hyundong Jin, Jonghyun Choi, Sanghyun Hong, Dongeun Lee, Kookjin Lee, Noseong Park | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | Learning Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition | 学习用于少样本动作识别的域不变时间动力学 | Yuke Li, Guangyi Chen, Ben Abramowitz, Stefano Anzellott, Donglai Wei | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | wmh_seg: Transformer based U-Net for Robust and Automatic White Matter Hyperintensity Segmentation across 1.5T, 3T and 7T | wmh_seg:基于 Transformer 的 U-Net,用于 1.5T、3T 和 7T 的鲁棒和自动白质高信号分割 | Jinghang Li, Tales Santini, Yuanzhe Huang, Joseph M. Mettenburg, Tamer S. Ibrahima, Howard J. Aizensteina, Minjie Wu | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | TorchCP: A Library for Conformal Prediction based on PyTorch | TorchCP:基于 PyTorch 的共形预测库 | Hongxin Wei, Jianguo Huang | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | Object-level Geometric Structure Preserving for Natural Image Stitching | 自然图像拼接的对象级几何结构保留 | Wenxiao Cai, Wankou Yang | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | Neuromorphic Synergy for Video Binarization | 视频二值化的神经形态协同 | Shijie Lin, Xiang Zhang, Lei Yang, Lei Yu, Bin Zhou, Xiaowei Luo, Wenping Wang, Jia Pan | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | YOLO-Ant: A Lightweight Detector via Depthwise Separable Convolutional and Large Kernel Design for Antenna Interference Source Detection | YOLO-Ant:采用深度可分离卷积和大卷积核设计的轻量级检测器,用于天线干扰源检测 | Xiaoyu Tang, Xingming Chen, Jintao Cheng, Jin Wu, Rui Fan, Chengxi Zhang, Zebo Zhou | arxiv.org/pdf/2402.12… | link |

GNN

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative Cognition Approach | 以对象为中心的深度神经网络中的视觉推理:一种比较认知方法 | Guillermo Puebla, Jeffrey S. Bowers | arxiv.org/pdf/2402.12… | link |

LLM

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | Slot-VLM: SlowFast Slots for Video-Language Modeling | Slot-VLM:用于视频语言建模的 SlowFast 插槽 | Jiaqi Xu, Cuiling Lan, Wenxuan Xie, Xuejin Chen, Yan Lu | arxiv.org/pdf/2402.13… | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing | UniEdit:用于视频运动和外观编辑的统一免调优框架 | Jianhong Bai, Tianyu He, Yuchi Wang, Junliang Guo, Haoji Hu, Zuozhu Liu, Jiang Bian | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | PIP-Net: Pedestrian Intention Prediction in the Wild | PIP-Net:真实场景下的行人意图预测 | Mohsen Azarmi, Mahdi Rezaei, He Wang, Sebastien Glaser | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | RhythmFormer: Extracting rPPG Signals Based on Hierarchical Temporal Periodic Transformer | RhythmFormer:基于分层时间周期 Transformer 提取 rPPG 信号 | Bochao Zou, Zizheng Guo, Jiansheng Chen, Huimin Ma | arxiv.org/pdf/2402.12… | link |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-20 | VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning | VADv2:通过概率规划实现端到端矢量化自动驾驶 | Shaoyu Chen, Bo Jiang, Hao Gao, Bencheng Liao, Qing Xu, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | Design and Flight Demonstration of a Quadrotor for Urban Mapping and Target Tracking Research | 用于城市测绘和目标跟踪研究的四旋翼飞行器的设计和飞行演示 | Collin Hague, Nick Kakavitsas, Jincheng Zhang, Chris Beam, Andrew Willis, Artur Wolek | arxiv.org/pdf/2402.13… | null |
| 2024-02-20 | exploreCOSMOS: Interactive Exploration of Conditional Statistical Shape Models in the Web-Browser | exploreCOSMOS:网络浏览器中条件统计形状模型的交互式探索 | Maximilian Hahn, Bernhard Egger | arxiv.org/pdf/2402.13… | link |
| 2024-02-20 | Mind the Exit Pupil Gap: Revisiting the Intrinsics of a Standard Plenoptic Camera | 注意出瞳间隙:重新审视标准全光相机的内参 | Tim Michels, Daniel Mäckelmann, Reinhard Koch | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | ICON: Improving Inter-Report Consistency of Radiology Report Generation via Lesion-aware Mix-up Augmentation | ICON:通过病变感知混合增强提高放射学报告生成的报告间一致性 | Wenjun Hou, Yi Cheng, Kaishuai Xu, Yan Hu, Wenjie Li, Jiang Liu | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis | 用于在文本到图像合成中生成模型偏好提示的用户友好框架 | Nailei Hei, Qianyu Guo, Zihao Wang, Yan Wang, Haofen Wang, Wenqiang Zhang | arxiv.org/pdf/2402.12… | link |
| 2024-02-20 | Denoising OCT Images Using Steered Mixture of Experts with Multi-Model Inference | 使用基于多模型推理的引导混合专家模型对 OCT 图像进行去噪 | Aytaç Özkan, Elena Stoykova, Thomas Sikora, Violeta Madjarova | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | Advancing Monocular Video-Based Gait Analysis Using Motion Imitation with Physics-Based Simulation | 使用运动模仿和基于物理的模拟推进基于单目视频的步态分析 | Nikolaos Smyrnakis, Tasos Karakostas, R. James Cotton | arxiv.org/pdf/2402.12… | null |
| 2024-02-20 | A Comprehensive Review of Machine Learning Advances on Data Change: A Cross-Field Perspective | 机器学习在数据变化方面的进展的全面回顾:跨领域视角 | Jeng-Lin Li, Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen | arxiv.org/pdf/2402.12… | null |