[Share][Daily Update][2024.03.05][CV_arxiv_papers]

[UPDATED!] 2024-03-05 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-05 | Scaling Rectified Flow Transformers for High-Resolution Image Synthesis | 缩放整流流量变压器以实现高分辨率图像合成 | Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. | arxiv.org/pdf/2403.03… | null |
| 2024-03-05 | Triple-CFN: Restructuring Conceptual Spaces for Enhancing Abstract Reasoning process | Triple-CFN:重构概念空间以增强抽象推理过程 | Ruizhuo Song, Beiming Yuan | arxiv.org/pdf/2403.03… | null |
| 2024-03-05 | NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors | NRDF:用于学习铰接姿势先验的神经黎曼距离场 | Yannan He, Garvita Tiwari, Tolga Birdal, Jan Eric Lenssen, Gerard Pons-Moll | arxiv.org/pdf/2403.03… | null |
| 2024-03-05 | Doubly Abductive Counterfactual Inference for Text-based Image Editing | 基于文本的图像编辑的双重溯因反事实推理 | Xue Song, Jiequan Cui, Hanwang Zhang, Jingjing Chen, Richang Hong, Yu-Gang Jiang | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity | 具有文本引导编码的神经图像压缩,可实现像素级和感知保真度 | Hagyeong Lee, Minkyu Kim, Jun-Hyuk Kim, Seungeon Kim, Dokwan Oh, Jaeho Lee | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Cross-Domain Image Conversion by CycleDM | CycleDM 的跨域图像转换 | Sho Shimotsumagari, Shumpei Takezaki, Daichi Haraguchi, Seiichi Uchida | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Enhancing the Rate-Distortion-Perception Flexibility of Learned Image Codecs with Conditional Diffusion Decoders | 使用条件扩散解码器增强学习图像编解码器的率失真感知灵活性 | Daniele Mari, Simone Milani | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Zero-LED: Zero-Reference Lighting Estimation Diffusion Model for Low-Light Image Enhancement | Zero-LED:用于低光图像增强的零参考照明估计扩散模型 | Jinhong He, Minglong Xue, Zhipu Liu, Chengyun Song, Senming Zhong | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation | 用于高保真图像到视频生成的免调谐噪声校正 | Weijie Li, Litong Gong, Yiran Zhu, Fanda Fan, Biao Wang, Tiezheng Ge, Bo Zheng | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Fast, Scale-Adaptive, and Uncertainty-Aware Downscaling of Earth System Model Fields with Generative Foundation Models | 使用生成基础模型对地球系统模型场进行快速、尺度自适应和不确定性感知缩减 | Philipp Hess, Michael Aich, Baoxiang Pan, Niklas Boers | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Few-shot Learner Parameterization by Diffusion Time-steps | 通过扩散时间步长进行少样本学习器参数化 | Zhongqi Yue, Pan Zhou, Richang Hong, Hanwang Zhang, Qianru Sun | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Enhancing Weakly Supervised 3D Medical Image Segmentation through Probabilistic-aware Learning | 通过概率感知学习增强弱监督 3D 医学图像分割 | Zhaoxin Fan, Runmin Jiang, Junhao Wu, Xin Huang, Tianyang Wang, Heng Huang, Min Xu | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Semantic Human Mesh Reconstruction with Textures | 使用纹理进行语义人体网格重建 | Xiaoyu Zhan, Jianxin Yang, Yuanqi Li, Jie Guo, Yanwen Guo, Wenping Wang | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Updating the Minimum Information about CLinical Artificial Intelligence (MI-CLAIM) checklist for generative modeling research | 更新生成建模研究的临床人工智能 (MI-CLAIM) 最低信息清单 | Brenda Y. Miao, Irene Y. Chen, Christopher YK Williams, Jaysón Davidson, Augusto Garcia-Agundez, Harry Sun, Travis Zack, Atul J. Butte, Madhumita Sushil | arxiv.org/pdf/2403.02… | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-05 | Self-supervised 3D Patient Modeling with Multi-modal Attentive Fusion | 具有多模态注意力融合的自监督 3D 患者建模 | Meng Zheng, Benjamin Planche, Xuan Gong, Fan Yang, Terrence Chen, Ziyan Wu | arxiv.org/pdf/2403.03… | null |
| 2024-03-05 | Design2Code: How Far Are We From Automating Front-End Engineering? | Design2Code:我们距离自动化前端工程还有多远? | Chenglei Si, Yanzhe Zhang, Zhengyuan Yang, Ruibo Liu, Diyi Yang | arxiv.org/pdf/2403.03… | null |
| 2024-03-05 | Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models | 大饱眼福:多模态大语言模型的混合分辨率自适应 | Gen Luo, Yiyi Zhou, Yuxin Zhang, Xiawu Zheng, Xiaoshuai Sun, Rongrong Ji | arxiv.org/pdf/2403.03… | null |
| 2024-03-05 | MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer | MADTP:多模态对齐引导的动态令牌修剪,用于加速视觉语言变压器 | Jianjian Cao, Peng Ye, Shengze Li, Chong Yu, Yansong Tang, Jiwen Lu, Tao Chen | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception | 具有细粒度视觉感知的多模式指令调整法学硕士 | Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Jin-Peng Lan, Bin Luo, Xuansong Xie | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Enhancing Conceptual Understanding in Multimodal Contrastive Learning through Hard Negative Samples | 通过硬负样本增强多模态对比学习中的概念理解 | Philipp J. Rösch, Norbert Oswald, Michaela Geierhos, Jindřich Libovický | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Enhancing Generalization in Medical Visual Question Answering Tasks via Gradient-Guided Model Perturbation | 通过梯度引导模型扰动增强医学视觉问答任务的泛化 | Gang Liu, Hongyang Li, Zerui He, Shenjun Zhong | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters | 微调的多模态语言模型是高质量的图像文本数据过滤器 | Weizhi Wang, Khalil Mrini, Linjie Yang, Sateesh Kumar, Yu Tian, Xifeng Yan, Heng Wang | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Interactive Continual Learning: Fast and Slow Thinking | 交互式持续学习:快思考和慢思考 | Biqing Qi, Xingquan Chen, Junqi Gao, Jianxing Liu, Ligang Wu, Bowen Zhou | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | VEglue: Testing Visual Entailment Systems via Object-Aligned Joint Erasing | VEglue:通过对象对齐联合擦除测试视觉蕴涵系统 | Zhiyuan Chang, Mingyang Li, Junjie Wang, Cheng Li, Qing Wang | arxiv.org/pdf/2403.02… | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-05 | PromptKD: Unsupervised Prompt Distillation for Vision-Language Models | PromptKD:视觉语言模型的无监督快速蒸馏 | Zheng Li, Xiang Li, Xinyi Fu, Xing Zhang, Weiqiang Wang, Jian Yang | arxiv.org/pdf/2403.02… | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-05 | Solving the bongard-logo problem by modeling a probabilistic model | 通过建立概率模型来解决 bongard-logo 问题 | Ruizhuo Song, Beiming Yuan | arxiv.org/pdf/2403.03… | null |
| 2024-03-05 | PalmProbNet: A Probabilistic Approach to Understanding Palm Distributions in Ecuadorian Tropical Forest via Transfer Learning | PalmProbNet:通过迁移学习了解厄瓜多尔热带森林棕榈分布的概率方法 | Kangning Cui, Zishan Shao, Gregory Larsen, Victor Pauca, Sarra Alqahtani, David Segurado, João Pinheiro, Manqi Wang, David Lutz, Robert Plemmons, et al. | arxiv.org/pdf/2403.03… | null |
| 2024-03-05 | Simplicity in Complexity | 复杂中的简单 | Kevin Shen, Surabhi S Nath, Aenne Brielmann, Peter Dayan | arxiv.org/pdf/2403.03… | null |
| 2024-03-05 | Motion-Corrected Moving Average: Including Post-Hoc Temporal Information for Improved Video Segmentation | 运动校正移动平均:包括事后时间信息以改进视频分割 | Robert Mendel, Tobias Rueckert, Dirk Wilhelm, Daniel Rueckert, Christoph Palm | arxiv.org/pdf/2403.03… | null |
| 2024-03-05 | Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection | 使用深度语义分割和新颖的异常值检测改进 LiDAR 里程计和地图绘制 | Mohamed Afifi, Mohamed ElHelw | arxiv.org/pdf/2403.03… | null |
| 2024-03-05 | MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding | MiKASA:用于 3D 视觉基础的多键锚点和场景感知变压器 | Chun-Peng Chang, Shaoxiang Wang, Alain Pagani, Didier Stricker | arxiv.org/pdf/2403.03… | null |
| 2024-03-05 | CrackNex: a Few-shot Low-light Crack Segmentation Model Based on Retinex Theory for UAV Inspections | CrackNex:基于 Retinex 理论的无人机检测少镜头微光裂纹分割模型 | Zhen Yao, Jiawei Xu, Shuhang Hou, Mooi Choo Chuah | arxiv.org/pdf/2403.03… | null |
| 2024-03-05 | ChatGPT and biometrics: an assessment of face recognition, gender detection, and age estimation capabilities | ChatGPT 和生物识别:面部识别、性别检测和年龄估计能力的评估 | Ahmad Hassanpour, Yasamin Kowsari, Hatef Otroshi Shahreza, Bian Yang, Sebastien Marcel | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | XAI-Based Detection of Adversarial Attacks on Deepfake Detectors | 基于 XAI 的 Deepfake 探测器对抗性攻击检测 | Ben Pinhasov, Raz Lapid, Rony Ohayon, Moshe Sipper, Yehudit Aperstein | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Citizen Science and Machine Learning for Research and Nature Conservation: The Case of Eurasian Lynx, Free-ranging Rodents and Insects | 用于研究和自然保护的公民科学和机器学习:欧亚山猫、自由放养的啮齿动物和昆虫的案例 | Kinga Skorupska, Rafał Stryjek, Izabela Wierzbowska, Piotr Bebas, Maciej Grzeszczuk, Piotr Gago, Jarosław Kowalski, Maciej Krzywicki, Jagoda Lazarek, Wiesław Kopeć | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Enhancing Long-Term Person Re-Identification Using Global, Local Body Part, and Head Streams | 使用全局、局部身体部位和头部流增强长期人员重新识别 | Duy Tran Thanh, Yeejin Lee, Byeongkeun Kang | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Revisiting Confidence Estimation: Towards Reliable Failure Prediction | 重新审视置信度估计:实现可靠的故障预测 | Fei Zhu, Xu-Yao Zhang, Zhen Cheng, Cheng-Lin Liu | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | ActiveAD: Planning-Oriented Active Learning for End-to-End Autonomous Driving | ActiveAD:面向规划的端到端自动驾驶主动学习 | Han Lu, Xiaosong Jia, Yichen Xie, Wenlong Liao, Xiaokang Yang, Junchi Yan | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Are Dense Labels Always Necessary for 3D Object Detection from Point Cloud? | 从点云进行 3D 物体检测是否始终需要密集标签? | Chenqiang Gao, Chuandong Liu, Jun Shu, Fangcen Liu, Jiang Liu, Luyu Yang, Xinbo Gao, Deyu Meng | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | DDF: A Novel Dual-Domain Image Fusion Strategy for Remote Sensing Image Semantic Segmentation with Unsupervised Domain Adaptation | DDF:一种新型双域图像融合策略,用于具有无监督域适应的遥感图像语义分割 | Lingyan Ran, Lushuang Wang, Tao Zhuo, Yinghui Xing | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes | HUNTER:通过将合成实例的知识转移到真实场景进行无监督的以人为中心的 3D 检测 | Yichen Yao, Zimo Jiang, Yujing Sun, Zhencai Zhu, Xinge Zhu, Runnan Chen, Yuexin Ma | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking | DeconfuseTrack:处理多目标跟踪的混乱 | Cheng Huang, Shoudong Han, Mengyu He, Wenbo Zheng, Yuhao Wei | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels | 在没有确切指导的情况下学习:根据低分辨率历史标签更新大规模高分辨率土地覆盖图 | Zhuohong Li, Wei He, Jiepan Li, Fangxiao Lu, Hongyan Zhang | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Bootstrapping Rare Object Detection in High-Resolution Satellite Imagery | 在高分辨率卫星图像中引导稀有物体检测 | Akram Zaytar, Caleb Robinson, Gilles Q. Hacheme, Girmaw A. Tadesse, Rahul Dodhia, Juan M. Lavista Ferres, Lacey F. Hughey, Jared A. Stabach, Irene Amoke | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View | FastOcc:通过融合 2D 鸟瞰图和透视图加速 3D 占用预测 | Jiawei Hou, Xiaoyan Li, Wenhao Guan, Gang Zhang, Di Feng, Yuheng Du, Xiangyang Xue, Jian Pu | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Deep Common Feature Mining for Efficient Video Semantic Segmentation | 深度共同特征挖掘实现高效视频语义分割 | Yaoyan Zheng, Hongyu Yang, Di Huang | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | UFO: Uncertainty-aware LiDAR-image Fusion for Off-road Semantic Terrain Map Estimation | UFO:用于越野语义地形图估计的不确定性激光雷达图像融合 | Ohn Kim, Junwon Seo, Seongyong Ahn, Chong Hui Kim | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | False Positive Sampling-based Data Augmentation for Enhanced 3D Object Detection Accuracy | 基于误报采样的数据增强可增强 3D 对象检测的准确性 | Jiyong Oh, Junhaeng Lee, Woongchan Byun, Minsang Kong, Sang Hun Lee | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | BSDP: Brain-inspired Streaming Dual-level Perturbations for Online Open World Object Detection | BSDP:用于在线开放世界对象检测的受大脑启发的流式双级扰动 | Yu Chen, Liyan Ma, Liping Jing, Jian Yu | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use | 建模协作者:通过 LLM 工具使用以最少的人力实现主观视觉分类 | Imad Eddine Toubal, Aditya Avinash, Neil Gordon Alldrin, Jan Dlabal, Wenlei Zhou, Enming Luo, Otilia Stretcu, Hao Xiong, Chun-Ta Lu, Howard Zhou, et al. | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Systemic Biases in Sign Language AI Research: A Deaf-Led Call to Reevaluate Research Agendas | 手语人工智能研究中的系统性偏见:聋人主导的重新评估研究议程的呼吁 | Aashaka Desai, Maartje De Meulder, Julie A. Hochgesang, Annemarie Kocab, Alex X. Lu | arxiv.org/pdf/2403.02… | null |

LLM

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-05 | ImgTrojan: Jailbreaking Vision-Language Models with ONE Image | ImgTrojan:使用一张图像越狱视觉语言模型 | Xijia Tao, Shuai Zhong, Lei Li, Qi Liu, Lingpeng Kong | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Android in the Zoo: Chain-of-Action-Thought for GUI Agents | 动物园里的 Android:GUI 代理的行动链思想 | Jiwen Zhang, Jihao Wu, Yihua Teng, Minghui Liao, Nuo Xu, Xiao Xiao, Zhongyu Wei, Duyu Tang | arxiv.org/pdf/2403.02… | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-05 | FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation | FAR:灵活、准确且鲁棒的 6DoF 相对相机姿态估计 | Chris Rockwell, Nilesh Kulkarni, Linyi Jin, Jeong Joon Park, Justin Johnson, David F. Fouhey | arxiv.org/pdf/2403.03… | null |
| 2024-03-05 | A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning | 具有多金字塔变压器和对比学习的显微镜散焦去模糊统一框架 | Yuelin Zhang, Pengyu Zheng, Wanquan Yan, Chengyu Fang, Shing Shin Cheng | arxiv.org/pdf/2403.02… | null |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-05 | HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative | HoloVIC:多传感器全息交叉口和车路协同的大规模数据集和基准 | Cong Ma, Lei Qiao, Chengkai Zhu, Kai Liu, Zelong Kong, Qing Li, Xueqi Zhou, Yuheng Kan, Wei Wu | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Towards Geometric-Photometric Joint Alignment for Facial Mesh Registration | 面向面部网格配准的几何光度联合对准 | Xizhi Wang, Yaxiong Wang, Mengjian Li | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning | 低分辨率引领潮流:通过自监督学习提高超分辨率的泛化能力 | Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Haoze Sun, Xueyi Zou, Zhensong Zhang, Youliang Yan, Lei Zhu | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Pooling Image Datasets With Multiple Covariate Shift and Imbalance | 具有多个协变量偏移和不平衡的池化图像数据集 | Sotirios Panagiotis Chytas, Vishnu Suresh Lokhande, Peiran Li, Vikas Singh | arxiv.org/pdf/2403.02… | null |

Learning Paradigms

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-05 | Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization | 对偶平均教师:用于视听源定位的无偏半监督框架 | Yuxin Guo, Shijie Ma, Hu Su, Zhiqing Wang, Yuhao Zhao, Wei Zou, Siyang Sun, Yun Zheng | arxiv.org/pdf/2403.03… | null |
| 2024-03-05 | Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization | 用于半监督视听源定位的交叉伪标签 | Yuxin Guo, Shijie Ma, Yuhao Zhao, Hu Su, Wei Zou | arxiv.org/pdf/2403.03… | null |
| 2024-03-05 | Recall-Oriented Continual Learning with Generative Adversarial Meta-Model | 具有生成对抗性元模型的面向回忆的持续学习 | Haneol Kang, Dong-Wan Choi | arxiv.org/pdf/2403.03… | link |
| 2024-03-05 | Rehabilitation Exercise Quality Assessment through Supervised Contrastive Learning with Hard and Soft Negatives | 通过硬阴性和软阴性的监督对比学习进行康复运动质量评估 | Mark Karlov, Ali Abedi, Shehroz S. Khan | arxiv.org/pdf/2403.02… | null |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-05 | A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives | 充满技能的背包:以自我为中心的视频理解与多样化的任务视角 | Simone Alberto Peirone, Francesca Pistilli, Antonio Alliegro, Giuseppe Averta | arxiv.org/pdf/2403.03… | null |
| 2024-03-05 | Gaze-Vector Estimation in the Dark with Temporally Encoded Event-driven Neural Networks | 使用时间编码事件驱动神经网络在黑暗中进行注视矢量估计 | Abeer Banerjee, Naval K. Mehta, Shyam S. Prasad, Himanshu, Sumeet Saurav, Sanjay Singh | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Towards Robust Federated Learning via Logits Calibration on Non-IID Data | 通过非 IID 数据的 Logits 校准实现稳健的联邦学习 | Yu Qiao, Apurba Adhikary, Chaoning Zhang, Choong Seon Hong | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos | 为什么不使用你的教科书?教学视频的知识增强程序规划 | Kumaranage Ravindu Yasas Nagasinghe, Honglu Zhou, Malitha Gunawardhana, Martin Renqiang Min, Daniel Harari, Muhammad Haris Khan | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Learning Group Activity Features Through Person Attribute Prediction | 通过人员属性预测学习群体活动特征 | Chihiro Nakatani, Hiroaki Kawashima, Norimichi Ukita | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | DomainVerse: A Benchmark Towards Real-World Distribution Shifts For Tuning-Free Adaptive Domain Generalization | DomainVerse:免调整自适应域泛化的现实世界分布转变的基准 | Feng Hou, Jin Yuan, Ying Yang, Yang Liu, Yang Zhang, Cheng Zhong, Zhongchao Shi, Jianping Fan, Yong Rui, Zhiqiang He | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Dirichlet-based Per-Sample Weighting by Transition Matrix for Noisy Label Learning | 通过转移矩阵进行基于狄利克雷的每样本加权,用于噪声标签学习 | HeeSun Bae, Seungjae Shin, Byeonghu Na, Il-Chul Moon | arxiv.org/pdf/2403.02… | link |
| 2024-03-05 | What do we learn from inverting CLIP models? | 我们从反演 CLIP 模型中学到了什么? | Hamid Kazemi, Atoosa Chegini, Jonas Geiping, Soheil Feizi, Tom Goldstein | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | DPAdapter: Improving Differentially Private Deep Learning through Noise Tolerance Pre-training | DPAdapter:通过噪声容忍预训练改进差分隐私深度学习 | Zihao Wang, Rui Zhu, Dongruo Zhou, Zhikun Zhang, John Mitchell, Haixu Tang, XiaoFeng Wang | arxiv.org/pdf/2403.02… | null |