[Share][Daily Update][2024.02.08][CV_arxiv_papers]


[UPDATED!] 2024-02-08 (Publish Time)

Generative Models

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-08 | CLR-Face: Conditional Latent Refinement for Blind Face Restoration Using Score-Based Diffusion Models | Maitreya Suin, Rama Chellappa | arxiv.org/pdf/2402.06… | null |
| 2024-02-08 | Animated Stickers: Bringing Stickers to Life with Video Diffusion | David Yan, Winnie Zhang, Luxin Zhang, Anmol Kalia, Dingkang Wang, Ankit Ramchandani, Miao Liu, Albert Pumarola, Edgar Schoenfeld, Elliot Blanchard, et al. | arxiv.org/pdf/2402.06… | null |
| 2024-02-08 | InstaGen: Enhancing Object Detection by Training on Synthetic Dataset | Chengjian Feng, Yujie Zhong, Zequn Jie, Weidi Xie, Lin Ma | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Collaborative Control for Geometry-Conditioned PBR Image Generation | Shimon Vainer, Mark Boss, Mathias Parger, Konstantin Kutsy, Dante De Nigris, Ciara Rowles, Nicolas Perony, Simon Donné | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | AvatarMMC: 3D Head Avatar Generation and Editing with Multi-Modal Conditioning | Wamiq Reyaz Para, Abdelrahman Eldesokey, Zhenyu Li, Pradyumna Reddy, Jiankang Deng, Peter Wonka | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | CTGAN: Semantic-guided Conditional Texture Generator for 3D Shapes | Yi-Ting Pan, Chai-Rong Lee, Shu-Ho Fan, Jheng-Wei Su, Jia-Bin Huang, Yung-Yu Chuang, Hung-Kuo Chu | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer | Zhiyuan Ma, Xiangyu Zhu, Guojun Qi, Chen Qian, Zhaoxiang Zhang, Zhen Lei | arxiv.org/pdf/2402.05… | link |
| 2024-02-08 | Scalable Diffusion Models with State Space Backbone | Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang | arxiv.org/pdf/2402.05… | link |
| 2024-02-08 | Joint End-to-End Image Compression and Denoising: Leveraging Contrastive Learning and Multi-Scale Self-ONNs | Yuxin Xie, Li Yu, Farhad Pakdaman, Moncef Gabbouj | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application | Bumsoo Kim, Sanghyun Byun, Yonghoon Jung, Wonseop Shin, Sareer UI Amin, Sanghyun Seo | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Scalable Wasserstein Gradient Flow for Generative Modeling through Unbalanced Optimal Transport | Jaemoo Choi, Jaewoong Choi, Myungjoo Kang | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models | Senmao Li, Joost van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, Jian Yang | arxiv.org/pdf/2402.05… | link |
| 2024-02-08 | Descanning: From Scanned to the Original Images with a Color Correction Diffusion Model | Junghun Cha, Ali Haider, Seoyun Yang, Hoeyeong Jin, Subin Yang, A. F. M. Shahab Uddin, Jaehyoung Kim, Soo Ye Kim, Sung-Ho Bae | arxiv.org/pdf/2402.05… | null |

Multimodal

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-08 | CLIP-Loc: Multi-modal Landmark Association for Global Localization in Object-based Maps | Shigemichi Matsuzaki, Takuma Sugino, Kazuhito Tanaka, Zijun Sha, Shintaro Nakaoka, Shintaro Yoshizawa, Kazuhiro Shintani | arxiv.org/pdf/2402.06… | null |
| 2024-02-08 | SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models | Peng Gao, Renrui Zhang, Chris Liu, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, et al. | arxiv.org/pdf/2402.05… | link |
| 2024-02-08 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue | Xing Han Lù, Zdeněk Kasner, Siva Reddy | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion | Shoubin Yu, Jaehong Yoon, Mohit Bansal | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | FusionSF: Fuse Heterogeneous Modalities in a Vector Quantized Framework for Robust Solar Power Forecasting | Ziqing Ma, Wenwei Wang, Tian Zhou, Chao Chen, Bingqing Peng, Liang Sun, Rong Jin | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Question Aware Vision Transformer for Multimodal Reasoning | Roy Ganz, Yair Kittenplon, Aviad Aberdam, Elad Ben Avraham, Oren Nuriel, Shai Mazor, Ron Litman | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | MTSA-SNN: A Multi-modal Time Series Analysis Model Based on Spiking Neural Network | Chengzhi Liu, Chong Zhong, Mingyu Jin, Zheng Tao, Zihong Luo, Chenghao Liu, Shuliang Zhao | arxiv.org/pdf/2402.05… | link |
| 2024-02-09 | Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey | Zhuo Chen, Yichi Zhang, Yin Fang, Yuxia Geng, Lingbing Guo, Xiang Chen, Qian Li, Wen Zhang, Jiaoyan Chen, Yushan Zhu, et al. | arxiv.org/pdf/2402.05… | link |
| 2024-02-08 | CIC: A framework for Culturally-aware Image Captioning | Youngsik Yun, Jihie Kim | arxiv.org/pdf/2402.05… | null |

Model Compression/Optimization

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-08 | Privacy-Preserving Synthetic Continual Semantic Segmentation for Robotic Surgery | Mengya Xu, Mobarakol Islam, Long Bai, Hongliang Ren | arxiv.org/pdf/2402.05… | link |
| 2024-02-08 | Flashback: Understanding and Mitigating Forgetting in Federated Learning | Mohammed Aljahdali, Ahmed M. Abdelmoniem, Marco Canini, Samuel Horváth | arxiv.org/pdf/2402.05… | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-08 | Early Fusion of Features for Semantic Segmentation | Anupam Gupta, Ashok Krishnamurthy, Lisa Singh | arxiv.org/pdf/2402.06… | null |
| 2024-02-08 | Exploring Visual Culture Awareness in GPT-4V: A Comprehensive Probing | Yong Cao, Wenyan Li, Jiaang Li, Yifei Yuan, Daniel Hershcovich | arxiv.org/pdf/2402.06… | null |
| 2024-02-08 | Point-VOS: Pointing Up Video Object Segmentation | Idil Esen Zulfikar, Sabarinath Mahadevan, Paul Voigtlaender, Bastian Leibe | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | ClickSAM: Fine-tuning Segment Anything Model using click prompts for ultrasound image segmentation | Aimee Guo, Gace Fei, Hemanth Pasupuletic, Jing Wang | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data | Shufan Li, Harkanwar Singh, Aditya Grover | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Using YOLO v7 to Detect Kidney in Magnetic Resonance Imaging: A Supervised Contrastive Learning | Pouria Yazdian Anari, Fiona Obiezu, Nathan Lay, Fatemeh Dehghani Firouzabadi, Aditi Chaurasia, Mahshid Golagha, Shiva Singh, Fatemeh Homayounieh, Aryan Zahergivar, Stephanie Harmon, et al. | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Jacquard V2: Refining Datasets using the Human In the Loop Data Correction Method | Qiuhao Li, Shenghai Yuan | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | An Ordinal Regression Framework for a Deep Learning Based Severity Assessment for Chest Radiographs | Patrick Wienholt, Alexander Hermans, Firas Khader, Behrus Puladi, Bastian Leibe, Christiane Kuhl, Sven Nebelung, Daniel Truhn | arxiv.org/pdf/2402.05… | link |
| 2024-02-08 | DAPlankton: Benchmark Dataset for Multi-instrument Plankton Recognition via Fine-grained Domain Adaptation | Daniel Batrakhanov, Tuomas Eerola, Kaisa Kraft, Lumi Haraguchi, Lasse Lensu, Sanna Suikkanen, María Teresa Camarena-Gómez, Jukka Seppälä, Heikki Kälviäinen | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | RESMatch: Referring Expression Segmentation in a Semi-Supervised Manner | Ying Zang, Chenglong Fu, Runlong Cao, Didi Zhu, Min Zhang, Wenjun Hu, Lanyun Zhu, Tianrun Chen | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | One-Stop Automated Diagnostic System for Carpal Tunnel Syndrome in Ultrasound Images Using Deep Learning | Jiayu Peng, Jiajun Zeng, Manlin Lai, Ruobing Huang, Dong Ni, Zhenzhou Li | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Efficient Expression Neutrality Estimation with Application to Face Recognition Utility Prediction | Marcel Grimmer, Raymond N. J. Veldhuis, Christoph Busch | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Spiking Neural Network Enhanced Hand Gesture Recognition Using Low-Cost Single-photon Avalanche Diode Array | Zhenya Zang, Xingda Li, David Day Uei Li | arxiv.org/pdf/2402.05… | link |
| 2024-02-08 | Segmentation-free Connectionist Temporal Classification loss based OCR Model for Text Captcha Classification | Vaibhav Khatavkar, Makarand Velankar, Sneha Petkar | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | SpirDet: Towards Efficient, Accurate and Lightweight Infrared Small Target Detector | Qianchen Mao, Qiang Li, Bingshu Wang, Yongjun Zhang, Tao Dai, C. L. Philip Chen | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Optimizing for ROC Curves on Class-Imbalanced Data by Training over a Family of Loss Functions | Kelsey Lieberman, Shuai Yuan, Swarna Kamlam Ravindran, Carlo Tomasi | arxiv.org/pdf/2402.05… | link |
| 2024-02-08 | On the Effect of Image Resolution on Semantic Segmentation | Ritambhara Singh, Abhishek Jain, Pietro Perona, Shivani Agarwal, Junfeng Yang | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts | Zhili Liu, Kai Chen, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, James T. Kwok | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Scrapping The Web For Early Wildfire Detection | Mateo Lostanlen, Felix Veith, Cristian Buc, Valentin Barriere | arxiv.org/pdf/2402.05… | null |

Image Understanding

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-08 | Adaptive Surface Normal Constraint for Geometric Estimation from Monocular Images | Xiaoxiao Long, Yuhang Zheng, Yupeng Zheng, Beiwen Tian, Cheng Lin, Lingjie Liu, Hao Zhao, Guyue Zhou, Wenping Wang | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents | Yuxi Wei, Zi Wang, Yifan Lu, Chenxin Xu, Changxing Liu, Hao Zhao, Siheng Chen, Yanfeng Wang | arxiv.org/pdf/2402.05… | link |

LLM

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-08 | Examining Gender and Racial Bias in Large Vision-Language Models Using a Novel Dataset of Parallel Images | Kathleen C. Fraser, Svetlana Kiritchenko | arxiv.org/pdf/2402.05… | link |
| 2024-02-08 | Real-World Robot Applications of Foundation Models: A Review | Kento Kawaharazuka, Tatsuya Matsushima, Andrew Gambardella, Jiaxian Guo, Chris Paxton, Andy Zeng | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Enhancing Zero-shot Counting via Language-guided Exemplar Learning | Mingjie Wang, Jun Zhou, Yong Dai, Eric Buys, Minglun Gong | arxiv.org/pdf/2402.05… | null |

Transformer

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-08 | Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy | Seyedarmin Azizi, Mahdi Nazemi, Massoud Pedram | arxiv.org/pdf/2402.06… | null |
| 2024-02-08 | Memory Consolidation Enables Long-Context Video Understanding | Ivana Balažević, Yuge Shi, Pinelopi Papalampidi, Rahma Chaabouni, Skanda Koppula, Olivier J. Hénaff | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | You Only Need One Color Space: An Efficient Network for Low-light Image Enhancement | Yixu Feng, Cheng Zhang, Pei Wang, Peng Wu, Qingsen Yan, Yanning Zhang | arxiv.org/pdf/2402.05… | link |
| 2024-02-08 | Binding Dynamics in Rotating Features | Sindy Löwe, Francesco Locatello, Max Welling | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | AttnLRP: Attention-Aware Layer-wise Relevance Propagation for Transformers | Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Aakriti Jain, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | On Convolutional Vision Transformers for Yield Prediction | Alvin Inderka, Florian Huber, Volker Steinhage | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis | Dewei Zhou, You Li, Fan Ma, Zongxin Yang, Yi Yang | arxiv.org/pdf/2402.05… | link |
| 2024-02-08 | Unleashing the Infinity Power of Geometry: A Novel Geometry-Aware Transformer (GOAT) for Whole Slide Histopathology Image Analysis | Mingxin Liu, Yunzan Liu, Pengbo Xu, Jiquan Ma | arxiv.org/pdf/2402.05… | null |

3D/CG

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-08 | 3D-2D Neural Nets for Phase Retrieval in Noisy Interferometric Imaging | Andrew H. Proppe, Guillaume Thekkadath, Duncan England, Philip J. Bustard, Frédéric Bouchard, Jeff S. Lundeen, Benjamin J. Sussman | arxiv.org/pdf/2402.06… | null |
| 2024-02-08 | InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write | Blagoj Mitrevski, Arina Rak, Julian Schnitzler, Chengkun Li, Andrii Maksai, Jesse Berent, Claudiu Musat | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | UAV-Rain1k: A Benchmark for Raindrop Removal from UAV Aerial Imagery | Wenhui Chang, Hongming Chen, Xin He, Xiang Chen, Liangduo Shen | arxiv.org/pdf/2402.05… | link |
| 2024-02-08 | An Optimization-based Baseline for Rigid 2D/3D Registration Applied to Spine Surgical Navigation Using CMA-ES | Minheng Chen, Tonglong Li, Zhirun Zhang, Youyong Kong | arxiv.org/pdf/2402.05… | null |
| 2024-02-09 | NCRF: Neural Contact Radiance Fields for Free-Viewpoint Rendering of Hand-Object Interaction | Zhongqun Zhang, Jifei Song, Eduardo Pérez-Pellitero, Yiren Zhou, Hyung Jin Chang, Aleš Leonardis | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Memory-efficient deep end-to-end posterior network (DEEPEN) for inverse problems | Jyothi Rikhab Chand, Mathews Jacob | arxiv.org/pdf/2402.05… | null |

Learning Paradigms

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-08 | TaE: Task-aware Expandable Representation for Long Tail Class Incremental Learning | Linjie Li, S. Liu, Zhenyu Wu, JI yang | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | FuncGrasp: Learning Object-Centric Neural Grasp Functions from Single Annotated Example Object | Hanzhi Chen, Binbin Xu, Stefan Leutenegger | arxiv.org/pdf/2402.05… | null |

Other

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-08 | Impact on Public Health Decision Making by Utilizing Big Data Without Domain Knowledge | Miao Zhang, Salman Rahman, Vishwali Mhasawade, Rumi Chunara | arxiv.org/pdf/2402.06… | null |
| 2024-02-08 | Contrastive Approach to Prior Free Positive Unlabeled Learning | Anish Acharya, Sujay Sanghavi | arxiv.org/pdf/2402.06… | null |
| 2024-02-08 | Hidden in Plain Sight: Undetectable Adversarial Bias Attacks on Vulnerable Patient Populations | Pranav Kulkarni, Andrew Chan, Nithya Navarathna, Skylar Chan, Paul H. Yi, Vishwa S. Parekh | arxiv.org/pdf/2402.05… | link |
| 2024-02-08 | Real-time Holistic Robot Pose Estimation with Unknown States | Shikun Ban, Juling Fan, Wentao Zhu, Xiaoxuan Ma, Yu Qiao, Yizhou Wang | arxiv.org/pdf/2402.05… | link |
| 2024-02-08 | Learning pseudo-contractive denoisers for inverse problems | Deliang Wei, Peng Chen, Fang Li | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Extending 6D Object Pose Estimators for Stereo Vision | Thomas Pöllabauer, Jan Emrich, Volker Knauthe, Arjan Kuijper | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | A Concept for Reconstructing Stucco Statues from historic Sketches using synthetic Data only | Thomas Pöllabauer, Julius Kühn | arxiv.org/pdf/2402.05… | null |
| 2024-02-08 | Neural Graphics Primitives-based Deformable Image Registration for On-the-fly Motion Extraction | Xia Li, Fabian Zhang, Muheng Li, Damien Weber, Antony Lomax, Joachim Buhmann, Ye Zhang | arxiv.org/pdf/2402.05… | null |