[UPDATED!] 2024-02-22 (Publish Time)
Generative Models
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-02-22 | Cameras as Rays: Pose Estimation via Ray Diffusion | 相机作为光线:通过光线扩散进行姿势估计 | Jason Y. Zhang, Amy Lin, Moneish Kumar, Tzu-Hsuan Yang, Deva Ramanan, Shubham Tulsiani | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models | 定制视频:文本到视频扩散模型的一次性运动定制 | Yixuan Ren, Yang Zhou, Jimei Yang, Jing Shi, Difan Liu, Feng Liu, Mingi Kwon, Abhinav Shrivastava | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Zero-Shot Pediatric Tuberculosis Detection in Chest X-Rays using Self-Supervised Learning | 使用自我监督学习在胸部 X 射线中进行零次小儿结核病检测 | Daniel Capellán-Martín, Abhijeet Parida, Juan J. Gómez-Valverde, Ramon Sanchez-Jacob, Pooneh Roshanitabrizi, Marius G. Linguraru, María J. Ledesma-Carbayo, Syed M. Anwar | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Visual Hallucinations of Multi-modal Large Language Models | 多模态大语言模型的视觉幻觉 | Wen Huang, Hongbin Liu, Minxin Guo, Neil Zhenqiang Gong | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Debiasing Text-to-Image Diffusion Models | 消除文本到图像扩散模型的偏差 | Ruifei He, Chuhui Xue, Haoru Tan, Wenqing Zhang, Yingchen Yu, Song Bai, Xiaojuan Qi | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Large-Scale Actionless Video Pre-Training via Discrete Diffusion for Efficient Policy Learning | 通过离散扩散进行大规模无动作视频预训练,以实现高效的策略学习 | Haoran He, Chenjia Bai, Ling Pan, Weinan Zhang, Bin Zhao, Xuelong Li | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment | 基于扩散模型的视觉补偿引导和视觉差异分析用于无参考图像质量评估 | Zhaoyang Wang, Bo Hu, Mingyang Zhang, Jie Li, Leida Li, Maoguo Gong, Xinbo Gao | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing | 渐进残差对齐:GAN 反演和图像属性编辑的双流框架 | Hao Li, Mengqi Huang, Lei Zhang, Bo Hu, Yi Liu, Zhendong Mao | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Uncertainty-driven and Adversarial Calibration Learning for Epicardial Adipose Tissue Segmentation | 心外膜脂肪组织分割的不确定性驱动和对抗性校准学习 | Kai Zhao, Zhiming Liu, Jiaqi Liu, Jingbiao Zhou, Bihong Liao, Huifang Tang, Qiuyu Wang, Chunquan Li | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Typographic Text Generation with Off-the-Shelf Diffusion Model | 使用现成的扩散模型生成印刷文本 | KhayTze Peong, Seiichi Uchida, Daichi Haraguchi | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Font Style Interpolation with Diffusion Models | 使用扩散模型进行字体样式插值 | Tetta Kondo, Shumpei Takezaki, Daichi Haraguchi, Seiichi Uchida | arxiv.org/pdf/2402.14… | null |
2024-02-22 | A Simple Framework Uniting Visual In-context Learning with Masked Image Modeling to Improve Ultrasound Segmentation | 将视觉上下文学习与掩模图像建模相结合以改进超声分割的简单框架 | Yuyue Zhou, Banafshe Felfeliyan, Shrimanti Ghosh, Jessica Knight, Fatima Alves-Pereira, Christopher Keen, Jessica Küpper, Abhilash Rakkunedeth Hareendranathan, Jacob L. Jaremko | arxiv.org/pdf/2402.14… | null |
2024-02-22 | MVD^2: Efficient Multiview 3D Reconstruction for Multiview Diffusion | MVD^2:用于多视图扩散的高效多视图 3D 重建 | Xin-Yang Zheng, Hao Pan, Yu-Xiao Guo, Xin Tong, Yang Liu | arxiv.org/pdf/2402.14… | null |
Multimodal
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-02-22 | PALO: A Polyglot Large Multimodal Model for 5B People | PALO:面向 5B 人群的多语言大型多模式模型 | Muhammad Maaz, Hanoona Rasheed, Abdelrahman Shaker, Salman Khan, Hisham Cholakal, Rao M. Anwer, Tim Baldwin, Michael Felsberg, Fahad S. Khan | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset | 使用 MATH-Vision 数据集测量多模态数学推理 | Ke Wang, Junting Pan, Weikang Shi, Zimu Lu, Mingjie Zhan, Hongsheng Li | arxiv.org/pdf/2402.14… | null |
2024-02-22 | DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models | DualFocus:在多模态大语言模型中整合宏观和微观视角 | Yuhang Cao, Pan Zhang, Xiaoyi Dong, Dahua Lin, Jiaqi Wang | arxiv.org/pdf/2402.14… | null |
2024-02-22 | RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation | RoboScript:跨真实和模拟的自由形式操作任务的代码生成 | Junting Chen, Yao Mu, Qiaojun Yu, Tianming Wei, Silang Wu, Zhecheng Yuan, Zhixuan Liang, Chao Yang, Kaipeng Zhang, Wenqi Shao, et al. | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective | 少即是多:从 EOS 决策角度减轻多模态幻觉 | Zihao Yue, Liang Zhang, Qin Jin | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Uncertainty-Aware Evaluation for Vision-Language Models | 视觉语言模型的不确定性评估 | Vasily Kostumov, Bulat Nutfullin, Oleg Pilipenko, Eugene Ilyushin | arxiv.org/pdf/2402.14… | null |
NeRF
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-02-22 | Consolidating Attention Features for Multi-view Image Editing | 整合多视图图像编辑的注意力特征 | Or Patashnik, Rinon Gal, Daniel Cohen-Or, Jun-Yan Zhu, Fernando De la Torre | arxiv.org/pdf/2402.14… | null |
2024-02-22 | FrameNeRF: A Simple and Efficient Framework for Few-shot Novel View Synthesis | FrameNeRF:一种简单高效的小样本新颖视图合成框架 | Yan Xing, Pan Wang, Ligang Liu, Daolun Li, Li Zhang | arxiv.org/pdf/2402.14… | null |
2024-02-22 | NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection | NeRF-Det++:将语义提示和透视感知深度监督结合起来进行室内多视图 3D 检测 | Chenxi Huang, Yuenan Hou, Weicai Ye, Di Huang, Xiaoshui Huang, Binbin Lin, Deng Cai, Wanli Ouyang | arxiv.org/pdf/2402.14… | null |
2024-02-22 | TaylorGrid: Towards Fast and High-Quality Implicit Field Learning via Direct Taylor-based Grid Optimization | TaylorGrid:通过直接基于泰勒的网格优化实现快速、高质量的隐式场学习 | Renyi Mao, Qingshan Xu, Peng Zheng, Ye Wang, Tieru Wu, Rui Ma | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Mip-Grid: Anti-aliased Grid Representations for Neural Radiance Fields | Mip-Grid:神经辐射场的抗锯齿网格表示 | Seungtae Nam, Daniel Rho, Jong Hwan Ko, Eunbyung Park | arxiv.org/pdf/2402.14… | null |
3DGS
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-02-22 | GaussianPro: 3D Gaussian Splatting with Progressive Propagation | GaussianPro:具有渐进传播的 3D 高斯泼溅 | Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, Xuejin Chen | arxiv.org/pdf/2402.14… | null |
Model Compression/Optimization
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-02-22 | TIE-KD: Teacher-Independent and Explainable Knowledge Distillation for Monocular Depth Estimation | TIE-KD:用于单目深度估计的独立于教师且可解释的知识蒸馏 | Sangwon Choi, Daejune Choi, Duksu Kim | arxiv.org/pdf/2402.14… | null |
Classification/Detection/Recognition/Segmentation/...
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-02-22 | WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition | WeakSAM:分割任何东西满足弱监督实例级识别 | Lianghui Zhu, Junwei Zhou, Yan Liu, Xin Hao, Wenyu Liu, Xinggang Wang | arxiv.org/pdf/2402.14… | null |
2024-02-22 | A Transformer Model for Boundary Detection in Continuous Sign Language | 连续手语边界检测的 Transformer 模型 | Razieh Rastgoo, Kourosh Kiani, Sergio Escalera | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Two-stage Cytopathological Image Synthesis for Augmenting Cervical Abnormality Screening | 用于增强宫颈异常筛查的两阶段细胞病理学图像合成 | Zhenrong Shen, Manman Fei, Xin Wang, Jiangdong Cai, Sheng Wang, Lichi Zhang, Qian Wang | arxiv.org/pdf/2402.14… | null |
2024-02-22 | QIS : Interactive Segmentation via Quasi-Conformal Mappings | QIS:通过准共形映射进行交互式分割 | Han Zhang, Daoping Zhang, Lok Ming Lui | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Quadruplet Loss For Improving the Robustness to Face Morphing Attacks | 四联体损失提高面对变形攻击的鲁棒性 | Iurii Medvedev, Nuno Gonçalves | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Overcoming Dimensional Collapse in Self-supervised Contrastive Learning for Medical Image Segmentation | 克服医学图像分割自监督对比学习中的维度崩溃 | Jamshid Hassanpour, Vinkle Srivastav, Didier Mutter, Nicolas Padoy | arxiv.org/pdf/2402.14… | null |
2024-02-22 | High-Speed Detector For Low-Powered Devices In Aerial Grasping | 适用于空中抓取低功耗设备的高速探测器 | Ashish Kumar, Laxmidhar Behera | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Deep vessel segmentation based on a new combination of vesselness filters | 基于血管过滤器新组合的深层血管分割 | Guillaume Garret, Antoine Vacavant, Carole Frindel | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition | 实现视觉地点识别预训练模型的无缝适应 | Feng Lu, Lijun Zhang, Xiangyuan Lan, Shuting Dong, Yaowei Wang, Chun Yuan | arxiv.org/pdf/2402.14… | link |
2024-02-22 | Reimagining Anomalies: What If Anomalies Were Normal? | 重新想象异常:如果异常是正常的怎么办? | Philipp Liznerski, Saurabh Varshneya, Ece Calikus, Sophie Fellenz, Marius Kloft | arxiv.org/pdf/2402.14… | null |
2024-02-22 | S^2Former-OR: Single-Stage Bimodal Transformer for Scene Graph Generation in OR | S^2Former-OR:用于 OR 中场景图生成的单级双峰变压器 | Jialun Pei, Diandian Guo, Jingyang Zhang, Manxi Lin, Yueming Jin, Pheng-Ann Heng | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Modeling 3D Infant Kinetics Using Adaptive Graph Convolutional Networks | 使用自适应图卷积网络建模 3D 婴儿动力学 | Daniel Holmberg, Manu Airaksinen, Viviana Marchi, Andrea Guzzetta, Anna Kivi, Leena Haataja, Sampsa Vanhatalo, Teemu Roos | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Semantic Image Synthesis with Unconditional Generator | 使用无条件生成器进行语义图像合成 | Jungwoo Chae, Hyunin Cho, Sooyeon Go, Kyungmook Choi, Youngjung Uh | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Reading Relevant Feature from Global Representation Memory for Visual Object Tracking | 从全局表示存储器中读取相关特征以进行视觉对象跟踪 | Xinyu Zhou, Pinxue Guo, Lingyi Hong, Jinglun Li, Wei Zhang, Weifeng Ge, Wenqiang Zhang | arxiv.org/pdf/2402.14… | null |
2024-02-22 | GAM-Depth: Self-Supervised Indoor Depth Estimation Leveraging a Gradient-Aware Mask and Semantic Constraints | GAM-Depth:利用梯度感知掩模和语义约束的自监督室内深度估计 | Anqi Cheng, Zhiyuan Yang, Haiyue Zhu, Kezhi Mao | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Subobject-level Image Tokenization | 子对象级图像标记化 | Delong Chen, Samuel Cahyawijaya, Jianfeng Liu, Baoyuan Wang, Pascale Fung | arxiv.org/pdf/2402.14… | null |
2024-02-22 | YOLO-TLA: An Efficient and Lightweight Small Object Detection Model based on YOLOv5 | YOLO-TLA:基于YOLOv5的高效轻量小物体检测模型 | Peng Gao, Chun-Lin Ji, Tao Yu, Ru-Yue Yuan | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Secure Navigation using Landmark-based Localization in a GPS-denied Environment | 在 GPS 拒绝的环境中使用基于地标的定位进行安全导航 | Ganesh Sapkota, Sanjay Madria | arxiv.org/pdf/2402.14… | null |
2024-02-22 | A Self-supervised Pressure Map human keypoint Detection Approch: Optimizing Generalization and Computational Efficiency Across Datasets | 自监督压力图人体关键点检测方法:优化跨数据集的泛化和计算效率 | Chengzhang Yu, Xianjun Yang, Wenxia Bao, Shaonan Wang, Zhiming Yao | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Compression Robust Synthetic Speech Detection Using Patched Spectrogram Transformer | 使用修补频谱图变压器的压缩鲁棒合成语音检测 | Amit Kumar Singh Yadav, Ziyue Xiang, Kratika Bhagtani, Paolo Bestagini, Stefano Tubaro, Edward J. Delp | arxiv.org/pdf/2402.14… | null |
2024-02-22 | HINT: High-quality INPainting Transformer with Mask-Aware Encoding and Enhanced Attention | 提示:具有掩模感知编码和增强注意力的高质量 INPainting Transformer | Shuang Chen, Amir Atapour-Abarghouei, Hubert P. H. Shum | arxiv.org/pdf/2402.14… | null |
Transformer
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-02-22 | Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis | Snap Video:用于文本到视频合成的缩放时空转换器 | Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Ekaterina Deyneka, Tsai-Shien Chen, Anil Kag, Yuwei Fang, Aleksei Stoliar, Elisa Ricci, Jian Ren, et al. | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot | Multi-HMR:单次多人全身人体网格恢复 | Fabien Baradel, Matthieu Armando, Salma Galaaoui, Romain Brégier, Philippe Weinzaepfel, Grégory Rogez, Thomas Lucas | arxiv.org/pdf/2402.14… | null |
2024-02-22 | CCPA: Long-term Person Re-Identification via Contrastive Clothing and Pose Augmentation | CCPA:通过对比服装和姿势增强进行长期人员重新识别 | Vuong D. Nguyen, Shishir K. Shah | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Learning to Kern -- Set-wise Estimation of Optimal Letter Space | 学习紧缩——最佳字母空间的集合估计 | Kei Nakatsuru, Seiichi Uchida | arxiv.org/pdf/2402.14… | null |
3D/CG
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-02-22 | Distributed Radiance Fields for Edge Video Compression and Metaverse Integration in Autonomous Driving | 用于自动驾驶中边缘视频压缩和元宇宙集成的分布式辐射场 | Eugen Šlapak, Matúš Dopiriak, Mohammad Abdullah Al Faruque, Juraj Gazda, Marco Levorato | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Place Anything into Any Video | 将任何内容放入任何视频中 | Ziling Liu, Jinyu Yang, Mingqi Gao, Feng Zheng | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding | Swin3D++:用于 3D 室内场景理解的有效多源预训练 | Yu-Qi Yang, Yu-Xiao Guo, Yang Liu | arxiv.org/pdf/2402.14… | null |
Learning Paradigms
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-02-22 | Self-supervised Visualisation of Medical Image Datasets | 医学图像数据集的自监督可视化 | Ifeoma Veronica Nwabufo, Jan Niklas Böhm, Philipp Berens, Dmitry Kobak | arxiv.org/pdf/2402.14… | null |
2024-02-22 | CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion | CLCE:一种改进交叉熵和对比学习以实现优化学习融合的方法 | Zijun Long, George Killick, Lipeng Zhuang, Gerardo Aragon-Camarasa, Zaiqiao Meng, Richard Mccreadie | arxiv.org/pdf/2402.14… | null |
Other
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-02-22 | Demographic Bias of Expert-Level Vision-Language Foundation Models in Medical Imaging | 医学影像中专家级视觉语言基础模型的人口统计学偏差 | Yuzhe Yang, Yujia Liu, Xin Liu, Avanti Gulhane, Domenico Mastrodicasa, Wei Wu, Edward J Wang, Dushyant W Sahani, Shwetak Patel | arxiv.org/pdf/2402.14… | null |
2024-02-22 | GeneOH Diffusion: Towards Generalizable Hand-Object Interaction Denoising via Denoising Diffusion | GeneOH Diffusion:通过去噪扩散实现可推广的手-物体交互去噪 | Xueyi Liu, Li Yi | arxiv.org/pdf/2402.14… | null |
2024-02-22 | CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation | CyberDemo:增强模拟人体演示以实现现实世界的灵巧操作 | Jun Wang, Yuzhe Qin, Kaiming Kuang, Yigit Korkmaz, Akhilan Gurumoorthy, Hao Su, Xiaolong Wang | arxiv.org/pdf/2402.14… | null |
2024-02-22 | A Class of Topological Pseudodistances for Fast Comparison of Persistence Diagrams | 一类用于快速比较持久图的拓扑伪距离 | Rolando Kindelan Nuñez, Mircea Petrache, Mauricio Cerda, Nancy Hitschfeld | arxiv.org/pdf/2402.14… | null |
2024-02-22 | VLPose: Bridging the Domain Gap in Pose Estimation with Language-Vision Tuning | VLPose:通过语言视觉调整弥合姿势估计中的领域差距 | Jingyao Li, Pengguang Chen, Xuan Ju, Hong Xu, Jiaya Jia | arxiv.org/pdf/2402.14… | null |
2024-02-22 | HR-APR: APR-agnostic Framework with Uncertainty Estimation and Hierarchical Refinement for Camera Relocalisation | HR-APR:具有不确定性估计和相机重定位分层细化的 APR 不可知框架 | Changkun Liu, Shuai Chen, Yukun Zhao, Huajian Huang, Victor Prisacariu, Tristan Braud | arxiv.org/pdf/2402.14… | null |
2024-02-22 | An Error-Matching Exclusion Method for Accelerating Visual SLAM | 一种加速视觉SLAM的错误匹配排除方法 | Shaojie Zhang, Yinghui Wang, Jiaxing Ma, Jinlong Yang, Tao Yan, Liangyi Huang, Mingfeng Wang | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Vision-Language Navigation with Embodied Intelligence: A Survey | 具身智能的视觉语言导航:一项调查 | Peng Gao, Peng Wang, Feng Gao, Fei Wang, Ruyue Yuan | arxiv.org/pdf/2402.14… | null |
2024-02-22 | A Landmark-Aware Visual Navigation Dataset | 地标感知视觉导航数据集 | Faith Johnson, Bryan Bo Cao, Kristin Dana, Shubham Jain, Ashwin Ashok | arxiv.org/pdf/2402.14… | null |
2024-02-22 | Reconstruction-Based Anomaly Localization via Knowledge-Informed Self-Training | 通过基于知识的自我训练进行基于重建的异常定位 | Cheng Qian, Xiaoxian Lao, Chunguang Li | arxiv.org/pdf/2402.14… | null |