[UPDATED!] 2024-02-19 (Publish Time)
Generative Models
| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-19 | FiT: Flexible Vision Transformer for Diffusion Model | Zeyu Lu, Zidong Wang, Di Huang, Chengyue Wu, Xihui Liu, Wanli Ouyang, Lei Bai | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Mixed Gaussian Flow for Diverse Trajectory Prediction | Jiahe Chen, Jinkun Cao, Dahua Lin, Kris Kitani, Jiangmiao Pang | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling | Jun Zhan, Junqi Dai, Jiasheng Ye, Yunhua Zhou, Dong Zhang, Zhigeng Liu, Xin Zhang, Ruibin Yuan, Ge Zhang, Linyang Li, et al. | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Adversarial Feature Alignment: Balancing Robustness and Accuracy in Deep Learning via Adversarial Training | Leo Hyun Park, Jaeuk Kim, Myung Gyo Oh, Jaewoo Park, Taekyoung Kwon | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | 3D Vascular Segmentation Supervised by 2D Annotation of Maximum Intensity Projection | Zhanqiang Guo, Zimeng Tan, Jianjiang Feng, Jie Zhou | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Human Video Translation via Query Warping | Haiming Zhu, Yangyang Xu, Shengfeng He | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Direct Consistency Optimization for Compositional Text-to-Image Personalization | Kyungmin Lee, Sangkyung Kwak, Kihyuk Sohn, Jinwoo Shin | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Privacy-Preserving Low-Rank Adaptation for Latent Diffusion Models | Zihao Luo, Xilie Xu, Feng Liu, Yun Sing Koh, Di Wang, Jingfeng Zhang | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation | Chong Zeng, Yue Dong, Pieter Peers, Youkang Kong, Hongzhi Wu, Xin Tong | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | One2Avatar: Generative Implicit Head Avatar For Few-shot User Adaptation | Zhixuan Yu, Ziqian Bai, Abhimitra Meka, Feitong Tan, Qiangeng Xu, Rohit Pandey, Sean Fanello, Hyun Soo Park, Yinda Zhang | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | NOTE: Notable generation Of patient Text summaries through Efficient approach based on direct preference optimization | Imjin Ahn, Hansle Gwon, Young-Hak Kim, Tae Joon Jun, Sanghyun Park | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | ComFusion: Personalized Subject Generation in Multiple Specific Scenes From Single Image | Yan Hong, Jianfu Zhang | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | UnlearnCanvas: A Stylized Image Dataset to Benchmark Machine Unlearning for Diffusion Models | Yihua Zhang, Yimeng Zhang, Yuguang Yao, Jinghan Jia, Jiancheng Liu, Xiaoming Liu, Sijia Liu | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | WildFake: A Large-scale Challenging Dataset for AI-Generated Images Detection | Yan Hong, Jianfu Zhang | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | Statistical Test for Generated Hypotheses by Diffusion Models | Teruyuki Katsuoka, Tomohiro Shiraishi, Daiki Miwa, Vo Nguyen Le Duy, Ichiro Takeuchi | arxiv.org/pdf/2402.11… | null |
Multimodal
| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-19 | Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models | Christian Schlarmann, Naman Deep Singh, Francesco Croce, Matthias Hein | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning | Renqiu Xia, Bo Zhang, Hancheng Ye, Xiangchao Yan, Qi Liu, Hongbin Zhou, Zijun Chen, Min Dou, Botian Shi, Junchi Yan, et al. | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | LVCHAT: Facilitating Long Video Comprehension | Yu Wang, Zeyuan Zhang, Julian McAuley, Zexue He | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models | Xuanyu Lei, Zonghan Yang, Xinrui Chen, Peng Li, Yang Liu | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Semantic Textual Similarity Assessment in Chest X-ray Reports Using a Domain-Specific Cosine-Based Metric | Sayeh Gholipour Picha, Dawood Al Chanti, Alice Caplier | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios | Jialei Xu, Xianming Liu, Junjun Jiang, Kui Jiang, Rui Li, Kai Cheng, Xiangyang Ji | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | MM-SurvNet: Deep Learning-Based Survival Risk Stratification in Breast Cancer Through Multimodal Data Fusion | Raktim Kumar Mondol, Ewan K. A. Millar, Arcot Sowmya, Erik Meijering | arxiv.org/pdf/2402.11… | null |
NeRF
| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-19 | Binary Opacity Grids: Capturing Fine Geometric Detail for Mesh-Based View Synthesis | Christian Reiser, Stephan Garbin, Pratul P. Srinivasan, Dor Verbin, Richard Szeliski, Ben Mildenhall, Jonathan T. Barron, Peter Hedman, Andreas Geiger | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Colorizing Monochromatic Radiance Fields | Yean Cheng, Renjie Wan, Shuchen Weng, Chengxuan Zhu, Yakun Chang, Boxin Shi | arxiv.org/pdf/2402.12… | null |
Model Compression/Optimization
| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-19 | Interpretable Embedding for Ad-hoc Video Search | Jiaxin Wu, Chong-Wah Ngo | arxiv.org/pdf/2402.11… | null |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-19 | Landmark Stereo Dataset for Landmark Recognition and Moving Node Localization in a Non-GPS Battlefield Environment | Ganesh Sapkota, Sanjay Madria | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | UncertaintyTrack: Exploiting Detection and Localization Uncertainty in Multi-Object Tracking | Chang Won Lee, Steven L. Waslander | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Zero shot VLMs for hate meme detection: Are we there yet? | Naquee Rizwan, Paramananda Bhaskar, Mithun Das, Swadhin Satyaprakash Majhi, Punyajoy Saha, Animesh Mukherjee | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers | Markus Hiller, Krista A. Ehinger, Tom Drummond | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Towards Explainable LiDAR Point Cloud Semantic Segmentation via Gradient Based Target Localization | Abhishek Kuriyal, Vaibhav Kumar | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | ISCUTE: Instance Segmentation of Cables Using Text Embedding | Shir Kozlovsky, Omkar Joglekar, Dotan Di Castro | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | Weakly Supervised Object Detection in Chest X-Rays with Differentiable ROI Proposal Networks and Soft ROI Pooling | Philip Müller, Felix Meissen, Georgios Kaissis, Daniel Rueckert | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | Event-Based Motion Magnification | Yutian Chen, Shi Guo, Fangzheng Yu, Feng Zhang, Jinwei Gu, Tianfan Xue | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | Separating common from salient patterns with Contrastive Representation Learning | Robin Louiset, Edouard Duchesnay, Antoine Grigis, Pietro Gori | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | Modularized Networks for Few-shot Hateful Meme Detection | Rui Cao, Roy Ka-Wei Lee, Jing Jiang | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | Rock Classification Based on Residual Networks | Sining Zhoubian, Yuyang Wang, Zhihuan Jiang | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | SDGE: Stereo Guided Depth Estimation for 360° Camera Sets | Jialei Xu, Xianming Liu, Junjun Jiang, Xiangyang Ji | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | FOD-Swin-Net: angular super resolution of fiber orientation distribution using a transformer-based deep model | Mateus Oliveira da Silva, Caio Pinheiro Santana, Diedre Santos do Carmo, Letícia Rittner | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | Reinforcement Learning as a Parsimonious Alternative to Prediction Cascades: A Case Study on Image Segmentation | Bharat Srikishan, Anika Tabassum, Srikanth Allu, Ramakrishnan Kannan, Nikhil Muralidhar | arxiv.org/pdf/2402.11… | null |
Image Understanding
| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-19 | Evaluating Image Review Ability of Vision Language Models | Shigeki Saito, Kazuki Hayashi, Yusuke Ide, Yusuke Sakai, Kazuma Onishi, Toma Suzuki, Seiji Gobara, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | An Endoscopic Chisel: Intraoperative Imaging Carves 3D Anatomical Models | Jan Emily Mangulabnan, Roger D. Soberanis-Mukul, Timo Teufel, Manish Sahu, Jose L. Porras, S. Swaroop Vedula, Masaru Ishii, Gregory Hager, Russell H. Taylor, Mathias Unberath | arxiv.org/pdf/2402.11… | null |
LLM
| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-19 | Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships | Sebastian Koch, Narunas Vaskevicius, Mirco Colosi, Pedro Hermosilla, Timo Ropinski | arxiv.org/pdf/2402.12… | null |
Transformer
| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-19 | A Lightweight Parallel Framework for Blind Image Quality Assessment | Qunyue Huang, Bin Fang | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Surround-View Fisheye Optics in Computer Vision and Simulation: Survey and Challenge | Daniel Jakab, Brian Michael Deegan, Sushil Sharma, Eoin Martino Grua, Jonathan Horgan, Enda Ward, Pepijn Van De Ven, Anthony Scanlan, Ciaran Eising | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | AICAttack: Adversarial Image Captioning Attack with Attention-Based Optimization | Jiyao Li, Mingze Ni, Yifei Dong, Tianqing Zhu, Wei Liu | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | PhySU-Net: Long Temporal Context Transformer for rPPG with Self-Supervised Pre-training | Marko Savic, Guoying Zhao | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | Language-guided Image Reflection Separation | Haofeng Zhong, Yuchen Hong, Shuchen Weng, Jinxiu Liang, Boxin Shi | arxiv.org/pdf/2402.11… | null |
3D/CG
| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-19 | Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability | Xuelin Qian, Yu Wang, Simian Luo, Yinda Zhang, Ying Tai, Zhenyu Zhang, Chengjie Wang, Xiangyang Xue, Bo Zhao, Tiejun Huang, et al. | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Pan-Mamba: Effective pan-sharpening with State Space Model | Xuanhua He, Ke Cao, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | A Spatiotemporal Illumination Model for 3D Image Fusion in Optical Coherence Tomography | Stefan Ploner, Jungeun Won, Julia Schottenhamml, Jessica Girgis, Kenneth Lam, Nadia Waheed, James Fujimoto, Andreas Maier | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Two Online Map Matching Algorithms Based on Analytic Hierarchy Process and Fuzzy Logic | Jeremy J. Lin, Tomoro Mochida, Riley C. W. O'Neill, Atsuro Yoshida, Masashi Yamazaki, Akinobu Sasada | arxiv.org/pdf/2402.11… | null |
| 2024-02-19 | DIO: Dataset of 3D Mesh Models of Indoor Objects for Robotics and Computer Vision Applications | Nillan Nimal, Wenbin Li, Ronald Clark, Sajad Saeedi | arxiv.org/pdf/2402.11… | null |
Learning Paradigms
| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-19 | Avoiding Feature Suppression in Contrastive Learning: Learning What Has Not Been Learned Before | Jihai Zhang, Xiang Lan, Xiaoye Qu, Yu Cheng, Mengling Feng, Bryan Hooi | arxiv.org/pdf/2402.11… | null |
Other
| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-02-19 | Regularization by denoising: Bayesian model and Langevin-within-split Gibbs sampling | Elhadji C. Faye, Mame Diarra Fall, Nicolas Dobigeon | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models | Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Chenxu Hu, Yang Wang, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Revisiting Data Augmentation in Deep Reinforcement Learning | Jianshu Hu, Yunpeng Jiang, Paul Weng | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Examining Monitoring System: Detecting Abnormal Behavior In Online Examinations | Dinh An Ngo, Thanh Dat Nguyen, Thi Le Chi Dang, Huy Hoan Le, Ton Bao Ho, Vo Thanh Khang Nguyen, Truong Thanh Hung Nguyen | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Major TOM: Expandable Datasets for Earth Observation | Alistair Francis, Mikolaj Czerkawski | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | Robustness and Exploration of Variational and Machine Learning Approaches to Inverse Problems: An Overview | Alexander Auras, Kanchana Vaishnavi Gandikota, Hannah Droege, Michael Moeller | arxiv.org/pdf/2402.12… | null |
| 2024-02-19 | InMD-X: Large Language Models for Internal Medicine Doctors | Hansle Gwon, Imjin Ahn, Hyoje Jung, Byeolhee Kim, Young-Hak Kim, Tae Joon Jun | arxiv.org/pdf/2402.11… | null |