[UPDATED!] 2024-01-31 (Publish Time)
Classification / Detection / Recognition / Segmentation
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-01-31 | Improved Scene Landmark Detection for Camera Localization | 改进了相机定位的场景地标检测 | Tien Do, Sudipta N. Sinha | arxiv.org/pdf/2401.18… | null |
| 2024-01-31 | Benchmarking Sensitivity of Continual Graph Learning for Skeleton-Based Action Recognition | 基于骨架的动作识别的连续图学习的基准敏感性 | Wei Wei, Tom De Schepper, Kevin Mets | arxiv.org/pdf/2401.18… | null |
| 2024-01-31 | Multilinear Operator Networks | 多线性算子网络 | Yixin Cheng, Grigorios G. Chrysos, Markos Georgopoulos, Volkan Cevher | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Shrub of a thousand faces: an individual segmentation from satellite images using deep learning | 千面灌木:使用深度学习对卫星图像进行单独分割 | Rohaifa Khaldi, Siham Tabik, Sergio Puertas-Ruiz, Julio Peñas de Giles, José Antonio Hódar Correa, Regino Zamora, Domingo Alcaraz Segura | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study | 使用视觉检测模型增强多模态大语言模型:实证研究 | Qirui Jiao, Daoyuan Chen, Yilun Huang, Yaliang Li, Ying Shen | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | MelNet: A Real-Time Deep Learning Algorithm for Object Detection | MelNet:一种用于目标检测的实时深度学习算法 | Yashar Azadvatan, Murat Kurt | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | HyperZ | HyperZ | Harvie Zhang | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Source-free Domain Adaptive Object Detection in Remote Sensing Images | 遥感图像中的无源域自适应目标检测 | Weixing Liu, Jun Liu, Xin Su, Han Nie, Bin Luo | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation | Hi-SAM:结合 Segment Anything 模型进行分层文本分割 | Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, Baocai Yin, Cong Liu, Bo Du, Dacheng Tao | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | PVLR: Prompt-driven Visual-Linguistic Representation Learning for Multi-Label Image Recognition | PVLR:用于多标签图像识别的提示驱动的视觉语言表示学习 | Hao Tan, Zichang Tan, Jun Li, Jun Wan, Zhen Lei | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error | AEROBLADE:使用自动编码器重建误差对潜在扩散图像进行免训练检测 | Jonas Ricker, Denis Lukovnikov, Asja Fischer | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | VR-based generation of photorealistic synthetic data for training hand-object tracking models | 基于 VR 生成逼真的合成数据,用于训练手部物体跟踪模型 | Chengyan Zhang, Rahul Chaudhari | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model | 卷积遇见 LoRA:分段任意模型的参数高效微调 | Zihan Zhong, Zhiqiang Tang, Tong He, Haoyang Fang, Chun Yuan | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Semantic Anything in 3D Gaussians | 3D 高斯中的任何语义 | Xu Hu, Yuxi Wang, Lue Fan, Junsong Fan, Junran Peng, Zhen Lei, Qing Li, Zhaoxiang Zhang | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Instruction-Guided Scene Text Recognition | 指令引导的场景文本识别 | Yongkun Du, Zhineng Chen, Yuchen Su, Caiyan Jia, Yu-Gang Jiang | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Leveraging Swin Transformer for Local-to-Global Weakly Supervised Semantic Segmentation | 利用 Swin Transformer 进行本地到全局弱监督语义分割 | Rozhan Ahmadi, Shohreh Kasaei | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Do Object Detection Localization Errors Affect Human Performance and Trust? | 对象检测定位错误会影响人类表现和信任吗? | Sven de Witte, Ombretta Strafforello, Jan van Gemert | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | SimAda: A Simple Unified Framework for Adapting Segment Anything Model in Underperformed Scenes | SimAda:一个简单的统一框架,用于在表现不佳的场景中调整分段任意模型 | Yiran Song, Qianyu Zhou, Xuequan Lu, Zhiwen Shao, Lizhuang Ma | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Tiered approach for rapid damage characterisation of infrastructure enabled by remote sensing and deep learning technologies | 利用遥感和深度学习技术快速表征基础设施损坏的分层方法 | Nadiia Kopiika, Andreas Karavias, Pavlos Krassakis, Zehao Ye, Jelena Ninic, Nataliya Shakhovska, Nikolaos Koukouzas, Sotirios Argyroudis, Stergios-Aristoteles Mitoulis | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Leveraging Human-Machine Interactions for Computer Vision Dataset Quality Enhancement | 利用人机交互提高计算机视觉数据集质量 | Esla Timothy Anzaku, Hyesoo Hong, Jin-Woo Park, Wonjun Yang, Kangmin Kim, JongBum Won, Deshika Vinoshani Kumari Herath, Arnout Van Messem, Wesley De Neve | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Unified Physical-Digital Face Attack Detection | 统一的物理-数字人脸攻击检测 | Hao Fang, Ajian Liu, Haocheng Yuan, Junze Zheng, Dingheng Zeng, Yanhong Liu, Jiankang Deng, Sergio Escalera, Xiaoming Liu, Jun Wan, et al. | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Datacube segmentation via Deep Spectral Clustering | 通过深度谱聚类进行数据立方分割 | Alessandro Bombini, Fernando García-Avello Bofías, Caterina Bracci, Michele Ginolfi, Chiara Ruberto | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | All Beings Are Equal in Open Set Recognition | 开集认识中众生平等 | Chaohua Li, Enhao Zhang, Chuanxing Geng, SongCan Chen | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition | 用于提示语音识别的计算和参数高效的多模态融合变压器 | Lei Liu, Li Liu, Haizhou Li | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data | 擅长字幕,不擅长计数:在地球观测数据上对 GPT-4V 进行基准测试 | Chenhui Zhang, Sherrie Wang | arxiv.org/pdf/2401.17… | link |
| 2024-01-31 | Head and Neck Tumor Segmentation from [18F]F-FDG PET/CT Images Based on 3D Diffusion Model | 基于 3D 扩散模型的 [18F]F-FDG PET/CT 图像的头颈肿瘤分割 | Yafei Dong, Kuang Gong | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Local Feature Matching Using Deep Learning: A Survey | 使用深度学习进行局部特征匹配:调查 | Shibiao Xu, Shunpeng Chen, Rongtao Xu, Changwei Wang, Peng Lu, Li Guo | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Towards Image Semantics and Syntax Sequence Learning | 迈向图像语义和句法序列学习 | Chun Tao, Timur Ibrayev, Kaushik Roy | arxiv.org/pdf/2401.17… | null |
Model Compression / Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-01-31 | Trainable Fixed-Point Quantization for Deep Learning Acceleration on FPGAs | 用于 FPGA 上深度学习加速的可训练定点量化 | Dingyi Dai, Yichi Zhang, Jiahao Zhang, Zhanqiu Hu, Yaohui Cai, Qi Sun, Zhiru Zhang | arxiv.org/pdf/2401.17… | null |
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-01-31 | Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators | 运动引导:使用可微运动估计器进行基于扩散的图像编辑 | Daniel Geng, Andrew Owens | arxiv.org/pdf/2401.18… | null |
| 2024-01-31 | CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting | CARFF:用于 3D 场景预测的条件自动编码辐射场 | Jiezhi Yang, Khushi Desai, Charles Packer, Harshil Bhatia, Nicholas Rhinehart, Rowan McAllister, Joseph Gonzalez | arxiv.org/pdf/2401.18… | null |
| 2024-01-31 | Advances in 3D Generation: A Survey | 3D 生成的进展:调查 | Xiaoyu Li, Qi Zhang, Di Kang, Weihao Cheng, Yiming Gao, Jingbo Zhang, Zhihao Liang, Jing Liao, Yan-Pei Cao, Ying Shan | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Double InfoGAN for Contrastive Analysis | 双InfoGAN进行对比分析 | Florence Carton, Robin Louiset, Pietro Gori | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | 3D-Plotting Algorithm for Insects using YOLOv5 | 使用 YOLOv5 的昆虫 3D 绘图算法 | Daisuke Mori, Hiroki Hayami, Yasufumi Fujimoto, Isao Goto | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation | Image Anything:迈向推理连贯且免训练的多模态图像生成 | Yuanhuiyi Lyu, Xu Zheng, Lin Wang | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Spatial-and-Frequency-aware Restoration method for Images based on Diffusion Models | 基于扩散模型的图像空间和频率感知恢复方法 | Kyungsung Lee, Donggyu Lee, Myungjoo Kang | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Topology-Aware Latent Diffusion for 3D Shape Generation | 用于生成 3D 形状的拓扑感知潜在扩散 | Jiangbei Hu, Ben Fei, Baixin Xu, Fei Hou, Weidong Yang, Shengfa Wang, Na Lei, Chen Qian, Ying He | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Task-Oriented Diffusion Model Compression | 面向任务的扩散模型压缩 | Geonung Kim, Beomsu Kim, Eunhyeok Park, Sunghyun Cho | arxiv.org/pdf/2401.17… | null |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-01-31 | Binding Touch to Everything: Learning Unified Multimodal Tactile Representations | 将触摸与一切结合起来:学习统一的多模态触觉表征 | Fengyu Yang, Chao Feng, Ziyang Chen, Hyoungseob Park, Daniel Wang, Yiming Dou, Ziyao Zeng, Xien Chen, Rit Gangopadhyay, Andrew Owens, et al. | arxiv.org/pdf/2401.18… | null |
| 2024-01-31 | Controllable Dense Captioner with Multimodal Embedding Bridging | 具有多模式嵌入桥接的可控密集字幕器 | Yuzhong Zhao, Yue Liu, Zonghao Guo, Weijia Wu, Chen Gong, Qixiang Ye, Fang Wan | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis | 邻近 QA:释放多模态大型语言模型的力量进行空间邻近分析 | Jianing Li, Xi Nan, Ming Lu, Li Du, Shanghang Zhang | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval | M2-RAAP:一种多模式方法,用于推进基于适应的预训练,实现有效且高效的零样本视频文本检索 | Xingning Dong, Zipeng Feng, Chunluan Zhou, Xuzheng Yu, Ming Yang, Qingpei Guo | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | SNP-S3: Shared Network Pre-training and Significant Semantic Strengthening for Various Video-Text Tasks | SNP-S3:各种视频文本任务的共享网络预训练和显着语义强化 | Xingning Dong, Qingpei Guo, Tian Gan, Qing Wang, Jianlong Wu, Xiangyuan Ren, Yuan Cheng, Wei Chu | arxiv.org/pdf/2401.17… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-01-31 | DROP: Decouple Re-Identification and Human Parsing with Task-specific Features for Occluded Person Re-identification | DROP:将重新识别和人体解析与特定于任务的特征分离以进行被遮挡人员重新识别 | Shuguang Dou, Xiangyang Jiang, Yuanpeng Tu, Junyao Gao, Zefan Qu, Qingsong Zhao, Cairong Zhao | arxiv.org/pdf/2401.18… | null |
| 2024-01-31 | LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement | LaneGraph2Seq:通过点边编码和连接增强使用语言模型提取车道拓扑 | Renyuan Peng, Xinyue Cai, Hang Xu, Jiachen Lu, Feng Wen, Wei Zhang, Li Zhang | arxiv.org/pdf/2401.17… | null |
NeRF
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-01-31 | ReplaceAnything3D: Text-Guided 3D Scene Editing with Compositional Neural Radiance Fields | ReplaceAnything3D:使用组合神经辐射场进行文本引导的 3D 场景编辑 | Edward Bartrum, Thu Nguyen-Phuoc, Chris Xie, Zhengqin Li, Numair Khan, Armen Avetisyan, Douglas Lanman, Lei Xiao | arxiv.org/pdf/2401.17… | null |
Learning Paradigms
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-01-31 | Fine-Grained Zero-Shot Learning: Advances, Challenges, and Prospects | 细粒度零样本学习:进展、挑战和前景 | Jingcai Guo, Zhijie Rao, Song Guo, Jingren Zhou, Dacheng Tao | arxiv.org/pdf/2401.17… | null |
Image Understanding
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-01-31 | Exploring the Common Appearance-Boundary Adaptation for Nighttime Optical Flow | 探索夜间光流的常见外观边界适应 | Hanyu Zhou, Yi Chang, Haoyue Liu, Wending Yan, Yuxing Duan, Zhiwei Shi, Luxin Yan | arxiv.org/pdf/2401.17… | null |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-01-31 | Reimagining Reality: A Comprehensive Survey of Video Inpainting Techniques | 重新想象现实:视频修复技术的综合调查 | Shreyank N Gowda, Yash Thakre, Shashank Narayana Gowda, Xiaobo Jin | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | RADIN: Souping on a Budget | RADIN:预算中的汤 | Thibaut Menes, Olivier Risser-Maroix | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Robustly overfitting latents for flexible neural image compression | 鲁棒地过度拟合潜在的灵活神经图像压缩 | Yura Perugachi-Diaz, Arwin Gansekoele, Sandjai Bhulai | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | COMET: Contrastive Mean Teacher for Online Source-Free Universal Domain Adaptation | COMET:在线无源通用域适应的对比平均老师 | Pascal Schlachter, Bin Yang | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Unveiling the Power of Self-supervision for Multi-view Multi-human Association and Tracking | 揭示多视角多人关联和跟踪的自我监督力量 | Wei Feng, Feifan Wang, Ruize Han, Zekun Qian, Song Wang | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Agile But Safe: Learning Collision-Free High-Speed Legged Locomotion | 敏捷但安全:学习无碰撞高速腿式运动 | Tairan He, Chong Zhang, Wenli Xiao, Guanqi He, Changliu Liu, Guanya Shi | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Is Registering Raw Tagged-MR Enough for Strain Estimation in the Era of Deep Learning? | 注册Raw Tagged-MR足以用于深度学习时代的应变估计吗? | Zhangxing Bian, Ahmed Alshareef, Shuwen Wei, Junyu Chen, Yuli Wang, Jonghye Woo, Dzung L. Pham, Jiachen Zhuo, Aaron Carass, Jerry L. Prince | arxiv.org/pdf/2401.17… | null |
| 2024-01-31 | Data-Effective Learning: A Comprehensive Medical Benchmark | 数据有效学习:综合医学基准 | Wenxuan Yang, Weimin Tan, Yuqi Sun, Bo Yan | arxiv.org/pdf/2401.17… | null |