!UPDATED -- 2024-01-05
Classification/Detection/Recognition/Segmentation
| Publish Date | Title | Authors | Paper | Code |
|---|---|---|---|---|
| 2024-01-05 | Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively | Haobo Yuan, Xiangtai Li, Chong Zhou, Yining Li, Kai Chen, Chen Change Loy | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Reversing the Irreversible: A Survey on Inverse Biometrics | Marta Gomez-Barrero, Javier Galbally | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Multi-Stage Contrastive Regression for Action Quality Assessment | Qi An, Mengshi Qi, Huadong Ma | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | CrisisViT: A Robust Vision Transformer for Crisis Image Classification | Zijun Long, Richard McCreadie, Muhammad Imran | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Detection and Classification of Diabetic Retinopathy using Deep Learning Algorithms for Segmentation to Facilitate Referral Recommendation for Test and Treatment Prediction | Manoj S H, Arya A Bosale | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Systematic review of image segmentation using complex networks | Amin Rezaei, Fatemeh Asadi | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Complementary Information Mutual Learning for Multimodality Medical Image Segmentation | Chuyun Shen, Wenhao Li, Haoqing Chen, Xiaoling Wang, Fengping Zhu, Yuxin Li, Xiangfeng Wang, Bo Jin | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | VoxelNextFusion: A Simple, Unified and Effective Voxel Fusion Framework for Multi-Modal 3D Object Detection | Ziying Song, Guoxin Zhang, Jun Xie, Lin Liu, Caiyan Jia, Shaoqing Xu, Zhepeng Wang | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | PAHD: Perception-Action based Human Decision Making using Explainable Graph Neural Networks on SAR Images | Sasindu Wijeratne, Bingyi Zhang, Rajgopal Kannan, Viktor Prasanna, Carl Busart | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Benchmarking PathCLIP for Pathology Image Analysis | Sunyi Zheng, Xiaonan Cui, Yuxuan Sun, Jingxiong Li, Honglin Li, Yunlong Zhang, Pingyi Chen, Xueping Jing, Zhaoxiang Ye, Lin Yang | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | MOODv2: Masked Image Modeling for Out-of-Distribution Detection | Jingyao Li, Pengguang Chen, Shaozuo Yu, Shu Liu, Jiaya Jia | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | DHGCN: Dynamic Hop Graph Convolution Network for Self-supervised Point Cloud Learning | Jincen Jiang, Lizhi Zhao, Xuequan Lu, Wei Hu, Imran Razzak, Meili Wang | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Object-oriented backdoor attack against image captioning | Meiling Li, Nan Zhong, Xinpeng Zhang, Zhenxing Qian, Sheng Li | arxiv.org/abs/2401.02… | null |
Transformer
| Publish Date | Title | Authors | Paper | Code |
|---|---|---|---|---|
| 2024-01-05 | Denoising Vision Transformers | Jiawei Yang, Katie Z Luo, Jiefeng Li, Kilian Q Weinberger, Yonglong Tian, Yue Wang | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | SPFormer: Enhancing Vision Transformer with Superpixel Representation | Jieru Mei, Liang-Chieh Chen, Alan Yuille, Cihang Xie | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Generating Non-Stationary Textures using Self-Rectification | Yang Zhou, Rongjun Xiao, Dani Lischinski, Daniel Cohen-Or, Hui Huang | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Two-stage Progressive Residual Dense Attention Network for Image Denoising | Wencong Wu, An Ge, Guannan Lv, Yuelong Xia, Yungang Zhang, Wen Xiong | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Diffbody: Diffusion-based Pose and Shape Editing of Human Images | Yuta Okuyama, Yuki Endo, Yoshihiro Kanamori | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Fus-MAE: A cross-attention-based data fusion approach for Masked Autoencoders in remote sensing | Hugo Chan-To-Hing, Bharadwaj Veeravalli | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | MAMI: Multi-Attentional Mutual-Information for Long Sequence Neuron Captioning | Alfirsa Damasyifa Fauzulhaq, Wahyu Parwitayasa, Joseph Ananda Sugihdharma, M. Fadli Ridhani, Novanto Yudistira | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss | Yatharth Gupta, Vishnu V. Jaddipal, Harish Prabhala, Sayak Paul, Patrick Von Platen | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | GTA: Guided Transfer of Spatial Attention from Object-Centric Representations | SeokHyun Seo, Jinwoo Hong, JungWoo Chae, Kyungyul Kim, Sangheum Hwang | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | AG-ReID.v2: Bridging Aerial and Ground Views for Person Re-identification | Huy Nguyen, Kien Nguyen, Sridha Sridharan, Clinton Fookes | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | A Random Ensemble of Encrypted models for Enhancing Robustness against Adversarial Examples | Ryota Iijima, Sayaka Shiota, Hitoshi Kiya | arxiv.org/abs/2401.02… | null |
Generative Models
| Publish Date | Title | Authors | Paper | Code |
|---|---|---|---|---|
| 2024-01-05 | Uncovering the human motion pattern: Pattern Memory-based Diffusion Model for Trajectory Prediction | Yuxin Yang, Pengfei Zhu, Mengshi Qi, Huadong Ma | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF | Hao Zhang, Yu-Wing Tai, Chi-Keung Tang | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Scaling and Masking: A New Paradigm of Data Sampling for Image and Video Quality Assessment | Yongxu Liu, Yinghui Quan, Guoyao Xiao, Aobo Li, Jinjian Wu | arxiv.org/abs/2401.02… | null |
Multimodal
| Publish Date | Title | Authors | Paper | Code |
|---|---|---|---|---|
| 2024-01-05 | MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance | Renjie Pi, Tianyang Han, Yueqi Xie, Rui Pan, Qing Lian, Hanze Dong, Jipeng Zhang, Tong Zhang | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Object-Centric Instruction Augmentation for Robotic Manipulation | Junjie Wen, Yichen Zhu, Minjie Zhu, Jinming Li, Zhiyuan Xu, Zhengping Che, Chaomin Shen, Yaxin Peng, Dong Liu, Feifei Feng, et al. | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues | David Gimeno-Gómez, Ana-Maria Bucur, Adrian Cosma, Carlos-David Martínez-Hinarejos, Paolo Rosso | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Exploiting Polarized Material Cues for Robust Car Detection | Wen Dong, Haiyang Mei, Ziqi Wei, Ao Jin, Sen Qiu, Qiang Zhang, Xin Yang | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | Daoan Zhang, Junming Yang, Hanjia Lyu, Zijian Jin, Yuan Yao, Mingkai Chen, Jiebo Luo | arxiv.org/abs/2401.02… | null |
Zero/Few-Shot Learning
| Publish Date | Title | Authors | Paper | Code |
|---|---|---|---|---|
| 2024-01-05 | VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model | Pengying Wu, Yao Mu, Bingxian Wu, Yi Hou, Ji Ma, Shanghang Zhang, Chang Liu | arxiv.org/abs/2401.02… | null |
Semi-supervised/Unsupervised Learning
| Publish Date | Title | Authors | Paper | Code |
|---|---|---|---|---|
| 2024-01-05 | Weakly Semi-supervised Tool Detection in Minimally Invasive Surgery Videos | Ryo Fujii, Ryo Hachiuma, Hideo Saito | arxiv.org/abs/2401.02… | null |
3D
| Publish Date | Title | Authors | Paper | Code |
|---|---|---|---|---|
| 2024-01-05 | Locally Adaptive Neural 3D Morphable Models | Michail Tarasiou, Rolandos Alexandros Potamias, Eimear O'Sullivan, Stylianos Ploumpis, Stefanos Zafeiriou | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Enhancing 3D-Air Signature by Pen Tip Tail Trajectory Awareness: Dataset and Featuring by Novel Spatio-temporal CNN | Saurabh Atreya, Maheswar Bora, Aritra Mukherjee, Abhijit Das | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Recent Advancement in 3D Biometrics using Monocular Camera | Aritra Mukherjee, Abhijit Das | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Partition-based Nonrigid Registration for 3D Face Model | Yuping Ye, Zhan Song, Juan Zhao | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Characterizing Satellite Geometry via Accelerated 3D Gaussian Splatting | Van Minh Nguyen, Emma Sandidge, Trupti Mahendrakar, Ryan T. White | arxiv.org/abs/2401.02… | null |
Other
| Publish Date | Title | Authors | Paper | Code |
|---|---|---|---|---|
| 2024-01-05 | CRSOT: Cross-Resolution Object Tracking using Unaligned Frame and Event Cameras | Yabin Zhu, Xiao Wang, Chenglong Li, Bo Jiang, Lin Zhu, Zhixiang Huang, Yonghong Tian, Jin Tang | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Subjective and Objective Analysis of Indian Social Media Video Quality | Sandeep Mishra, Mukul Jha, Alan C. Bovik | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Enhancing targeted transferability via feature space fine-tuning | Hui Zeng, Biwei Chen, Anjie Peng | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Predicting Traffic Flow with Federated Learning and Graph Neural with Asynchronous Computations Network | Muhammad Yaqub, Shahzad Ahmad, Malik Abdul Manan, Imran Shabir Chuhan | arxiv.org/abs/2401.02… | null |
| 2024-01-05 | Learning Image Demoireing from Unpaired Real Data | Yunshan Zhong, Yuyao Zhou, Yuxin Zhang, Fei Chao, Rongrong Ji | arxiv.org/abs/2401.02… | null |