UPDATED -- 2024-01-04
Classification/Detection/Recognition/Segmentation
| Publish Date | Title | Authors | arXiv | Code |
|---|---|---|---|---|
| 2024-01-04 | ODIN: A Single Model for 2D and 3D Perception | Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki | 2401.02416v1 | null |
| 2024-01-04 | What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs | Alex Trevithick, Matthew Chan, Towaki Takikawa, Umar Iqbal, Shalini De Mello, Manmohan Chandraker, Ravi Ramamoorthi, Koki Nagano | 2401.02411v1 | null |
| 2024-01-04 | 3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation | Zihao Xiao, Longlong Jing, Shangxuan Wu, Alex Zihao Zhu, Jingwei Ji, Chiyu Max Jiang, Wei-Chih Hung, Thomas Funkhouser, Weicheng Kuo, Anelia Angelova, et al. | 2401.02402v1 | null |
| 2024-01-04 | ClassWise-SAM-Adapter: Parameter Efficient Fine-tuning Adapts Segment Anything to SAR Domain for Semantic Segmentation | Xinyang Pu, Hecheng Jia, Linghao Zheng, Feng Wang, Feng Xu | 2401.02326v1 | link |
| 2024-01-04 | BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model | Yiran Song, Qianyu Zhou, Xiangtai Li, Deng-Ping Fan, Xuequan Lu, Lizhuang Ma | 2401.02317v1 | link |
| 2024-01-04 | ShapeAug: Occlusion Augmentation for Event Camera Data | Katharina Bendig, René Schuster, Didier Stricker | 2401.02274v1 | null |
| 2024-01-04 | Slot-guided Volumetric Object Radiance Fields | Di Qi, Tong Yang, Xiangyu Zhang | 2401.02241v1 | null |
| 2024-01-04 | Frequency Domain Nuances Mining for Visible-Infrared Person Re-identification | Yukang Zhang, Yang Lu, Yan Yan, Hanzi Wang, Xuelong Li | 2401.02162v1 | null |
| 2024-01-04 | Marginal Debiased Network for Fair Visual Recognition | Mei Wang, Weihong Deng, Sen Su | 2401.02150v1 | null |
| 2024-01-04 | Explore Human Parsing Modality for Action Recognition | Jinfu Liu, Runwei Ding, Yuhang Wen, Nan Dai, Fanyang Meng, Shen Zhao, Mengyuan Liu | 2401.02138v1 | link |
| 2024-01-04 | SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment | Ziping Ma, Furong Xu, Jian Liu, Ming Yang, Qingpei Guo | 2401.02137v1 | null |
| 2024-01-04 | Source-Free Online Domain Adaptive Semantic Segmentation of Satellite Images under Image Degradation | Fahim Faisal Niloy, Kishor Kumar Bhaumik, Simon S. Woo | 2401.02113v1 | null |
| 2024-01-04 | CLAPP: Contrastive Language-Audio Pre-training in Passive Underwater Vessel Classification | Zeyu Li, Jingsheng Gao, Tong Yu, Suncheng Xiang, Jiacheng Ruan, Ting Liu, Yuzhuo Fu | 2401.02099v1 | null |
| 2024-01-04 | Leveraging SAM for Single-Source Domain Generalization in Medical Image Segmentation | Hanhui Wang, Huaize Ye, Yi Xia, Xueyan Zhang | 2401.02076v1 | link |
| 2024-01-04 | Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket | Zhaokun Zhou, Kaiwei Che, Wei Fang, Keyu Tian, Yuesheng Zhu, Shuicheng Yan, Yonghong Tian, Li Yuan | 2401.02020v1 | link |
Transformer
| Publish Date | Title | Authors | arXiv | Code |
|---|---|---|---|---|
| 2024-01-04 | LLM Augmented LLMs: Expanding Capabilities through Composition | Rachit Bansal, Bidisha Samanta, Siddharth Dalmia, Nitish Gupta, Shikhar Vashishth, Sriram Ganapathy, Abhishek Bapna, Prateek Jain, Partha Talukdar | 2401.02412v1 | null |
| 2024-01-04 | A novel method to enhance pneumonia detection via a model-level ensembling of CNN and vision transformer | Sandeep Angara, Nishith Reddy Mannuru, Aashrith Mannuru, Sharath Thirunagaru | 2401.02358v1 | null |
| 2024-01-04 | TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection | Hao Sun, Mingyao Zhou, Wenjing Chen, Wei Xie | 2401.02309v1 | null |
| 2024-01-04 | GridFormer: Point-Grid Transformer for Surface Reconstruction | Shengtao Li, Ge Gao, Yudong Liu, Yu-Shen Liu, Ming Gu | 2401.02292v1 | link |
| 2024-01-04 | Prompt Decoupling for Text-to-Image Person Re-identification | Weihao Li, Lei Tan, Pingyang Dai, Yan Zhang | 2401.02173v1 | null |
| 2024-01-04 | Exploring Boundary of GPT-4V on Marine Analysis: A Preliminary Case Study | Ziqiang Zheng, Yiwei Chen, Jipeng Zhang, Tuan-Anh Vu, Huimin Zeng, Yue Him Wong Tim, Sai-Kit Yeung | 2401.02147v1 | null |
| 2024-01-04 | Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance | Jiacheng Wang, Ping Liu, Wei Xu | 2401.02126v1 | link |
| 2024-01-04 | Federated Class-Incremental Learning with Prototype Guided Transformer | Haiyang Guo, Fei Zhu, Wenzhuo Liu, Xu-Yao Zhang, Cheng-Lin Liu | 2401.02094v1 | null |
Model Compression/Optimization
| Publish Date | Title | Authors | arXiv | Code |
|---|---|---|---|---|
| 2024-01-04 | Distillation-based fabric anomaly detection | Simon Thomine, Hichem Snoussi | 2401.02287v1 | link |
Generative Models
| Publish Date | Title | Authors | arXiv | Code |
|---|---|---|---|---|
| 2024-01-04 | Bring Metric Functions into Diffusion Models | Jie An, Zhengyuan Yang, Jianfeng Wang, Linjie Li, Zicheng Liu, Lijuan Wang, Jiebo Luo | 2401.02414v1 | null |
| 2024-01-04 | Nodule detection and generation on chest X-rays: NODE21 Challenge | Ecem Sogancioglu, Bram van Ginneken, Finn Behrendt, Marcel Bengs, Alexander Schlaefer, Miron Radu, Di Xu, Ke Sheng, Fabien Scalzo, Eric Marcus, et al. | 2401.02192v1 | null |
| 2024-01-04 | GUESS:GradUally Enriching SyntheSis for Text-Driven Human Motion Generation | Xuehao Gao, Yang Yang, Zhenyu Xie, Shaoyi Du, Zhongqian Sun, Yang Wu | 2401.02142v1 | null |
| 2024-01-04 | Preserving Image Properties Through Initializations in Diffusion Models | Jeffrey Zhang, Shao-Yu Chang, Kedan Li, David Forsyth | 2401.02097v1 | null |
| 2024-01-04 | DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection | Yunfan Ye, Kai Xu, Yuhang Huang, Renjiao Yi, Zhiping Cai | 2401.02032v1 | link |
| 2024-01-04 | Improving Diffusion-Based Image Synthesis with Context Prediction | Ling Yang, Jingwei Liu, Shenda Hong, Zhilong Zhang, Zhilin Huang, Zheming Cai, Wentao Zhang, Bin Cui | 2401.02015v1 | null |
Multimodal
| Publish Date | Title | Authors | arXiv | Code |
|---|---|---|---|---|
| 2024-01-04 | ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning | Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo | 2401.02384v1 | null |
| 2024-01-04 | LLaVA- | Yichen Zhu, Minjie Zhu, Ning Liu, Zhicai Ou, Xiaofeng Mou, Jian Tang | 2401.02330v1 | null |
| 2024-01-04 | Bayesian Intrinsic Groupwise Image Registration: Unsupervised Disentanglement of Anatomy and Geometry | Xinzhe Luo, Xin Wang, Linda Shapiro, Chun Yuan, Jianfeng Feng, Xiahai Zhuang | 2401.02141v1 | null |
Zero/Few-Shot Learning
| Publish Date | Title | Authors | arXiv | Code |
|---|---|---|---|---|
| 2024-01-04 | Learning to Prompt with Text Only Supervision for Vision-Language Models | Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer, Luc Van Gool, Federico Tombari | 2401.02418v1 | link |
| 2024-01-04 | Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training | Longtian Qiu, Shan Ning, Xuming He | 2401.02347v1 | link |
3D-Related
| Publish Date | Title | Authors | arXiv | Code |
|---|---|---|---|---|
| 2024-01-04 | Learning the 3D Fauna of the Web | Zizhang Li, Dor Litvak, Ruining Li, Yunzhi Zhang, Tomas Jakab, Christian Rupprecht, Shangzhe Wu, Andrea Vedaldi, Jiajun Wu | 2401.02400v1 | null |
| 2024-01-04 | Survey of 3D Human Body Pose and Shape Estimation Methods for Contemporary Dance Applications | Darshan Venkatrayappa, Alain Tremeau, Damien Muselet, Philippe Colantoni | 2401.02383v1 | null |
| 2024-01-04 | Fit-NGP: Fitting Object Models to Neural Graphics Primitives | Marwan Taher, Ignacio Alzugaray, Andrew J. Davison | 2401.02357v1 | null |
| 2024-01-04 | PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation | Lukas Meyer, Floris Erich, Yusuke Yoshiyasu, Marc Stamminger, Noriaki Ando, Yukiyasu Domae | 2401.02281v1 | null |
Others
| Publish Date | Title | Authors | arXiv | Code |
|---|---|---|---|---|
| 2024-01-04 | An Open and Comprehensive Pipeline for Unified Object Grounding and Detection | Xiangyu Zhao, Yicheng Chen, Shilin Xu, Xiangtai Li, Xinjiang Wang, Yining Li, Haian Huang | 2401.02361v1 | link |
| 2024-01-04 | Linguistic Profiling of Deepfakes: An Open Database for Next-Generation Deepfake Detection | Yabin Wang, Zhiwu Huang, Zhiheng Ma, Xiaopeng Hong | 2401.02335v1 | null |
| 2024-01-04 | SuperEdge: Towards a Generalization Model for Self-Supervised Edge Detection | Leng Kai, Zhang Zhijie, Liu Jie, Zed Boukhers, Sui Wei, Cong Yang, Li Zhijun | 2401.02313v1 | null |
| 2024-01-04 | Lightweight Fish Classification Model for Sustainable Marine Management: Indonesian Case | Febrian Kurniawan, Gandeva Bayu Satrya, Firuz Kamalov | 2401.02278v1 | null |
| 2024-01-04 | Enhancing RAW-to-sRGB with Decoupled Style Structure in Fourier Domain | Xuanhua He, Tao Hu, Guoli Wang, Zejin Wang, Run Wang, Qian Zhang, Keyu Yan, Ziyi Chen, Rui Li, Chenjun Xie, et al. | 2401.02161v1 | null |
| 2024-01-04 | Frequency-Adaptive Pan-Sharpening with Mixture of Experts | Xuanhua He, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou | 2401.02151v1 | null |
| 2024-01-04 | Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation | Zipeng Fu, Tony Z. Zhao, Chelsea Finn | 2401.02117v1 | null |
| 2024-01-04 | Significance of Anatomical Constraints in Virtual Try-On | Debapriya Roy, Sanchayan Santra, Diganta Mukherjee, Bhabatosh Chanda | 2401.02110v1 | null |
| 2024-01-04 | Generalizable vision-language pre-training for annotation-free pathology localization | Hao Yang, Hong-Yu Zhou, Cheng Li, Weijian Huang, Jiarun Liu, Shanshan Wang | 2401.02044v1 | null |
| 2024-01-04 | Efficient Cloud-edge Collaborative Inference for Object Re-identification | Chuanming Wang, Yuxin Yang, Mengshi Qi, Huadong Ma | 2401.02041v1 | null |
| 2024-01-04 | Spy-Watermark: Robust Invisible Watermarking for Backdoor Attack | Ruofei Wang, Renjie Wan, Zongyu Guo, Qing Guo, Rui Huang | 2401.02031v1 | null |