[Share][Daily Update][2024.01.04][CV_arxiv_papers]


!UPDATED -- 2024-01-04

Classification / Detection / Recognition / Segmentation

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-01-04 | ODIN: A Single Model for 2D and 3D Perception | Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki | 2401.02416v1 | null |
| 2024-01-04 | What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs | Alex Trevithick, Matthew Chan, Towaki Takikawa, Umar Iqbal, Shalini De Mello, Manmohan Chandraker, Ravi Ramamoorthi, Koki Nagano | 2401.02411v1 | null |
| 2024-01-04 | 3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation | Zihao Xiao, Longlong Jing, Shangxuan Wu, Alex Zihao Zhu, Jingwei Ji, Chiyu Max Jiang, Wei-Chih Hung, Thomas Funkhouser, Weicheng Kuo, Anelia Angelova, et al. | 2401.02402v1 | null |
| 2024-01-04 | ClassWise-SAM-Adapter: Parameter Efficient Fine-tuning Adapts Segment Anything to SAR Domain for Semantic Segmentation | Xinyang Pu, Hecheng Jia, Linghao Zheng, Feng Wang, Feng Xu | 2401.02326v1 | link |
| 2024-01-04 | BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model | Yiran Song, Qianyu Zhou, Xiangtai Li, Deng-Ping Fan, Xuequan Lu, Lizhuang Ma | 2401.02317v1 | link |
| 2024-01-04 | ShapeAug: Occlusion Augmentation for Event Camera Data | Katharina Bendig, René Schuster, Didier Stricker | 2401.02274v1 | null |
| 2024-01-04 | Slot-guided Volumetric Object Radiance Fields | Di Qi, Tong Yang, Xiangyu Zhang | 2401.02241v1 | null |
| 2024-01-04 | Frequency Domain Nuances Mining for Visible-Infrared Person Re-identification | Yukang Zhang, Yang Lu, Yan Yan, Hanzi Wang, Xuelong Li | 2401.02162v1 | null |
| 2024-01-04 | Marginal Debiased Network for Fair Visual Recognition | Mei Wang, Weihong Deng, Sen Su | 2401.02150v1 | null |
| 2024-01-04 | Explore Human Parsing Modality for Action Recognition | Jinfu Liu, Runwei Ding, Yuhang Wen, Nan Dai, Fanyang Meng, Shen Zhao, Mengyuan Liu | 2401.02138v1 | link |
| 2024-01-04 | SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment | Ziping Ma, Furong Xu, Jian Liu, Ming Yang, Qingpei Guo | 2401.02137v1 | null |
| 2024-01-04 | Source-Free Online Domain Adaptive Semantic Segmentation of Satellite Images under Image Degradation | Fahim Faisal Niloy, Kishor Kumar Bhaumik, Simon S. Woo | 2401.02113v1 | null |
| 2024-01-04 | CLAPP: Contrastive Language-Audio Pre-training in Passive Underwater Vessel Classification | Zeyu Li, Jingsheng Gao, Tong Yu, Suncheng Xiang, Jiacheng Ruan, Ting Liu, Yuzhuo Fu | 2401.02099v1 | null |
| 2024-01-04 | Leveraging SAM for Single-Source Domain Generalization in Medical Image Segmentation | Hanhui Wang, Huaize Ye, Yi Xia, Xueyan Zhang | 2401.02076v1 | link |
| 2024-01-04 | Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket | Zhaokun Zhou, Kaiwei Che, Wei Fang, Keyu Tian, Yuesheng Zhu, Shuicheng Yan, Yonghong Tian, Li Yuan | 2401.02020v1 | link |

Transformer

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-01-04 | LLM Augmented LLMs: Expanding Capabilities through Composition | Rachit Bansal, Bidisha Samanta, Siddharth Dalmia, Nitish Gupta, Shikhar Vashishth, Sriram Ganapathy, Abhishek Bapna, Prateek Jain, Partha Talukdar | 2401.02412v1 | null |
| 2024-01-04 | A novel method to enhance pneumonia detection via a model-level ensembling of CNN and vision transformer | Sandeep Angara, Nishith Reddy Mannuru, Aashrith Mannuru, Sharath Thirunagaru | 2401.02358v1 | null |
| 2024-01-04 | TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection | Hao Sun, Mingyao Zhou, Wenjing Chen, Wei Xie | 2401.02309v1 | null |
| 2024-01-04 | GridFormer: Point-Grid Transformer for Surface Reconstruction | Shengtao Li, Ge Gao, Yudong Liu, Yu-Shen Liu, Ming Gu | 2401.02292v1 | link |
| 2024-01-04 | Prompt Decoupling for Text-to-Image Person Re-identification | Weihao Li, Lei Tan, Pingyang Dai, Yan Zhang | 2401.02173v1 | null |
| 2024-01-04 | Exploring Boundary of GPT-4V on Marine Analysis: A Preliminary Case Study | Ziqiang Zheng, Yiwei Chen, Jipeng Zhang, Tuan-Anh Vu, Huimin Zeng, Yue Him Wong Tim, Sai-Kit Yeung | 2401.02147v1 | null |
| 2024-01-04 | Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance | Jiacheng Wang, Ping Liu, Wei Xu | 2401.02126v1 | link |
| 2024-01-04 | Federated Class-Incremental Learning with Prototype Guided Transformer | Haiyang Guo, Fei Zhu, Wenzhuo Liu, Xu-Yao Zhang, Cheng-Lin Liu | 2401.02094v1 | null |

Model Compression / Optimization

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-01-04 | Distillation-based fabric anomaly detection | Simon Thomine, Hichem Snoussi | 2401.02287v1 | link |

Generative Models

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-01-04 | Bring Metric Functions into Diffusion Models | Jie An, Zhengyuan Yang, Jianfeng Wang, Linjie Li, Zicheng Liu, Lijuan Wang, Jiebo Luo | 2401.02414v1 | null |
| 2024-01-04 | Nodule detection and generation on chest X-rays: NODE21 Challenge | Ecem Sogancioglu, Bram van Ginneken, Finn Behrendt, Marcel Bengs, Alexander Schlaefer, Miron Radu, Di Xu, Ke Sheng, Fabien Scalzo, Eric Marcus, et al. | 2401.02192v1 | null |
| 2024-01-04 | GUESS:GradUally Enriching SyntheSis for Text-Driven Human Motion Generation | Xuehao Gao, Yang Yang, Zhenyu Xie, Shaoyi Du, Zhongqian Sun, Yang Wu | 2401.02142v1 | null |
| 2024-01-04 | Preserving Image Properties Through Initializations in Diffusion Models | Jeffrey Zhang, Shao-Yu Chang, Kedan Li, David Forsyth | 2401.02097v1 | null |
| 2024-01-04 | DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection | Yunfan Ye, Kai Xu, Yuhang Huang, Renjiao Yi, Zhiping Cai | 2401.02032v1 | link |
| 2024-01-04 | Improving Diffusion-Based Image Synthesis with Context Prediction | Ling Yang, Jingwei Liu, Shenda Hong, Zhilong Zhang, Zhilin Huang, Zheming Cai, Wentao Zhang, Bin Cui | 2401.02015v1 | null |

Multimodal

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-01-04 | ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning | Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo | 2401.02384v1 | null |
| 2024-01-04 | LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model | Yichen Zhu, Minjie Zhu, Ning Liu, Zhicai Ou, Xiaofeng Mou, Jian Tang | 2401.02330v1 | null |
| 2024-01-04 | Bayesian Intrinsic Groupwise Image Registration: Unsupervised Disentanglement of Anatomy and Geometry | Xinzhe Luo, Xin Wang, Linda Shapiro, Chun Yuan, Jianfeng Feng, Xiahai Zhuang | 2401.02141v1 | null |

Zero/Few-Shot Learning

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-01-04 | Learning to Prompt with Text Only Supervision for Vision-Language Models | Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer, Luc Van Gool, Federico Tombari | 2401.02418v1 | link |
| 2024-01-04 | Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training | Longtian Qiu, Shan Ning, Xuming He | 2401.02347v1 | link |

3D Related

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-01-04 | Learning the 3D Fauna of the Web | Zizhang Li, Dor Litvak, Ruining Li, Yunzhi Zhang, Tomas Jakab, Christian Rupprecht, Shangzhe Wu, Andrea Vedaldi, Jiajun Wu | 2401.02400v1 | null |
| 2024-01-04 | Survey of 3D Human Body Pose and Shape Estimation Methods for Contemporary Dance Applications | Darshan Venkatrayappa, Alain Tremeau, Damien Muselet, Philippe Colantoni | 2401.02383v1 | null |
| 2024-01-04 | Fit-NGP: Fitting Object Models to Neural Graphics Primitives | Marwan Taher, Ignacio Alzugaray, Andrew J. Davison | 2401.02357v1 | null |
| 2024-01-04 | PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation | Lukas Meyer, Floris Erich, Yusuke Yoshiyasu, Marc Stamminger, Noriaki Ando, Yukiyasu Domae | 2401.02281v1 | null |

Other

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-01-04 | An Open and Comprehensive Pipeline for Unified Object Grounding and Detection | Xiangyu Zhao, Yicheng Chen, Shilin Xu, Xiangtai Li, Xinjiang Wang, Yining Li, Haian Huang | 2401.02361v1 | link |
| 2024-01-04 | Linguistic Profiling of Deepfakes: An Open Database for Next-Generation Deepfake Detection | Yabin Wang, Zhiwu Huang, Zhiheng Ma, Xiaopeng Hong | 2401.02335v1 | null |
| 2024-01-04 | SuperEdge: Towards a Generalization Model for Self-Supervised Edge Detection | Leng Kai, Zhang Zhijie, Liu Jie, Zed Boukhers, Sui Wei, Cong Yang, Li Zhijun | 2401.02313v1 | null |
| 2024-01-04 | Lightweight Fish Classification Model for Sustainable Marine Management: Indonesian Case | Febrian Kurniawan, Gandeva Bayu Satrya, Firuz Kamalov | 2401.02278v1 | null |
| 2024-01-04 | Enhancing RAW-to-sRGB with Decoupled Style Structure in Fourier Domain | Xuanhua He, Tao Hu, Guoli Wang, Zejin Wang, Run Wang, Qian Zhang, Keyu Yan, Ziyi Chen, Rui Li, Chenjun Xie, et al. | 2401.02161v1 | null |
| 2024-01-04 | Frequency-Adaptive Pan-Sharpening with Mixture of Experts | Xuanhua He, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou | 2401.02151v1 | null |
| 2024-01-04 | Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation | Zipeng Fu, Tony Z. Zhao, Chelsea Finn | 2401.02117v1 | null |
| 2024-01-04 | Significance of Anatomical Constraints in Virtual Try-On | Debapriya Roy, Sanchayan Santra, Diganta Mukherjee, Bhabatosh Chanda | 2401.02110v1 | null |
| 2024-01-04 | Generalizable vision-language pre-training for annotation-free pathology localization | Hao Yang, Hong-Yu Zhou, Cheng Li, Weijian Huang, Jiarun Liu, Shanshan Wang | 2401.02044v1 | null |
| 2024-01-04 | Efficient Cloud-edge Collaborative Inference for Object Re-identification | Chuanming Wang, Yuxin Yang, Mengshi Qi, Huadong Ma | 2401.02041v1 | null |
| 2024-01-04 | Spy-Watermark: Robust Invisible Watermarking for Backdoor Attack | Ruofei Wang, Renjie Wan, Zongyu Guo, Qing Guo, Rui Huang | 2401.02031v1 | null |