[Share][Daily Update][2024.01.03][CV_arxiv_papers]

!UPDATED -- 2024-01-03

Classification/Detection/Recognition/Segmentation

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-01-03 | FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding | Xingxing Zuo, Pouya Samangouei, Yunwen Zhou, Yan Di, Mingyang Li | 2401.01970v1 | null |
| 2024-01-03 | Distilling Temporal Knowledge with Masked Feature Reconstruction for 3D Object Detection | Haowen Zheng, Dong Cao, Jintao Xu, Rui Ai, Weihao Gu, Yang Yang, Yanyan Liang | 2401.01918v1 | null |
| 2024-01-03 | Detours for Navigating Instructional Videos | Kumar Ashutosh, Zihui Xue, Tushar Nagarajan, Kristen Grauman | 2401.01823v1 | null |
| 2024-01-03 | Towards Robust Semantic Segmentation against Patch-based Attack via Attention Refinement | Zheng Yuan, Jie Zhang, Yude Wang, Shiguang Shan, Xilin Chen | 2401.01750v1 | null |
| 2024-01-03 | Lightweight Adaptive Feature De-drifting for Compressed Image Classification | Long Peng, Yang Cao, Yuejin Sun, Yang Wang | 2401.01724v1 | null |
| 2024-01-03 | Local Adaptive Clustering Based Image Matching for Automatic Visual Identification | Zhizhen Wang | 2401.01720v1 | null |
| 2024-01-03 | Modality Exchange Network for Retinogeniculate Visual Pathway Segmentation | Hua Han, Cheng Li, Lei Xie, Yuanjing Feng, Alou Diakite, Shanshan Wang | 2401.01685v1 | null |
| 2024-01-03 | DiffYOLO: Object Detection for Anti-Noise via YOLO and Diffusion Models | Yichen Liu, Huajian Zhang, Daqing Gao | 2401.01659v1 | null |
| 2024-01-03 | S3Net: Innovating Stereo Matching and Semantic Segmentation with a Single-Branch Semantic Stereo Network in Satellite Epipolar Imagery | Qingyuan Yang, Guanzhou Chen, Xiaoliang Tan, Tong Wang, Jiaqi Wang, Xiaodong Zhang | 2401.01643v1 | null |
| 2024-01-04 | BLADE: Box-Level Supervised Amodal Segmentation through Directed Expansion | Zhaochen Liu, Zhixuan Li, Tingting Jiang | 2401.01642v2 | null |
| 2024-01-03 | Context-Aware Interaction Network for RGB-T Semantic Segmentation | Ying Lv, Zhi Liu, Gongyang Li | 2401.01624v1 | link |
| 2024-01-03 | MLIP: Medical Language-Image Pre-training with Masked Local Representation Learning | Jiarun Liu, Hong-Yu Zhou, Cheng Li, Weijian Huang, Hao Yang, Yong Liang, Shanshan Wang | 2401.01591v1 | null |
| 2024-01-03 | Enhancing Generalization of Invisible Facial Privacy Cloak via Gradient Accumulation | Xuannan Liu, Yaoyao Zhong, Weihong Deng, Hongzhi Shi, Xingchen Cui, Yunfeng Yin, Dongchao Wen | 2401.01575v1 | null |
| 2024-01-03 | DDN-SLAM: Real-time Dense Dynamic Neural Implicit SLAM with Joint Semantic Encoding | Mingrui Li, Jiaming He, Guangan Jiang, Hongyu Wang | 2401.01545v1 | null |
| 2024-01-03 | LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training | Rujiao Long, Hangdi Xing, Zhibo Yang, Qi Zheng, Zhi Yu, Cong Yao, Fei Huang | 2401.01522v1 | null |
| 2024-01-03 | From Pixel to Slide image: Polarization Modality-based Pathological Diagnosis Using Representation Learning | Jia Dong, Yao Yao, Yang Dong, Hui Ma | 2401.01496v1 | null |
| 2024-01-03 | Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object Recognition | Kyle Buettner, Sina Malakouti, Xiang Lorraine Li, Adriana Kovashka | 2401.01482v1 | null |

OCR

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-01-03 | WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope | Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Yusen Hu, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Bin Luo, et al. | 2401.01699v1 | null |

Transformer

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-01-03 | Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions | David Junhao Zhang, Dongxu Li, Hung Le, Mike Zheng Shou, Caiming Xiong, Doyen Sahoo | 2401.01827v1 | link |
| 2024-01-03 | VGA: Vision and Graph Fused Attention Network for Rumor Detection | Lin Bai, Caiyan Jia, Ziying Song, Chaoqun Cui | 2401.01759v1 | null |
| 2024-01-03 | FullLoRA-AT: Efficiently Boosting the Robustness of Pretrained Vision Transformers | Zheng Yuan, Jie Zhang, Shiguang Shan | 2401.01752v1 | null |
| 2024-01-03 | STAF: 3D Human Mesh Recovery from Video with Spatio-Temporal Alignment Fusion | Wei Yao, Hongwen Zhang, Yunlian Sun, Jinhui Tang | 2401.01730v1 | null |
| 2024-01-03 | Transformer RGBT Tracking with Spatio-Temporal Multimodal Tokens | Dengdi Sun, Yajie Pan, Andong Lu, Chenglong Li, Bin Luo | 2401.01674v1 | null |
| 2024-01-03 | Enhancing the medical foundation model with multi-scale and cross-modality feature learning | Weijian Huang, Cheng Li, Hong-Yu Zhou, Jiarun Liu, Hao Yang, Yong Liang, Shanshan Wang | 2401.01583v1 | null |
| 2024-01-03 | Context-Guided Spatio-Temporal Video Grounding | Xin Gu, Heng Fan, Yan Huang, Tiejian Luo, Libo Zhang | 2401.01578v1 | link |
| 2024-01-03 | A Transformer-Based Adaptive Semantic Aggregation Method for UAV Visual Geo-Localization | Shishen Li, Cuiwei Liu, Huaijun Qiu, Zhaokui Li | 2401.01574v1 | null |
| 2024-01-03 | AttentionLut: Attention Fusion-based Canonical Polyadic LUT for Real-time Image Enhancement | Kang Fu, Yicong Peng, Zicheng Zhang, Qihang Xu, Xiaohong Liu, Jia Wang, Guangtao Zhai | 2401.01569v1 | null |
| 2024-01-03 | Multi-modal Learning with Missing Modality in Predicting Axillary Lymph Node Metastasis | Shichuan Zhang, Sunyi Zheng, Zhongyi Shui, Honglin Li, Lin Yang | 2401.01553v1 | null |
| 2024-01-03 | CRA-PCN: Point Cloud Completion with Intra- and Inter-level Cross-Resolution Transformers | Yi Rong, Haoran Zhou, Lixin Yuan, Cheng Mei, Jiahao Wang, Tong Lu | 2401.01552v1 | link |
| 2024-01-03 | Collaborative Perception for Connected and Autonomous Driving: Challenges, Possible Solutions and Opportunities | Senkang Hu, Zhengru Fang, Yiqin Deng, Xianhao Chen, Yuguang Fang | 2401.01544v1 | null |
| 2024-01-03 | Glance and Focus: Memory Prompting for Multi-Event Video Question Answering | Ziyi Bai, Ruiping Wang, Xilin Chen | 2401.01529v1 | link |
| 2024-01-03 | Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports | Haopeng Li, Andong Deng, Qiuhong Ke, Jun Liu, Hossein Rahmani, Yulan Guo, Bernt Schiele, Chen Chen | 2401.01505v1 | null |
| 2024-01-03 | Token Propagation Controller for Efficient Vision Transformer | Wentao Zhu | 2401.01470v1 | null |

Model Compression/Optimization

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-01-03 | From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations | Evonne Ng, Javier Romero, Timur Bagautdinov, Shaojie Bai, Trevor Darrell, Angjoo Kanazawa, Alexander Richard | 2401.01885v1 | null |
| 2024-01-03 | Retraining-free Model Quantization via One-Shot Weight-Coupling Learning | Chen Tang, Yuan Meng, Jiacheng Jiang, Shuzhao Xie, Rongwei Lu, Xinzhu Ma, Zhi Wang, Wenwu Zhu | 2401.01543v1 | null |

Generative Models

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-01-03 | Instruct-Imagen: Image Generation with Multi-modal Instruction | Hexiang Hu, Kelvin C. K. Chan, Yu-Chuan Su, Wenhu Chen, Yandong Li, Kihyuk Sohn, Yang Zhao, Xue Ben, Boqing Gong, William Cohen, et al. | 2401.01952v1 | null |
| 2024-01-03 | Can We Generate Realistic Hands Only Using Convolution? | Mehran Hosseini, Peyman Hosseini | 2401.01951v1 | null |
| 2024-01-04 | Few-shot Adaptation of Multi-modal Foundation Models: A Survey | Fan Liu, Tianshu Zhang, Wenwen Dai, Wenwen Cai, Xiaocong Zhou, Delong Chen | 2401.01736v2 | null |
| 2024-01-03 | SIGNeRF: Scene Integrated Generation for Neural Radiance Fields | Jan-Niklas Dihlmann, Andreas Engelhardt, Hendrik Lensch | 2401.01647v1 | null |
| 2024-01-03 | Learning Prompt with Distribution-Based Feature Replay for Few-Shot Class-Incremental Learning | Zitong Huang, Ze Chen, Zhixing Chen, Erjin Zhou, Xinxing Xu, Rick Siow Mong Goh, Yong Liu, Chunmei Feng, Wangmeng Zuo | 2401.01598v1 | link |
| 2024-01-03 | S-DMs: Skip-Step Diffusion Models | Yixuan Wang, Shuangyin Li | 2401.01520v1 | link |

Multimodal

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-01-04 | HawkRover: An Autonomous mmWave Vehicular Communication Testbed with Multi-sensor Fusion and Deep Learning | Ethan Zhu, Haijian Sun, Mingyue Ji | 2401.01822v2 | null |
| 2024-01-03 | Prototypical Information Bottlenecking and Disentangling for Multimodal Cancer Survival Prediction | Yilan Zhang, Yingxue Xu, Jianqi Chen, Fengying Xie, Hao Chen | 2401.01646v1 | link |
| 2024-01-03 | GPT-4V(ision) is a Generalist Web Agent, if Grounded | Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su | 2401.01614v1 | link |
| 2024-01-03 | Multimodal self-supervised learning for lesion localization | Hao Yang, Hong-Yu Zhou, Cheng Li, Weijian Huang, Jiarun Liu, Yong Liang, Shanshan Wang | 2401.01524v1 | null |

Zero/Few-Shot Learning

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-01-03 | Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers | Aleksandar Stanić, Sergi Caelles, Michael Tschannen | 2401.01974v1 | null |
| 2024-01-03 | Few-shot Image Generation via Information Transfer from the Built Geodesic Surface | Yuexing Han, Liheng Ruan, Bing Wang | 2401.01749v1 | null |

Semi-supervised/Unsupervised Learning

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-01-03 | Unsupervised Object-Centric Learning from Multiple Unspecified Viewpoints | Jinyang Yuan, Tonglin Chen, Zhimeng Shen, Bin Li, Xiangyang Xue | 2401.01922v1 | null |
| 2024-01-03 | Test-Time Personalization with Meta Prompt for Gaze Estimation | Huan Liu, Julia Qi, Zhenhao Li, Mohammad Hassanpour, Yang Wang, Konstantinos Plataniotis, Yuanhao Yu | 2401.01577v1 | null |
| 2024-01-03 | Boosting of Implicit Neural Representation-based Image Denoiser | Zipei Yan, Zhengji Liu, Jizhou Li | 2401.01548v1 | link |

Other

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-01-03 | GPS-SSL: Guided Positive Sampling to Inject Prior Into Self-Supervised Learning | Aarash Feizi, Randall Balestriero, Adriana Romero-Soriano, Reihaneh Rabbany | 2401.01990v1 | link |
| 2024-01-03 | AUPIMO: Redefining Visual Anomaly Detection Benchmarks with High Speed and Low Tolerance | Joao P. C. Bertoldo, Dick Ameln, Ashwin Vaidya, Samet Akçay | 2401.01984v1 | link |
| 2024-01-03 | LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry | Weirong Chen, Le Chen, Rui Wang, Marc Pollefeys | 2401.01887v1 | null |
| 2024-01-03 | Step length measurement in the wild using FMCW radar | Parthipan Siva, Alexander Wong, Patricia Hewston, George Ioannidis, Dr. Jonathan Adachi, Dr. Alexander Rabinovich, Andrea Lee, Alexandra Papaioannou | 2401.01868v1 | null |
| 2024-01-03 | A Vision Check-up for Language Models | Pratyusha Sharma, Tamar Rott Shaham, Manel Baradad, Stephanie Fu, Adrian Rodriguez-Munoz, Shivam Duggal, Phillip Isola, Antonio Torralba | 2401.01862v1 | null |
| 2024-01-03 | Synthetic dataset of ID and Travel Document | Carlos Boned, Maxime Talarmain, Nabil Ghanmi, Guillaume Chiron, Sanket Biswas, Ahmad Montaser Awal, Oriol Ramos Terrades | 2401.01858v1 | link |
| 2024-01-04 | Frequency Domain Modality-invariant Feature Learning for Visible-infrared Person Re-Identification | Yulin Li, Tianzhu Zhang, Yongdong Zhang | 2401.01839v2 | null |
| 2024-01-03 | aMUSEd: An Open MUSE Reproduction | Suraj Patil, William Berman, Robin Rombach, Patrick von Platen | 2401.01808v1 | null |
| 2024-01-03 | Learning Keypoints for Robotic Cloth Manipulation using Synthetic Data | Thomas Lips, Victor-Louis De Gusseme, Francis wyffels | 2401.01734v1 | null |
| 2024-01-03 | Fact-checking based fake news detection: a review | Yuzhou Yang, Yangming Zhou, Qichao Ying, Zhenxing Qian, Dan Zeng, Liang Liu | 2401.01717v1 | null |
| 2024-01-03 | AID-DTI: Accelerating High-fidelity Diffusion Tensor Imaging with Detail-Preserving Model-based Deep Learning | Wenxin Fan, Jian Cheng, Cheng Li, Xinrui Ma, Jing Yang, Juan Zou, Ruoyou Wu, Qiegen Liu, Shanshan Wang | 2401.01693v1 | null |
| 2024-01-03 | ODTrack: Online Dense Temporal Token Learning for Visual Tracking | Yaozong Zheng, Bineng Zhong, Qihua Liang, Zhiyi Mo, Shengping Zhang, Xianxian Li | 2401.01686v1 | link |
| 2024-01-03 | Performance Evaluation of GPS Trajectory Rasterization Methods | Necip Enes Gengec, Ergin Tari | 2401.01676v1 | null |
| 2024-01-03 | Simultaneous q-Space Sampling Optimization and Reconstruction for Fast and High-fidelity Diffusion Magnetic Resonance Imaging | Jing Yang, Jian Cheng, Cheng Li, Wenxin Fan, Juan Zou, Ruoyou Wu, Shanshan Wang | 2401.01662v1 | null |
| 2024-01-03 | AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI | Fanda Fan, Chunjie Luo, Jianfeng Zhan, Wanling Gao | 2401.01651v1 | null |
| 2024-01-03 | De-Confusing Pseudo-Labels in Source-Free Domain Adaptation | Idit Diamant, Idan Achituve, Arnon Netzer | 2401.01650v1 | null |
| 2024-01-03 | Real-Time Human Fall Detection using a Lightweight Pose Estimation Technique | Ekram Alam, Abu Sufian, Paramartha Dutta, Marco Leo | 2401.01587v1 | null |
| 2024-01-03 | View Distribution Alignment with Progressive Adversarial Learning for UAV Visual Geo-Localization | Cuiwei Liu, Jiahao Liu, Huaijun Qiu, Zhaokui Li, Xiangbin Shi | 2401.01573v1 | null |
| 2024-01-03 | One-Step Late Fusion Multi-view Clustering with Compressed Subspace | Qiyuan Ou, Pei Zhang, Sihang Zhou, En Zhu | 2401.01558v1 | null |
| 2024-01-03 | DDPM based X-ray Image Synthesizer | Praveen Mahaulpatha, Thulana Abeywardane, Tomson George | 2401.01539v1 | null |
| 2024-01-03 | Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering | Haopeng Li, Qiuhong Ke, Mingming Gong, Tom Drummond | 2401.01510v1 | null |