[Share][Daily Update][2024.03.14][CV_arxiv_papers]

[UPDATED!] 2024-03-14 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-14 | SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior | SCP-Diff:具有空间分类联合先验的逼真语义图像合成 | Huan-ang Gao, Mingju Gao, Jiaju Li, Wenyi Li, Rong Zhi, Hao Tang, Hao Zhao | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image | 全息重新照明:从单个图像进行可控体积肖像重新照明 | Yiqun Mei, Yu Zeng, He Zhang, Zhixin Shu, Xuaner Zhang, Sai Bi, Jianming Zhang, HyunJoon Jung, Vishal M. Patel | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | 3D-VLA: A 3D Vision-Language-Action Generative World Model | 3D-VLA:3D 视觉-语言-动作生成世界模型 | Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Generalized Predictive Model for Autonomous Driving | 自动驾驶广义预测模型 | Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, et.al. | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation | Make-Your-3D:快速、一致的主题驱动 3D 内容生成 | Fangfu Liu, Hanyang Wang, Weiliang Chen, Haowen Sun, Yueqi Duan | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Score-Guided Diffusion for 3D Human Recovery | 用于 3D 人体恢复的评分引导扩散 | Anastasis Stathopoulos, Ligong Han, Dimitris Metaxas | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Explore In-Context Segmentation via Latent Diffusion Models | 通过潜在扩散模型探索上下文分割 | Chaoyang Wang, Xiangtai Li, Henghui Ding, Lu Qi, Jiangning Zhang, Yunhai Tong, Chen Change Loy, Shuicheng Yan | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models | MambaTalk:使用选择性状态空间模型进行高效的整体手势合成 | Zunnan Xu, Yukang Lin, Haonan Han, Sicheng Yang, Ronghui Li, Yachao Zhang, Xiu Li | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing | Eta 反演:为基于扩散的真实图像编辑设计最佳 Eta 函数 | Wonjun Kang, Kevin Galim, Hyung Il Koo | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation | 3D-SceneDreamer:文本驱动的 3D 一致场景生成 | Frank Zhang, Yibo Zhang, Quan Zheng, Rui Ma, Wei Hua, Hujun Bao, Weiwei Xu, Changqing Zou | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Mitigating attribute amplification in counterfactual image generation | 减轻反事实图像生成中的属性放大 | Tian Xia, Mélanie Roschewitz, Fabio De Sousa Ribeiro, Charles Jones, Ben Glocker | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Mitigating Data Consistency Induced Discrepancy in Cascaded Diffusion Models for Sparse-view CT Reconstruction | 减轻稀疏视图 CT 重建级联扩散模型中数据一致性引起的差异 | Hanyu Chen, Zhixiu Hao, Lin Guo, Liying Xiao | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | StainFuser: Controlling Diffusion for Faster Neural Style Transfer in Multi-Gigapixel Histology Images | StainFuser:控制扩散以在数千兆像素组织学图像中实现更快的神经风格转移 | Robert Jewsbury, Ruoyu Wang, Abhir Bhalerao, Nasir Rajpoot, Quoc Dang Vu | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | XReal: Realistic Anatomy and Pathology-Aware X-ray Generation via Controllable Diffusion Model | XReal:通过可控扩散模型生成逼真的解剖学和病理学 X 射线 | Anees Ur Rehman Hashmi, Ibrahim Almakky, Mohammad Areeb Qazi, Santosh Sanjeev, Vijay Ram Papineni, Dwarikanath Mahapatra, Mohammad Yaqub | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Noise Dimension of GAN: An Image Compression Perspective | GAN 的噪声维度:图像压缩视角 | Ziran Zhu, Tongda Xu, Ling Li, Yan Wang | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Intention-driven Ego-to-Exo Video Generation | 意图驱动的 Ego-to-Exo 视频生成 | Hongchen Luo, Kai Zhu, Wei Zhai, Yang Cao | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Intention-aware Denoising Diffusion Model for Trajectory Prediction | 用于轨迹预测的意图感知去噪扩散模型 | Chen Liu, Shibo He, Haoyu Liu, Jiming Chen | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts | 开关扩散变压器:与稀疏专家混合协同去噪任务 | Byeongjun Park, Hyojun Go, Jin-Young Kim, Sangmin Woo, Seokil Ham, Changick Kim | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior | Sculpt3D:使用稀疏 3D 先验的多视图一致文本到 3D 生成 | Cheng Chen, Xiaofeng Yang, Fan Yang, Chengzeng Feng, Zhoujie Fu, Chuan-Sheng Foo, Guosheng Lin, Fayao Liu | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Rethinking Referring Object Removal | 重新思考引用对象删除 | Xiangtian Xue, Jiasong Wu, Youyong Kong, Lotfi Senhadji, Huazhong Shu | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Virtual birefringence imaging and histological staining of amyloid deposits in label-free tissue using autofluorescence microscopy and deep learning | 使用自发荧光显微镜和深度学习对无标记组织中的淀粉样沉积物进行虚拟双折射成像和组织学染色 | Xilin Yang, Bijie Bai, Yijie Zhang, Musa Aydin, Sahan Yoruc Selcuk, Zhen Guo, Gregory A. Fishbein, Karine Atlan, William Dean Wallace, Nir Pillar, et.al. | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Dyadic Interaction Modeling for Social Behavior Generation | 社会行为生成的二元交互建模 | Minh Tran, Di Chang, Maksim Siniukov, Mohammad Soleymani | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control | StreamMultiDiffusion:具有基于区域的语义控制的实时交互生成 | Jaerin Lee, Daniel Sungho Jung, Kanggeon Lee, Kyoung Mu Lee | arxiv.org/pdf/2403.09… | link |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-14 | MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | MM1:多模式法学硕士预培训的方法、分析和见解 | Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, et.al. | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | 闭上眼睛,注意安全:通过图像到文本转换保护多模式法学硕士 | Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James T. Kwok, Yu Zhang | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Less is More: Data Value Estimation for Visual Instruction Tuning | 少即是多:视觉指令调整的数据价值估计 | Zikang Liu, Kun Zhou, Wayne Xin Zhao, Dawei Gao, Yaliang Li, Ji-Rong Wen | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding | VisionGPT-3D:用于增强 3D 视觉理解的通用多模态代理 | Chris Kelly, Luhui Hu, Jiayin Hu, Yu Tian, Deshun Yang, Bang Yang, Cindy Yang, Zihao Li, Zaoshan Huang, Yuexian Zou | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment | M&M:在认知负荷评估中集成视听线索的多模式多任务模型 | Long Nguyen-Phuoc, Renald Gaboriau, Dimitri Delacroix, Laurent Navarro | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments | OpenGraph:大规模户外环境中的开放词汇分层 3D 图表示 | Yinan Deng, Jiahui Wang, Jingyu Zhao, Xinyu Tian, Guangyan Chen, Yi Yang, Yufeng Yue | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Unsupervised Modality-Transferable Video Highlight Detection with Representation Activation Sequence Learning | 具有表示激活序列学习的无监督模态可转移视频精彩片段检测 | Tingtian Li, Zixun Sun, Xinyu Xiao | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks | 将路由功能引入具有低阶瓶颈的视觉语言参数高效微调 | Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions | AVIBench:评估对抗性视觉指令的大视觉语言模型的鲁棒性 | Hao Zhang, Wenqi Shao, Hong Liu, Yongqiang Ma, Ping Luo, Yu Qiao, Kaipeng Zhang | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring | Griffon v2:通过高分辨率缩放和视觉语言共同参考推进多模态感知 | Yufei Zhan, Yousong Zhu, Hongyin Zhao, Fan Yang, Ming Tang, Jinqiao Wang | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | EfficientMFD: Towards More Efficient Multimodal Synchronous Fusion Detection | EfficientMFD:迈向更高效的多模态同步融合检测 | Jiaqing Zhang, Mingxiang Cao, Xue Yang, Weiying Xie, Jie Lei, Daixun Li, Geng Yang, Wenbo Huang, Yunsong Li | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | SELECTOR: Heterogeneous graph network with convolutional masked autoencoder for multimodal robust prediction of cancer survival | 选择器:具有卷积屏蔽自动编码器的异构图网络,用于癌症生存的多模式稳健预测 | Liangrui Pan, Yijun Peng, Yan Li, Xiang Wang, Wenjuan Liu, Liwen Xu, Qingchun Liang, Shaoliang Peng | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering | 使用 OCR 模态扰动进行场景文本视觉问答的对抗训练 | Zhixuan Shen, Haonan Luo, Sijia Li, Tianrui Li | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest | PoIFusion:通过兴趣点融合进行多模态 3D 物体检测 | Jiajun Deng, Sha Zhang, Feras Dayoub, Wanli Ouyang, Yanyong Zhang, Ian Reid | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Are Vision Language Models Texture or Shape Biased and Can We Steer Them? | 视觉语言模型的纹理或形状有偏差吗?我们可以引导它们吗? | Paul Gavrikov, Jovita Lukasik, Steffen Jung, Robert Geirhos, Bianca Lamm, Muhammad Jehanzeb Mirza, Margret Keuper, Janis Keuper | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | UniCode: Learning a Unified Codebook for Multimodal Large Language Models | UniCode:学习多模态大语言模型的统一码本 | Sipeng Zheng, Bohan Zhou, Yicheng Feng, Ye Wang, Zongqing Lu | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Distribution and Depth-Aware Transformers for 3D Human Mesh Recovery | 用于 3D 人体网格恢复的分布和深度感知变压器 | Jerrin Bright, Bavesh Balaji, Harish Prakash, Yuhao Chen, David A Clausi, John Zelek | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models? | 首先知道:代币分布如何揭示大型视觉语言模型中的隐藏知识? | Qinyu Zhao, Ming Xu, Kartik Gupta, Akshay Asthana, Liang Zheng, Stephen Gould | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework | VisionGPT:使用通用多模态框架的视觉语言理解代理 | Chris Kelly, Luhui Hu, Bang Yang, Yu Tian, Deshun Yang, Cindy Yang, Zaoshan Huang, Zihao Li, Jiayin Hu, Yuexian Zou | arxiv.org/pdf/2403.09… | null |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-14 | GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping | GaussianGrasper:用于开放词汇机器人抓取的 3D 语言高斯泼溅 | Yuhang Zheng, Xiangyu Chen, Yupeng Zheng, Songen Gu, Runyi Yang, Bu Jin, Pengfei Li, Chengliang Zhong, Zengmao Wang, Lina Liu, et.al. | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | The NeRFect Match: Exploring NeRF Features for Visual Localization | NeRFect 匹配:探索视觉定位的 NeRF 功能 | Qunjie Zhou, Maxim Maximov, Or Litany, Laura Leal-Taixé | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | RoDUS: Robust Decomposition of Static and Dynamic Elements in Urban Scenes | RoDUS:城市场景中静态和动态元素的鲁棒分解 | Thang-Anh-Quan Nguyen, Luis Roldão, Nathan Piasco, Moussab Bennehar, Dzmitry Tsishkou | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors | PreSight:利用城市规模的 NeRF 先验增强自动驾驶汽车感知 | Tianyuan Yuan, Yucheng Mao, Jiawei Yang, Yicheng Liu, Yue Wang, Hang Zhao | arxiv.org/pdf/2403.09… | null |

3DGS

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-14 | Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting | 放宽 3D 高斯分布的精确初始化约束 | Jaewoo Jung, Jisang Han, Honggyu An, Jiwon Kang, Seonghoon Park, Seungryong Kim | arxiv.org/pdf/2403.09… | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-14 | SpikeReveal: Unlocking Temporal Sequences from Real Blurry Inputs with Spike Streams | SpikeReveal:使用 Spike Streams 从真实模糊输入中解锁时间序列 | Kang Chen, Shiyan Chen, Jiyuan Zhang, Baoyue Zhang, Yajing Zheng, Tiejun Huang, Zhaofei Yu | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization | 具有元提示表示和实例对比优化的开放词汇对象检测 | Zhao Wang, Aoxue Li, Fengwei Zhou, Zhenguo Li, Qi Dou | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Video Editing via Factorized Diffusion Distillation | 通过因式分解扩散蒸馏进行视频编辑 | Uriel Singer, Amit Zohar, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Devi Parikh, Yaniv Taigman | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Knowledge Distillation in YOLOX-ViT for Side-Scan Sonar Object Detection | YOLOX-ViT 中用于侧扫声纳目标检测的知识蒸馏 | Martin Aubard, László Antal, Ana Madureira, Erika Ábrahám | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models | 选择和提炼:选择性双师知识转移,用于视觉语言模型的持续学习 | Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen, Kai-Po Chang, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | SAM-Lightening: A Lightweight Segment Anything Model with Dilated Flash Attention to Achieve 30 times Acceleration | SAM-Lightening:轻量级Segment Anything模型,具有Dilated Flash Attention,可实现30倍加速 | Yanfei Songa, Bangzheng Pua, Peng Wanga, Hongxu Jiang, Dong Donga, Yiqing Shen | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Generalized Relevance Learning Grassmann Quantization | 广义相关学习格拉斯曼量化 | M. Mohammadi, M. Babai, M. H. F. Wilkinson | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | VDNA-PR: Using General Dataset Representations for Robust Sequential Visual Place Recognition | VDNA-PR:使用通用数据集表示进行鲁棒顺序视觉位置识别 | Benjamin Ramtoula, Daniele De Martini, Matthew Gadd, Paul Newman | arxiv.org/pdf/2403.09… | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-14 | GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding | GroupContrast:用于 3D 理解的语义感知自监督表示学习 | Chengyao Wang, Li Jiang, Xiaoyang Wu, Zhuotao Tian, Bohao Peng, Hengshuang Zhao, Jiaya Jia | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models | Transformers 变得稳定:语言模型的端到端信号传播理论 | Akhil Kedia, Mohd Abbas Zaidi, Sushil Khyalia, Jungho Jung, Harshith Goka, Haejun Lee | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning | OneTracker:将视觉对象跟踪与基础模型和高效调整相结合 | Lingyi Hong, Shilin Yan, Renrui Zhang, Wanyun Li, Xinyu Zhou, Pinxue Guo, Kaixun Jiang, Yiting Chen, Jinglun Li, Zhaoyu Chen, et.al. | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | PosSAM: Panoptic Open-vocabulary Segment Anything | PosSAM:全景开放词汇分段任何内容 | Vibashan VS, Shubhankar Borse, Hyojin Park, Debasmit Das, Vishal Patel, Munawar Hayat, Fatih Porikli | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Renovating Names in Open-Vocabulary Segmentation Benchmarks | 更新开放词汇分割基准中的名称 | Haiwen Huang, Songyou Peng, Dan Zhang, Andreas Geiger | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Cloud gap-filling with deep learning for improved grassland monitoring | 通过深度学习填补云空白以改善草原监测 | Iason Tsardanidis, Alkiviadis Koukos, Vasileios Sitokonstantinou, Thanassis Drivas, Charalampos Kontoes | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | WeakSurg: Weakly supervised surgical instrument segmentation using temporal equivariance and semantic continuity | WeakSurg:使用时间等方差和语义连续性的弱监督手术器械分割 | Qiyuan Wang, Yanzhe Liu, Shang Zhao, Rong Liu, S. Kevin Zhou | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Explorations in Texture Learning | 纹理学习的探索 | Blaine Hoak, Patrick McDaniel | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition | SkateFormer:用于人类动作识别的骨骼时间转换器 | Jeonghyeok Do, Munchurl Kim | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Don't Judge by the Look: A Motion Coherent Augmentation for Video Recognition | 不要以貌取人:视频识别的运动相干增强 | Yitian Zhang, Yue Bai, Huan Wang, Yizhou Wang, Yun Fu | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Faceptor: A Generalist Model for Face Perception | Faceptor:面部感知的通用模型 | Lixiong Qin, Mei Wang, Xuannan Liu, Yuhang Zhang, Wei Deng, Xiaoshuai Song, Weiran Xu, Weihong Deng | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Anomaly Detection by Adapting a pre-trained Vision Language Model | 通过采用预先训练的视觉语言模型进行异常检测 | Yuxuan Cai, Xinwei He, Dingkang Liang, Ao Tong, Xiang Bai | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Efficient Transferability Assessment for Selection of Pre-trained Detectors | 用于选择预训练探测器的高效可转移性评估 | Zhao Wang, Aoxue Li, Zhenguo Li, Qi Dou | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Region-based U-net for accelerated training and enhanced precision in deep brain segmentation | 基于区域的 U-net,用于加速训练并提高深部大脑分割的精度 | Mengyu Li, Magnus Magnusson, Thilo van Eimeren, Lotta M. Ellingsen | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via Concept-guided Context Optimization | XCoOp:通过概念引导上下文优化进行计算机辅助诊断的可解释即时学习 | Yequan Bie, Luyang Luo, Zhixuan Chen, Hao Chen | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | ConDiSR: Contrastive Disentanglement and Style Regularization for Single Domain Generalization | ConDiSR:单域泛化的对比解开和风格正则化 | Aleksandr Matsun, Numan Saeed, Fadillah Adamsyah Maani, Mohammad Yaqub | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | GiT: Towards Generalist Vision Transformer through Universal Language Interface | GiT:通过通用语言接口迈向通才视觉转换器 | Haiyang Wang, Hao Tang, Li Jiang, Shaoshuai Shi, Muhammad Ferjad Naeem, Hongsheng Li, Bernt Schiele, Liwei Wang | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Impact of Synthetic Images on Morphing Attack Detection Using a Siamese Network | 合成图像对使用连体网络的变形攻击检测的影响 | Juan Tapia, Christoph Busch | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | DF4LCZ: A SAM-Empowered Data Fusion Framework for Scene-Level Local Climate Zone Classification | DF4LCZ:用于场景级当地气候区分类的 SAM 授权数据融合框架 | Qianqian Wu, Xianping Ma, Jialu Sui, Man-On Pun | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection | D3T:独特的双域教师之字形跨越 RGB 热间隙,用于域自适应目标检测 | Dinh Phat Do, Taehoon Kim, Jaemin Na, Jiwon Kim, Keonho Lee, Kyunghwan Cho, Wonjun Hwang | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation In Bin-picking Scenarios | SD-Net:分箱场景中 6D 姿态估计的对称感知关键点预测和域适应 | Ding-Tao Huang, En-Te Lin, Lipeng Chen, Li-Fu Liu, Long Zeng | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Semi- and Weakly-Supervised Learning for Mammogram Mass Segmentation with Limited Annotations | 具有有限注释的乳房X线照片质量分割的半监督和弱监督学习 | Xinyu Xiong, Churan Wang, Wenxue Li, Guanbin Li | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Annotation Free Semantic Segmentation with Vision Foundation Models | 使用视觉基础模型进行无注释语义分割 | Soroush Seifi, Daniel Olmeda Reino, Fabien Despinoy, Rahaf Aljundi | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Rethinking Autoencoders for Medical Anomaly Detection from A Theoretical Perspective | 从理论角度重新思考用于医疗异常检测的自动编码器 | Yu Cai, Hao Chen, Kwang-Ting Cheng | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Anatomical Structure-Guided Medical Vision-Language Pre-training | 解剖结构引导医学视觉语言预训练 | Qingqiu Li, Xiaohan Yan, Jilan Xu, Runtian Yuan, Yuejie Zhang, Rui Feng, Quanli Shen, Xiaobo Zhang, Shujun Wang | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification | CLIP-EBC:CLIP通过增强的分块分类可以准确计数 | Yiming Ma, Victor Sanchez, Tanaya Guha | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | EventRPG: Event Data Augmentation with Relevance Propagation Guidance | EventRPG:具有相关性传播指导的事件数据增强 | Mingyuan Sun, Donghao Zhang, Zongyuan Ge, Jiaxu Wang, Jia Li, Zheng Fang, Renjing Xu | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Advanced Tumor Segmentation in Medical Imaging: An Ensemble Approach for BraTS 2023 Adult Glioma and Pediatric Tumor Tasks | 医学影像中的高级肿瘤分割:BraTS 2023 成人胶质瘤和儿童肿瘤任务的整体方法 | Fadillah Maani, Anees Ur Rehman Hashmi, Mariam Aljuboory, Numan Saeed, Ikboljon Sobirov, Mohammad Yaqub | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | WSI-SAM: Multi-resolution Segment Anything Model (SAM) for histopathology whole-slide images | WSI-SAM:用于组织病理学全切片图像的多分辨率分段任意模型 (SAM) | Hong Liu, Haosen Yang, Paul J. van Diest, Josien P. W. Pluim, Mitko Veta | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | D-YOLO a robust framework for object detection in adverse weather conditions | D-YOLO 是恶劣天气条件下目标检测的强大框架 | Zihan Chu | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Improving Distant 3D Object Detection Using 2D Box Supervision | 使用 2D 框监督改进远程 3D 物体检测 | Zetong Yang, Zhiding Yu, Chris Choy, Renhao Wang, Anima Anandkumar, Jose M. Alvarez | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Customizing Segmentation Foundation Model via Prompt Learning for Instance Segmentation | 通过实例分割的即时学习定制分割基础模型 | Hyung-Il Kim, Kimin Yun, Jun-Seok Yun, Yuseok Bae | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | SHAN: Object-Level Privacy Detection via Inference on Scene Heterogeneous Graph | SHAN:通过场景异构图推理进行对象级隐私检测 | Zhuohang Jiang, Bingkui Tong, Xia Du, Ahmed Alhammadi, Jizhe Zhou | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation | VM-UNET-V2 重新思考用于医学图像分割的 Vision Mamba UNet | Mingya Zhang, Yue Yu, Limei Gu, Tingsheng Lin, Xianping Tao | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Biophysics Informed Pathological Regularisation for Brain Tumour Segmentation | 生物物理学为脑肿瘤分割提供病理学正则化 | Lipei Zhang, Yanqi Cheng, Lihao Liu, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Single Domain Generalization for Crowd Counting | 人群计数的单域泛化 | Zhuoxuan Peng, S. -H. Gary Chan | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Randomized Principal Component Analysis for Hyperspectral Image Classification | 高光谱图像分类的随机主成分分析 | Mustafa Ustuner | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | CardioCaps: Attention-based Capsule Network for Class-Imbalanced Echocardiogram Classification | CardioCaps:基于注意力的胶囊网络,用于类别不平衡的超声心动图分类 | Hyunkyung Han, Jihyeon Seong, Jaesik Choi | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | When Semantic Segmentation Meets Frequency Aliasing | 当语义分割遇到频率混叠时 | Linwei Chen, Lin Gu, Ying Fu | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | TBI Image/Text (TBI-IT): Comprehensive Text and Image Datasets for Traumatic Brain Injury Research | TBI 图像/文本 (TBI-IT):用于创伤性脑损伤研究的综合文本和图像数据集 | Jie Li, Jiaying Wen, Tongxin Yang, Fenglin Cai, Miao Wei, Zhiwei Zhang, Li Jiang | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Leveraging Foundation Model Automatic Data Augmentation Strategies and Skeletal Points for Hands Action Recognition in Industrial Assembly Lines | 利用基础模型自动数据增强策略和骨骼点进行工业装配线中的手部动作识别 | Liang Wu, X. -G. Ma | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Gradient-Aware Logit Adjustment Loss for Long-tailed Classifier | 长尾分类器的梯度感知 Logit 调整损失 | Fan Zhang, Wei Qin, Weijieying Ren, Lei Wang, Zetong Chen, Richang Hong | arxiv.org/pdf/2403.09… | link |

Image Understanding

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-14 | Perspective-Equivariant Imaging: an Unsupervised Framework for Multispectral Pansharpening | 透视等变成像:多光谱全色锐化的无监督框架 | Andrew Wang, Mike Davies | arxiv.org/pdf/2403.09… | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-14 | Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | Video Mamba Suite:状态空间模型作为视频理解的多功能替代方案 | Guo Chen, Yifei Huang, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, Limin Wang | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | LocalMamba: Visual State Space Model with Windowed Selective Scan | LocalMamba:具有窗口选择性扫描的视觉状态空间模型 | Tao Huang, Xiaohuan Pei, Shan You, Fei Wang, Chen Qian, Chang Xu | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation | PYRA:并行产出重新激活,用于训练推理高效任务适应 | Yizhe Xiong, Hui Chen, Tianxiang Hao, Zijia Lin, Jungong Han, Yuesong Zhang, Guoxin Wang, Yongjun Bao, Guiguang Ding | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | S^2MVTC: a Simple yet Efficient Scalable Multi-View Tensor Clustering | S^2MVTC:简单而高效的可扩展多视图张量聚类 | Zhen Long, Qiyuan Wang, Yazhou Ren, Yipeng Liu, Ce Zhu | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Desigen: A Pipeline for Controllable Design Template Generation | Desigen:可控设计模板生成流程 | Haohan Weng, Danqing Huang, Yu Qiao, Zheng Hu, Chin-Yew Lin, Tong Zhang, C. L. Philip Chen | arxiv.org/pdf/2403.09… | null |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-14 | Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering | Glyph-ByT5:用于精确视觉文本渲染的定制文本编码器 | Zeyu Liu, Weicong Liang, Zhanhao Liang, Chong Luo, Ji Li, Gao Huang, Yuhui Yuan | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Improving Real-Time Omnidirectional 3D Multi-Person Human Pose Estimation with People Matching and Unsupervised 2D-3D Lifting | 通过人员匹配和无监督 2D-3D 提升来改进实时全向 3D 多人人体姿势估计 | Pawel Knap, Peter Hardy, Alberto Tamajo, Hwasup Lim, Hansung Kim | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians | 使用弹簧质量 3D 高斯模型重建和模拟弹性物体 | Licheng Zhong, Hong-Xing Yu, Jiajun Wu, Yunzhu Li | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph | Hyper-3DG:通过 Hypergraph 生成文本到 3D 高斯 | Donglin Di, Jiahui Yang, Chaofan Luo, Zhou Xue, Wei Chen, Xun Yang, Yue Gao | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | CLOAF: CoLlisiOn-Aware Human Flow | CLOAF:碰撞感知人流 | Andrey Davydov, Martin Engilberge, Mathieu Salzmann, Pascal Fua | arxiv.org/pdf/2403.09… | null |

Learning Paradigms

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-14 | Counterfactual contrastive learning: robust representations via causal image synthesis | 反事实对比学习:通过因果图像合成实现稳健表示 | Melanie Roschewitz, Fabio De Sousa Ribeiro, Tian Xia, Galvin Khara, Ben Glocker | arxiv.org/pdf/2403.09… | link |
| 2024-03-14 | Sentinel-Guided Zero-Shot Learning: A Collaborative Paradigm without Real Data Exposure | 哨兵引导的零样本学习:无需真实数据暴露的协作范式 | Fan Wan, Xingyu Miao, Haoran Duan, Jingjing Deng, Rui Gao, Yang Long | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Metadata-Driven Federated Learning of Connectional Brain Templates in Non-IID Multi-Domain Scenarios | 非独立同分布多域场景中元数据驱动的连接脑模板联邦学习 | Geng Chen, Qingyue Wang, Islem Rekik | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Hyperparameters in Continual Learning: a Reality Check | 持续学习中的超参数:现实检验 | Sungmin Cha, Kyunghyun Cho | arxiv.org/pdf/2403.09… | null |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-14 | What Sketch Explainability Really Means for Downstream Tasks | 草图可解释性对于下游任务真正意味着什么 | Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Aneeshan Sain, Tao Xiang, Yi-Zhe Song | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Event-based Asynchronous HDR Imaging by Temporal Incident Light Modulation | 通过时间入射光调制进行基于事件的异步 HDR 成像 | Yuliang Wu, Ganchao Tan, Jinze Chen, Wei Zhai, Yang Cao, Zheng-Jun Zha | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | SketchINR: A First Look into Sketches as Implicit Neural Representations | SketchINR:初探作为隐式神经表征的草图 | Hmrishav Bandyopadhyay, Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Tao Xiang, Timothy Hospedales, Yi-Zhe Song | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Soften to Defend: Towards Adversarial Robustness via Self-Guided Label Refinement | 软化防御:通过自我引导标签细化实现对抗稳健性 | Daiwei Yu, Zhuorong Li, Lina Wei, Canghong Jin, Yun Zhang, Sixian Chan | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Deep unfolding Network for Hyperspectral Image Super-Resolution with Automatic Exposure Correction | 具有自动曝光校正功能的高光谱图像超分辨率深度展开网络 | Yuan Fang, Yipeng Liu, Jie Chen, Zhen Long, Ao Li, Chong-Yung Chi, Ce Zhu | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Taming Cross-Domain Representation Variance in Federated Prototype Learning with Heterogeneous Data Domains | 驯服异构数据域联合原型学习中的跨域表示方差 | Lei Wang, Jieming Bian, Letian Zhang, Chen Chen, Jie Xu | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | rFaceNet: An End-to-End Network for Enhanced Physiological Signal Extraction through Identity-Specific Facial Contours | rFaceNet:通过特定身份的面部轮廓增强生理信号提取的端到端网络 | Dali Zhu, Wenli Zhang, Hualin Zeng, Xiaohao Liu, Long Yang, Jiaqi Zheng | arxiv.org/pdf/2403.09… | null |
| 2024-03-14 | Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset | 使用 WebSight 数据集解锁 Web 屏幕截图到 HTML 代码的转换 | Hugo Laurençon, Léo Tronchon, Victor Sanh | arxiv.org/pdf/2403.09… | null |