[UPDATED!] 2024-03-27 (Publish Time)
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-27 | ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion | ObjectDrop:引导反事实以实现逼真的对象删除和插入 | Daniel Winter, Matan Cohen, Shlomi Fruchter, Yael Pritch, Alex Rav-Acha, Yedid Hoshen | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation | ECoDepth:有效调节单眼深度估计的扩散模型 | Suraj Patni, Aradhye Agarwal, Chetan Arora | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | Object Pose Estimation via the Aggregation of Diffusion Features | 通过扩散特征的聚合进行物体姿态估计 | Tianfu Wang, Guosheng Hu, Hongguang Wang | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object | ImageNet-D:扩散合成对象上神经网络鲁棒性的基准测试 | Chenshuang Zhang, Fei Pan, Junmo Kim, In So Kweon, Chengzhi Mao | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | Semi-Supervised Learning for Deep Causal Generative Models | 深度因果生成模型的半监督学习 | Yasin Ibrahim, Hermione Warr, Konstantinos Kamnitsas | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions | HandBooster:通过手-物体交互的条件合成和采样来促进 3D 手网格重建 | Hao Xu, Haipeng Li, Yinqiao Wang, Shuaicheng Liu, Chi-Wing Fu | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | Artifact Reduction in 3D and 4D Cone-beam Computed Tomography Images with Deep Learning -- A Review | 利用深度学习减少 3D 和 4D 锥束计算机断层扫描图像中的伪影 - 综述 | Mohammadreza Amirian, Daniel Barco, Ivo Herzig, Frank-Peter Schilling | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection | CosalPure:从组图像中学习概念以实现稳健的协同显着性检测 | Jiayi Zhu, Qing Guo, Felix Juefei-Xu, Yihao Huang, Yang Liu, Geguang Pu | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | CT-3DFlow : Leveraging 3D Normalizing Flows for Unsupervised Detection of Pathological Pulmonary CT scans | CT-3DFlow:利用 3D 标准化流程进行病理性肺部 CT 扫描的无监督检测 | Aissam Djahnine, Alexandre Popoff, Emilien Jupin-Delevaux, Vincent Cottin, Olivier Nempont, Loic Boussel | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis | DiffusionFace:面向基于扩散的人脸伪造分析的综合数据集 | Zhongxi Chen, Ke Sun, Ziyin Zhou, Xianming Lin, Xiaoshuai Sun, Liujuan Cao, Rongrong Ji | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | DiffStyler: Diffusion-based Localized Image Style Transfer | DiffStyler:基于扩散的本地化图像风格迁移 | Shaoxu Li | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model | SingularTrajectory:使用扩散模型的通用轨迹预测器 | Inhwan Bae, Young-Jae Park, Hae-Gon Jeon | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | U-Sketch: An Efficient Approach for Sketch to Image Diffusion Models | U-Sketch:草图到图像扩散模型的有效方法 | Ilias Mitsouras, Eleftherios Tsonis, Paraskevi Tzouveli, Athanasios Voulodimos | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | ECNet: Effective Controllable Text-to-Image Diffusion Models | ECNet:有效的可控文本到图像扩散模型 | Sicheng Li, Keqiang Sun, Zhixin Lai, Xiaoshi Wu, Feng Qiu, Haoran Xie, Kazunori Miyata, Hongsheng Li | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Colour and Brush Stroke Pattern Recognition in Abstract Art using Modified Deep Convolutional Generative Adversarial Networks | 使用改进的深度卷积生成对抗网络进行抽象艺术中的颜色和画笔描边图案识别 | Srinitish Srinivasan, Varenya Pathak | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Generative Multi-modal Models are Good Class-Incremental Learners | 生成式多模态模型是良好的课堂增量学习器 | Xusheng Cao, Haori Lu, Linlan Huang, Xialei Liu, Ming-Ming Cheng | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | Ship in Sight: Diffusion Models for Ship-Image Super Resolution | 船舶视线:船舶图像超分辨率的扩散模型 | Luigi Sigillo, Riccardo Fosco Gramaccioni, Alessandro Nicolosi, Danilo Comminiello | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | DODA: Diffusion for Object-detection Domain Adaptation in Agriculture | DODA:农业中目标检测领域适应的扩散 | Shuai Xiang, Pieter M. Blok, James Burridge, Haozhou Wang, Wei Guo | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment | DVLO:具有局部到全局特征融合和双向结构对准的深度视觉激光雷达里程计 | Jiuming Liu, Dong Zhuo, Zhiheng Feng, Siting Zhu, Chensheng Peng, Zhe Liu, Hesheng Wang | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding | 通过分层解码释放 SAM 在医学适应方面的潜力 | Zhiheng Cheng, Qingyue Wei, Hongru Zhu, Yan Wang, Liangqiong Qu, Wei Shao, Yuyin Zhou | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | Enhancing Generative Class Incremental Learning Performance with Model Forgetting Approach | 通过模型遗忘方法提高生成类增量学习性能 | Taro Togo, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation | NeuSDFusion:用于 3D 形状补全、重建和生成的空间感知生成模型 | Ruikai Cui, Weizhe Liu, Weixuan Sun, Senbo Wang, Taizhang Shang, Yang Li, Xibin Song, Han Yan, Zhennan Wu, Shenzhou Chen, et al. | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | TAFormer: A Unified Target-Aware Transformer for Video and Motion Joint Prediction in Aerial Scenes | TAFormer:用于空中场景中视频和运动联合预测的统一目标感知变压器 | Liangyu Xu, Wanxuan Lu, Hongfeng Yu, Yongqiang Mao, Hanbo Bi, Chenglong Liu, Xian Sun, Kun Fu | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation | NeuroPictor:通过多个体预训练和多级调制改进 fMRI 到图像重建 | Jingyang Huo, Yikai Wang, Xuelin Qian, Yun Wang, Chong Li, Jianfeng Feng, Yanwei Fu | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Generative Medical Segmentation | 生成医疗分割 | Jiayu Huo, Xi Ouyang, Sébastien Ourselin, Rachel Sparks | arxiv.org/pdf/2403.18… | link |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-27 | Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Mini-Gemini:挖掘多模态视觉语言模型的潜力 | Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding | 通过指令对比解码减轻大视觉语言模型中的幻觉 | Xintong Wang, Jingheng Pan, Liang Ding, Chris Biemann | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Bringing Textual Prompt to AI-Generated Image Quality Assessment | 将文本提示引入人工智能生成的图像质量评估 | Bowen Qu, Haohui Li, Wei Gao | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction | 语言能打败数值回归吗?基于语言的多模态轨迹预测 | Inhwan Bae, Junoh Lee, Hae-Gon Jeon | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective | 量化和减轻多模态大语言模型中的单模态偏差:因果视角 | Meiqi Chen, Yixin Cao, Yan Zhang, Chaochao Lu | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | H2ASeg: Hierarchical Adaptive Interaction and Weighting Network for Tumor Segmentation in PET/CT Images | H2ASeg:用于 PET/CT 图像中肿瘤分割的分层自适应交互和加权网络 | Jinpeng Lu, Jingyun Chen, Linghan Cai, Songhan Jiang, Yongbing Zhang | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models | 超越嵌入:多模态模型中视觉表的前景 | Yiwu Zhong, Zi-Yuan Hu, Michael R. Lyu, Liwei Wang | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | An Evolutionary Network Architecture Search Framework with Adaptive Multimodal Fusion for Hand Gesture Recognition | 用于手势识别的自适应多模态融合的进化网络架构搜索框架 | Yizhang Xia, Shihao Song, Zhanglu Hou, Junwen Xu, Juan Zou, Yuan Liu, Shengxiang Yang | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Middle Fusion and Multi-Stage, Multi-Form Prompts for Robust RGB-T Tracking | 中间融合和多阶段、多形式提示,实现强大的 RGB-T 跟踪 | Qiming Wang, Yongqiang Bai, Hongxing Song | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Online Embedding Multi-Scale CLIP Features into 3D Maps | 将多尺度 CLIP 特征在线嵌入到 3D 地图中 | Shun Taguchi, Hideki Deguchi | arxiv.org/pdf/2403.18… | null |
NeRF
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-27 | Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction | Gamba:将高斯泼溅法与 Mamba 结合起来进行单视图 3D 重建 | Qiuhong Shen, Xuanyu Yi, Zike Wu, Pan Zhou, Hanwang Zhang, Shuicheng Yan, Xinchao Wang | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | SAT-NGP : Unleashing Neural Graphics Primitives for Fast Relightable Transient-Free 3D reconstruction from Satellite Imagery | SAT-NGP:释放神经图形基元,从卫星图像进行快速可重新照明的无瞬态 3D 重建 | Camille Billouard, Dawa Derksen, Emmanuelle Sarrazin, Bruno Vallet | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Modeling uncertainty for Gaussian Splatting | 高斯泼溅的不确定性建模 | Luca Savant, Diego Valsesia, Enrico Magli | arxiv.org/pdf/2403.18… | null |
3DGS
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-27 | SplatFace: Gaussian Splat Face Reconstruction Leveraging an Optimizable Surface | SplatFace:利用可优化表面的高斯 Splat 面重建 | Jiahao Luo, Jing Liu, James Davis | arxiv.org/pdf/2403.18… | null |
Model Compression/Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-27 | Dense Vision Transformer Compression with Few Samples | 少量样本的密集视觉变压器压缩 | Hanxiao Zhang, Yifan Zhou, Guo-Hua Wang, Jianxin Wu | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | I2CKD : Intra- and Inter-Class Knowledge Distillation for Semantic Segmentation | I2CKD:用于语义分割的类内和类间知识蒸馏 | Ayoub Karine, Thibault Napoléon, Maher Jridi | arxiv.org/pdf/2403.18… | null |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-27 | Benchmarking Object Detectors with COCO: A New Path Forward | 使用 COCO 对目标检测器进行基准测试:一条新的前进道路 | Shweta Singh, Aayan Yadav, Jitesh Jain, Humphrey Shi, Justin Johnson, Karan Desai | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition | ModaLink:统一模式以实现高效的图像到点云位置识别 | Weidong Xie, Lun Luo, Nanfei Ye, Yi Ren, Shaoyi Du, Minhang Wang, Jintao Xu, Rui Ai, Weihao Gu, Xieyuanli Chen | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | Detection of subclinical atherosclerosis by image-based deep learning on chest x-ray | 基于图像的深度学习胸部 X 射线检测亚临床动脉粥样硬化 | Guglielmo Gallone, Francesco Iodice, Alberto Presta, Davide Tore, Ovidio de Filippo, Michele Visciano, Carlo Alberto Barbano, Alessandro Serafini, Paola Gorrini, Alessandro Bruno, et al. | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | A vascular synthetic model for improved aneurysm segmentation and detection via Deep Neural Networks | 通过深度神经网络改进动脉瘤分割和检测的血管合成模型 | Rafic Nader, Florent Autrusseau, Vincent L'Allinec, Romain Bourcier | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Annolid: Annotate, Segment, and Track Anything You Need | Annolid:注释、分段和跟踪您需要的任何内容 | Chen Yang, Thomas A. Cleland | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Addressing Data Annotation Challenges in Multiple Sensors: A Solution for Scania Collected Datasets | 解决多个传感器中的数据注释挑战:斯堪尼亚收集数据集的解决方案 | Ajinkya Khoche, Aron Asefaw, Alejandro Gonzalez, Bogdan Timus, Sina Sharif Mansouri, Patric Jensfelt | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Transformers-based architectures for stroke segmentation: A review | 基于 Transformers 的笔画分割架构:综述 | Yalda Zafari-Ghadim, Essam A. Rashed, Mohamed Mabrok | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Homogeneous Tokenizer Matters: Homogeneous Visual Tokenizer for Remote Sensing Image Understanding | 同质分词器很重要:用于遥感图像理解的同质视觉分词器 | Run Shao, Zhaoyang Zhang, Chao Tao, Yunsheng Zhang, Chengli Peng, Haifeng Li | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | The Impact of Uniform Inputs on Activation Sparsity and Energy-Latency Attacks in Computer Vision | 统一输入对计算机视觉中激活稀疏性和能量延迟攻击的影响 | Andreas Müller, Erwin Quiring | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | Efficient Heatmap-Guided 6-Dof Grasp Detection in Cluttered Scenes | 杂乱场景中高效的热图引导 6 自由度抓取检测 | Siang Chen, Wei Tang, Pengwei Xie, Wenming Yang, Guijin Wang | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | Direct mineral content prediction from drill core images via transfer learning | 通过迁移学习从钻芯图像直接预测矿物含量 | Romana Boiger, Sergey V. Churakov, Ignacio Ballester Llagaria, Georg Kosakowski, Raphael Wüst, Nikolaos I. Prasianakis | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Density-guided Translator Boosts Synthetic-to-Real Unsupervised Domain Adaptive Segmentation of 3D Point Clouds | 密度引导转换器促进 3D 点云的合成到真实无监督域自适应分割 | Zhimin Yuan, Wankang Zeng, Yanfei Su, Weiquan Liu, Ming Cheng, Yulan Guo, Cheng Wang | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Deep Learning Segmentation and Classification of Red Blood Cells Using a Large Multi-Scanner Dataset | 使用大型多扫描仪数据集对红细胞进行深度学习分割和分类 | Mohamed Elmanna, Ahmed Elsafty, Yomna Ahmed, Muhammad Rushdi, Ahmed Morsy | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | A Channel-ensemble Approach: Unbiased and Low-variance Pseudo-labels is Critical for Semi-supervised Classification | 通道集成方法:无偏且低方差的伪标签对于半监督分类至关重要 | Jiaqi Wu, Junbiao Pang, Baochang Zhang, Qingming Huang | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | BAM: Box Abstraction Monitors for Real-time OoD Detection in Object Detection | BAM:用于对象检测中实时 OoD 检测的框抽象监视器 | Changshun Wu, Weicheng He, Chih-Hong Cheng, Xiaowei Huang, Saddek Bensalem | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | ViTAR: Vision Transformer with Any Resolution | ViTAR:任何分辨率的视觉转换器 | Qihang Fan, Quanzeng You, Xiaotian Han, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Generating Diverse Agricultural Data for Vision-Based Farming Applications | 为基于视觉的农业应用生成多样化的农业数据 | Mikolaj Cieslak, Umabharathi Govindarajan, Alejandro Garcia, Anuradha Chandrashekar, Torsten Hädrich, Aleksander Mendoza-Drosik, Dominik L. Michels, Sören Pirk, Chia-Chun Fu, Wojciech Pałubicki | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | A Quantum Fuzzy-based Approach for Real-Time Detection of Solar Coronal Holes | 基于量子模糊的太阳冕洞实时检测方法 | Sanmoy Bandyopadhyay, Suman Kundu | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Learning Inclusion Matching for Animation Paint Bucket Colorization | 学习动画油漆桶着色的包含匹配 | Yuekun Dai, Shangchen Zhou, Qinyue Li, Chongyi Li, Chen Change Loy | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Tracking-Assisted Object Detection with Event Cameras | 使用事件摄像机进行跟踪辅助物体检测 | Ting-Kang Yen, Igor Morawski, Shusil Dangi, Kai He, Chung-Yi Lin, Jia-Fong Yeh, Hung-Ting Su, Winston Hsu | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | PIPNet3D: Interpretable Detection of Alzheimer in MRI Scans | PIPNet3D:MRI 扫描中阿尔茨海默病的可解释检测 | Lisa Anita De Santi, Jörg Schlötterer, Michael Scheschenja, Joel Wessendorf, Meike Nauta, Vincenzo Positano, Christin Seifert | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Uncertainty-Aware SAR ATR: Defending Against Adversarial Attacks via Bayesian Neural Networks | 不确定性感知 SAR ATR:通过贝叶斯神经网络防御对抗性攻击 | Tian Ye, Rajgopal Kannan, Viktor Prasanna, Carl Busart | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Selective Mixup Fine-Tuning for Optimizing Non-Decomposable Objectives | 用于优化不可分解目标的选择性混合微调 | Shrinivas Ramasubramanian, Harsh Rangwani, Sho Takemori, Kunal Samanta, Yuhei Umeda, Venkatesh Babu Radhakrishnan | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Multi-scale Unified Network for Image Classification | 用于图像分类的多尺度统一网络 | Wenzhuo Liu, Fei Zhu, Cheng-Lin Liu | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | SGDM: Static-Guided Dynamic Module Make Stronger Visual Models | SGDM:静态引导动态模块打造更强大的视觉模型 | Wenjie Xing, Zhenchao Cui, Jing Qi | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Benchmarking Image Transformers for Prostate Cancer Detection from Ultrasound Data | 用于根据超声数据检测前列腺癌的图像转换器的基准测试 | Mohamed Harmanani, Paul F. R. Wilson, Fahimeh Fooladgar, Amoon Jamzad, Mahdi Gilany, Minh Nguyen Nhat To, Brian Wodlinger, Purang Abolmaesumi, Parvin Mousavi | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Fourier or Wavelet bases as counterpart self-attention in spikformer for efficient visual classification | 傅里叶或小波基作为 spikformer 中对应的自注意力,以实现高效的视觉分类 | Qingyu Wang, Duzhen Zhang, Tilelin Zhang, Bo Xu | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Road Obstacle Detection based on Unknown Objectness Scores | 基于未知物体分数的道路障碍物检测 | Chihiro Noguchi, Toshiaki Ohgushi, Masao Yamanaka | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Few-shot Online Anomaly Detection and Segmentation | 少样本在线异常检测和分割 | Shenxing Wei, Xing Wei, Zhiheng Ma, Songlin Dong, Shaochen Zhang, Yihong Gong | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Looking Beyond What You See: An Empirical Analysis on Subgroup Intersectional Fairness for Multi-label Chest X-ray Classification Using Social Determinants of Racial Health Inequities | 超越您所看到的:利用种族健康不平等的社会决定因素对多标签胸部 X 射线分类的子组交叉公平性进行实证分析 | Dana Moukheiber, Saurabh Mahindre, Lama Moukheiber, Mira Moukheiber, Mingchen Gao | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Multi-Layer Dense Attention Decoder for Polyp Segmentation | 用于息肉分割的多层密集注意力解码器 | Krushi Patel, Fengjun Li, Guanghui Wang | arxiv.org/pdf/2403.18… | null |
Image Understanding
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-27 | Towards Image Ambient Lighting Normalization | 迈向图像环境照明标准化 | Florin-Alexandru Vasluianu, Tim Seizinger, Zongwei Wu, Rakesh Ranjan, Radu Timofte | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | | | Xiaotong Guo, Huijie Zhao, Shuwei Shao, Xudong Li, Baochang Zhang | arxiv.org/pdf/2403.18… | null |
LLM
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-27 | An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | 图像网格值得制作视频:使用 VLM 进行零样本视频问答 | Wonkyun Kim, Changin Choi, Wonseok Lee, Wonjong Rhee | arxiv.org/pdf/2403.18… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-27 | InstructBrush: Learning Attention-based Instruction Optimization for Image Editing | InstructBrush:学习基于注意力的图像编辑指令优化 | Ruoyu Zhao, Qingnan Fan, Fei Kou, Shuai Qin, Hong Gu, Wei Wu, Pengcheng Xu, Mingrui Zhu, Nannan Wang, Xinbo Gao | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Attention Calibration for Disentangled Text-to-Image Personalization | 用于解开文本到图像个性化的注意力校准 | Yanbing Zhang, Mengping Yang, Qin Zhou, Zhe Wang | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint | 具有空间频率感知和现实亮度约束的半监督夜间除雾基线 | Xiaofeng Cong, Jie Gui, Jing Zhang, Junming Hou, Hao Shen | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | HEMIT: H&E to Multiplex-immunohistochemistry Image Translation with Dual-Branch Pix2pix Generator | HEMIT:使用双分支 Pix2pix 生成器将 H&E 转换为多重免疫组织化学图像 | Chang Bian, Beth Philips, Tim Cootes, Martin Fergie | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation | 在 ViT 上学习 CNN:用于域适应的显式特定类边界的混合模型 | Ba Hung Ngo, Nhat-Tuong Do-Tran, Tuan-Ngoc Nguyen, Hae-Gon Jeon, Tae Jong Choi | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Efficient Test-Time Adaptation of Vision-Language Models | 视觉语言模型的高效测试时适应 | Adilbek Karmanov, Dayan Guan, Shijian Lu, Abdulmotaleb El Saddik, Eric Xing | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting | 不要看黑暗:多元图像修复的潜在代码 | Haiwei Chen, Yajie Zhao | arxiv.org/pdf/2403.18… | null |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-27 | Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark | 真实声场:视听室声学数据集和基准 | Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander Richard | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | MetaCap: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering | MetaCap:来自多视图图像的元学习先验,用于稀疏视图人类表现捕获和渲染 | Guoxing Sun, Rishabh Dabral, Pascal Fua, Christian Theobalt, Marc Habermann | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Garment3DGen: 3D Garment Stylization and Texture Generation | Garment3DGen:3D 服装风格化和纹理生成 | Nikolaos Sarafianos, Tuur Stuyck, Xiaoyu Xiang, Yilei Li, Jovan Popovic, Rakesh Ranjan | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment | Duolando:带有非策略强化学习的舞蹈伴奏跟随者 GPT | Li Siyao, Tianpei Gu, Zhitao Yang, Zhengyu Lin, Ziwei Liu, Henghui Ding, Lei Yang, Chen Change Loy | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | ParCo: Part-Coordinating Text-to-Motion Synthesis | ParCo:部分协调文本到动作合成 | Qiran Zou, Shangyuan Yuan, Shian Du, Yu Wang, Chang Liu, Yi Xu, Jie Chen, Xiangyang Ji | arxiv.org/pdf/2403.18… | link |
| 2024-03-27 | Backpropagation-free Network for 3D Test-time Adaptation | 用于 3D 测试时间适应的无反向传播网络 | Yanshuo Wang, Ali Cheraghian, Zeeshan Hayder, Jie Hong, Sameera Ramasinghe, Shafin Rahman, David Ahmedt-Aristizabal, Xuesong Li, Lars Petersson, Mehrtash Harandi | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | MonoHair: High-Fidelity Hair Modeling from a Monocular Video | MonoHair:通过单目视频进行高保真头发建模 | Keyu Wu, Lingchen Yang, Zhiyi Kuang, Yao Feng, Xutao Han, Yuefan Shen, Hongbo Fu, Kun Zhou, Youyi Zheng | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | AIR-HLoc: Adaptive Image Retrieval for Efficient Visual Localisation | AIR-HLoc:用于高效视觉定位的自适应图像检索 | Changkun Liu, Huajian Huang, Zhengyang Ma, Tristan Braud | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Toward Interactive Regional Understanding in Vision-Large Language Models | 迈向大视觉语言模型中的交互式区域理解 | Jungbeom Lee, Sanghyuk Chun, Sangdoo Yun | arxiv.org/pdf/2403.18… | null |
Learning Paradigms
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-27 | OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning | OrCo:通过少样本类增量学习的正交性和对比实现更好的泛化 | Noor Ahmed, Anna Kukleva, Bernt Schiele | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Towards Non-Exemplar Semi-Supervised Class-Incremental Learning | 迈向非范例半监督课堂增量学习 | Wenzhuo Liu, Fei Zhu, Cheng-Lin Liu | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Image Deraining via Self-supervised Reinforcement Learning | 通过自监督强化学习进行图像去雨 | He-Hao Liao, Yan-Tsung Peng, Wen-Tao Chu, Ping-Chun Hsieh, Chung-Chi Tsai | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Branch-Tuning: Balancing Stability and Plasticity for Continual Self-Supervised Learning | 分支调优:平衡持续自我监督学习的稳定性和可塑性 | Wenzhuo Liu, Fei Zhu, Cheng-Lin Liu | arxiv.org/pdf/2403.18… | null |
Others
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-27 | Enhancing Manufacturing Quality Prediction Models through the Integration of Explainability Methods | 通过集成可解释性方法增强制造质量预测模型 | Dennis Gross, Helge Spieker, Arnaud Gotlieb, Ricardo Knoblauch | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Deep Learning for Robust and Explainable Models in Computer Vision | 计算机视觉中稳健且可解释模型的深度学习 | Mohammadreza Amirian | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing | FlexEdit:灵活可控的基于扩散的以对象为中心的图像编辑 | Trong-Tung Nguyen, Duc-Anh Nguyen, Anh Tran, Cuong Pham | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos | RAP:用于教学视频中自适应程序规划的检索增强规划器 | Ali Zare, Yulei Niu, Hammad Ayyubi, Shih-fu Chang | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Users prefer Jpegli over same-sized libjpeg-turbo or MozJPEG | 与相同大小的 libjpeg-turbo 或 MozJPEG 相比,用户更喜欢 Jpegli | Martin Bruse, Luca Versari, Zoltan Szabadka, Jyrki Alakuijala | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Language Plays a Pivotal Role in the Object-Attribute Compositional Generalization of CLIP | 语言在 CLIP 的对象属性组合概括中发挥着关键作用 | Reza Abbasi, Mohammad Samiei, Mohammad Hossein Rohban, Mahdieh Soleymani Baghshah | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | VersaT2I: Improving Text-to-Image Models with Versatile Reward | VersaT2I:通过多功能奖励改进文本到图像模型 | Jianshu Guo, Wenhao Chai, Jie Deng, Hsiang-Wei Huang, Tian Ye, Yichen Xu, Jiawei Zhang, Jenq-Neng Hwang, Gaoang Wang | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Scaling Vision-and-Language Navigation With Offline RL | 通过离线强化学习扩展视觉和语言导航 | Valay Bundele, Mahesh Bhupati, Biplab Banerjee, Aditya Grover | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | FTBC: Forward Temporal Bias Correction for Optimizing ANN-SNN Conversion | FTBC:用于优化 ANN-SNN 转换的前向时间偏差校正 | Xiaofeng Wu, Velibor Bojkovic, Bin Gu, Kun Suo, Kai Zou | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | Implementation of the Principal Component Analysis onto High-Performance Computer Facilities for Hyperspectral Dimensionality Reduction: Results and Comparisons | 在高性能计算机设备上实施主成分分析以实现高光谱降维:结果和比较 | E. Martel, R. Lazcano, J. Lopez, D. Madroñal, R. Salvador, S. Lopez, E. Juarez, R. Guerra, C. Sanz, R. Sarmiento | arxiv.org/pdf/2403.18… | null |
| 2024-03-27 | LayoutFlow: Flow Matching for Layout Generation | LayoutFlow:布局生成的流匹配 | Julian Jorge Andrade Guerreiro, Naoto Inoue, Kento Masui, Mayu Otani, Hideki Nakayama | arxiv.org/pdf/2403.18… | null |