[UPDATED!] 2024-03-04 (Publish Time)
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-04 | Gradient Correlation Subspace Learning against Catastrophic Forgetting | 针对灾难性遗忘的梯度相关子空间学习 | Tammuz Dubnov, Vishal Thengane | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control | UniCtrl:通过免训练统一注意力控制提高文本到视频扩散模型的时空一致性 | Xuweiyi Chen, Tian Xia, Sihan Xu | arxiv.org/pdf/2403.02… | link |
| 2024-03-04 | 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors | 3DTopia:具有混合扩散先验的大型文本到 3D 生成模型 | Fangzhou Hong, Jiaxiang Tang, Ziang Cao, Min Shi, Tong Wu, Zhaoxi Chen, Tengfei Wang, Liang Pan, Dahua Lin, Ziwei Liu | arxiv.org/pdf/2403.02… | link |
| 2024-03-04 | DragTex: Generative Point-Based Texture Editing on 3D Mesh | DragTex:3D 网格上基于点的生成纹理编辑 | Yudi Zhang, Qi Xu, Lei Zhang | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Domain adaptation, Explainability & Fairness in AI for Medical Image Analysis: Diagnosis of COVID-19 based on 3-D Chest CT-scans | 医学图像分析 AI 的领域适应、可解释性和公平性:基于 3D 胸部 CT 扫描的 COVID-19 诊断 | Dimitrios Kollias, Anastasios Arsenos, Stefanos Kollias | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Point2Building: Reconstructing Buildings from Airborne LiDAR Point Clouds | Point2Building:利用机载 LiDAR 点云重建建筑物 | Yujia Liu, Anton Obukhov, Jan Dirk Wegner, Konrad Schindler | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models | ResAdapter:扩散模型的域一致分辨率适配器 | Jiaxiang Cheng, Pan Xie, Xin Xia, Jiashi Li, Jie Wu, Yuxi Ren, Huixia Li, Xuefeng Xiao, Min Zheng, Lean Fu | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Semi-Supervised Semantic Segmentation Based on Pseudo-Labels: A Survey | 基于伪标签的半监督语义分割:一项调查 | Lingyan Ran, Yali Li, Guoqiang Liang, Yanning Zhang | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio | FaceChain-ImagineID:从解开的音频中自由制作高保真多样化的说话面孔 | Chao Xu, Yang Liu, Jiazheng Xing, Weida Wang, Mingze Sun, Jun Dan, Tianxin Huang, Siyuan Li, Zhi-Qi Cheng, Ying Tai, et al. | arxiv.org/pdf/2403.01… | link |
| 2024-03-04 | ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models | ViewDiff:使用文本到图像模型生成 3D 一致图像 | Lukas Höllein, Aljaž Božič, Norman Müller, David Novotny, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner | arxiv.org/pdf/2403.01… | link |
| 2024-03-04 | OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on | OOTDiffusion:基于舾装融合的潜在扩散,用于可控虚拟试穿 | Yuhao Xu, Tao Gu, Weifeng Chen, Chengcai Chen | arxiv.org/pdf/2403.01… | link |
| 2024-03-04 | AFBT GAN: enhanced explainability and diagnostic performance for cognitive decline by counterfactual generative adversarial network | AFBT GAN:通过反事实生成对抗网络增强认知能力下降的可解释性和诊断性能 | Xiongri Shen, Zhenxi Song, Zhiguo Zhang | arxiv.org/pdf/2403.01… | link |
| 2024-03-04 | HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances | HanDiffuser:具有逼真手部外观的文本到图像生成 | Supreeth Narasimhaswamy, Uttaran Bhattacharya, Xiang Chen, Ishita Dasgupta, Saayan Mitra, Minh Hoai | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | Improving Adversarial Energy-Based Model via Diffusion Process | 通过扩散过程改进基于能量的对抗模型 | Cong Geng, Tian Han, Peng-Tao Jiang, Hao Zhang, Jinwei Chen, Søren Hauberg, Bo Li | arxiv.org/pdf/2403.01… | null |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-04 | Density-based Isometric Mapping | 基于密度的等距映射 | Bardia Yousefi, Mélina Khansari, Ryan Trask, Patrick Tallon, Carina Carino, Arman Afrasiyabi, Vikas Kundra, Lan Ma, Lei Ren, Keyvan Farahani, et al. | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Differentially Private Representation Learning via Image Captioning | 通过图像字幕进行差分私人表示学习 | Tom Sander, Yaodong Yu, Maziar Sanjabi, Alain Durmus, Yi Ma, Kamalika Chaudhuri, Chuan Guo | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review | 用于医疗报告生成和视觉问答的视觉语言模型:综述 | Iryna Hartsock, Ghulam Rasool | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems against Semantic Attacks | COMMIT:证明多传感器融合系统针对语义攻击的鲁棒性 | Zijian Huang, Wenda Chu, Linyi Li, Chejian Xu, Bo Li | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Beyond Specialization: Assessing the Capabilities of MLLMs in Age and Gender Estimation | 超越专业化:评估 MLLM 在年龄和性别估计方面的能力 | Maksim Kuprashevich, Grigorii Alekseenko, Irina Tolstykh | arxiv.org/pdf/2403.02… | link |
| 2024-03-04 | A New Perspective on Smiling and Laughter Detection: Intensity Levels Matter | 微笑和笑声检测的新视角:强度水平很重要 | Hugo Bohy, Kevin El Haddad, Thierry Dutoit | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations | 多模式社交互动建模:具有密集对齐表示的新挑战和基线 | Sangmin Lee, Bolin Lai, Fiona Ryan, Bikram Boote, James M. Rehg | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Modality-Aware and Shift Mixer for Multi-modal Brain Tumor Segmentation | 用于多模态脑肿瘤分割的模态感知和移位混合器 | Zhongzhen Huang, Linda Wei, Shaoting Zhang, Xiaofan Zhang | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | TNF: Tri-branch Neural Fusion for Multimodal Medical Data Classification | TNF:用于多模式医疗数据分类的三分支神经融合 | Tong Zheng, Shusaku Sone, Yoshitaka Ushiku, Yuki Oba, Jiaxin Ma | arxiv.org/pdf/2403.01… | null |
| 2024-03-05 | NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models | NPHardEval4V:多模态大语言模型的动态推理基准 | Lizhou Fan, Wenyue Hua, Xiang Li, Kaijie Zhu, Mingyu Jin, Lingyao Li, Haoyang Ling, Jinkui Chi, Jindong Wang, Xin Ma, et al. | arxiv.org/pdf/2403.01… | null |
NeRF
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-04 | DaReNeRF: Direction-aware Representation for Dynamic Scenes | DaReNeRF:动态场景的方向感知表示 | Ange Lou, Benjamin Planche, Zhongpai Gao, Yamin Li, Tianyu Luan, Hao Ding, Terrence Chen, Jack Noble, Ziyan Wu | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Depth-Guided Robust and Fast Point Cloud Fusion NeRF for Sparse Input Views | 用于稀疏输入视图的深度引导鲁棒快速点云融合 NeRF | Shuai Guo, Qiuwen Wang, Yijie Gao, Rong Xie, Li Song | arxiv.org/pdf/2403.02… | null |
Model Compression/Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-04 | Encodings for Prediction-based Neural Architecture Search | 基于预测的神经架构搜索的编码 | Yash Akhauri, Mohamed S. Abdelfattah | arxiv.org/pdf/2403.02… | link |
| 2024-03-04 | On Latency Predictors for Neural Architecture Search | 神经架构搜索的延迟预测器 | Yash Akhauri, Mohamed S. Abdelfattah | arxiv.org/pdf/2403.02… | link |
| 2024-03-04 | UB-FineNet: Urban Building Fine-grained Classification Network for Open-access Satellite Images | UB-FineNet:开放获取卫星图像的城市建筑细粒度分类网络 | Zhiyi He, Wei Yao, Jie Shao, Puzuo Wang | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | CSE: Surface Anomaly Detection with Contrastively Selected Embedding | CSE:通过对比选择嵌入进行表面异常检测 | Simon Thomine, Hichem Snoussi | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | NASH: Neural Architecture Search for Hardware-Optimized Machine Learning Models | NASH:硬件优化机器学习模型的神经架构搜索 | Mengfei Ji, Zaid Al-Ars | arxiv.org/pdf/2403.01… | link |
| 2024-03-04 | Neural Network Assisted Lifting Steps For Improved Fully Scalable Lossy Image Compression in JPEG 2000 | 用于改进 JPEG 2000 中完全可扩展有损图像压缩的神经网络辅助提升步骤 | Xinyue Li, Aous Naman, David Taubman | arxiv.org/pdf/2403.01… | link |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-04 | Coronary artery segmentation in non-contrast calcium scoring CT images using deep learning | 使用深度学习进行非造影钙评分 CT 图像中的冠状动脉分割 | Mariusz Bujny, Katarzyna Jesionek, Jakub Nalepa, Karol Miszalski-Jamka, Katarzyna Widawka-Żak, Sabina Wolny, Marcin Kostur | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | When do Convolutional Neural Networks Stop Learning? | 卷积神经网络什么时候停止学习? | Sahan Ahmad, Gabriel Trahan, Aminul Islam | arxiv.org/pdf/2403.02… | link |
| 2024-03-04 | Anatomically Constrained Tractography of the Fetal Brain | 胎儿大脑的解剖学约束纤维束成像 | Camilo Calixto, Camilo Jaimes, Matheus D. Soldatelli, Simon K. Warfield, Ali Gholipour, Davood Karimi | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function | NiNformer:具有令牌混合生成门控功能的网络变压器中的网络 | Abdullah Nazhat Abdullah, Tarkan Aydin | arxiv.org/pdf/2403.02… | link |
| 2024-03-04 | Brand Visibility in Packaging: A Deep Learning Approach for Logo Detection, Saliency-Map Prediction, and Logo Placement Analysis | 包装中的品牌可见度:用于徽标检测、显着图预测和徽标放置分析的深度学习方法 | Alireza Hosseini, Kiana Hooshanfar, Pouria Omrani, Reza Toosi, Ramin Toosi, Zahra Ebrahimian, Mohammad Ali Akhaee | arxiv.org/pdf/2403.02… | link |
| 2024-03-04 | RegionGPT: Towards Region Understanding Vision Language Model | RegionGPT:迈向区域理解视觉语言模型 | Qiushan Guo, Shalini De Mello, Hongxu Yin, Wonmin Byeon, Ka Chun Cheung, Yizhou Yu, Ping Luo, Sifei Liu | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | 对比区域指导:无需训练即可改善视觉语言模型的基础 | David Wan, Jaemin Cho, Elias Stengel-Eskin, Mohit Bansal | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Bayesian Uncertainty Estimation by Hamiltonian Monte Carlo: Applications to Cardiac MRI Segmentation | 哈密顿蒙特卡罗贝叶斯不确定性估计:在心脏 MRI 分割中的应用 | Yidong Zhao, Joao Tourais, Iain Pierce, Christian Nitsche, Thomas A. Treibel, Sebastian Weingärtner, Artur M. Schweidtmann, Qian Tao | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures | Vision-RWKV:使用类似 RWKV 的架构实现高效且可扩展的视觉感知 | Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang | arxiv.org/pdf/2403.02… | link |
| 2024-03-04 | Harnessing Intra-group Variations Via a Population-Level Context for Pathology Detection | 通过群体水平背景利用组内变异进行病理检测 | P. Bilha Githinji, Xi Yuan, Zhenglin Chen, Ijaz Gul, Dingqi Shang, Wen Liang, Jianming Deng, Dan Zeng, Dongmei Yu, Chenggang Yan, et al. | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | REAL-Colon: A dataset for developing real-world AI applications in colonoscopy | REAL-Colon:用于开发结肠镜检查中真实人工智能应用的数据集 | Carlo Biffi, Giulio Antonelli, Sebastian Bernhofer, Cesare Hassan, Daizen Hirata, Mineo Iwatate, Andreas Maieron, Pietro Salvagnini, Andrea Cherubini | arxiv.org/pdf/2403.02… | link |
| 2024-03-04 | MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection | MiM-ISTD:用于高效红外小目标检测的 Mamba-in-Mamba | Tianxiang Chen, Zhentao Tan, Tao Gong, Qi Chu, Yue Wu, Bin Liu, Jieping Ye, Nenghai Yu | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Self-Supervised Facial Representation Learning with Facial Region Awareness | 具有面部区域意识的自监督面部表征学习 | Zheng Gao, Ioannis Patras | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | LOCR: Location-Guided Transformer for Optical Character Recognition | LOCR:用于光学字符识别的位置引导变压器 | Yu Sun, Dongzhan Zhou, Chen Lin, Conghui He, Wanli Ouyang, Han-Sen Zhong | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT | VTG-GPT:使用 GPT 的免调整零镜头视频临时接地 | Yifang Xu, Yunzhuo Sun, Zien Xie, Benxiang Zhai, Sidan Du | arxiv.org/pdf/2403.02… | link |
| 2024-03-04 | DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction | DiffMOT:具有非线性预测的基于扩散的实时多目标跟踪器 | Weiyi Lv, Yuhang Huang, Ning Zhang, Ruei-Sung Lin, Mei Han, Dan Zeng | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | HyperPredict: Estimating Hyperparameter Effects for Instance-Specific Regularization in Deformable Image Registration | HyperPredict:估计可变形图像配准中实例特定正则化的超参数效应 | Aisha L. Shuaibu, Ivor J. A. Simpson | arxiv.org/pdf/2403.02… | link |
| 2024-03-04 | A Generative Approach for Wikipedia-Scale Visual Entity Recognition | 维基百科规模视觉实体识别的生成方法 | Mathilde Caron, Ahmet Iscen, Alireza Fathi, Cordelia Schmid | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Scalable Vision-Based 3D Object Detection and Monocular Depth Estimation for Autonomous Driving | 适用于自动驾驶的可扩展视觉 3D 物体检测和单目深度估计 | Yuxuan Liu | arxiv.org/pdf/2403.02… | link |
| 2024-03-04 | Leveraging Anchor-based LiDAR 3D Object Detection via Point Assisted Sample Selection | 通过点辅助样本选择利用基于锚点的 LiDAR 3D 物体检测 | Shitao Chen, Haolin Zhang, Nanning Zheng | arxiv.org/pdf/2403.01… | link |
| 2024-03-04 | Explicit Motion Handling and Interactive Prompting for Video Camouflaged Object Detection | 用于视频伪装物体检测的显式运动处理和交互式提示 | Xin Zhang, Tao Xiao, Gepeng Ji, Xuan Wu, Keren Fu, Qijun Zhao | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | Enhancing Information Maximization with Distance-Aware Contrastive Learning for Source-Free Cross-Domain Few-Shot Learning | 通过距离感知对比学习增强信息最大化,实现无源跨域少样本学习 | Huali Xu, Li Liu, Shuaifeng Zhi, Shaojing Fu, Zhuo Su, Ming-Ming Cheng, Yongxiang Liu | arxiv.org/pdf/2403.01… | link |
| 2024-03-05 | Fourier-basis Functions to Bridge Augmentation Gap: Rethinking Frequency Augmentation in Image Classification | 弥合增强差距的傅里叶基函数:重新思考图像分类中的频率增强 | Puru Vaish, Shunxin Wang, Nicola Strisciuglio | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | xT: Nested Tokenization for Larger Context in Large Images | xT:大图像中更大上下文的嵌套标记化 | Ritwik Gupta, Shufan Li, Tyler Zhu, Jitendra Malik, Trevor Darrell, Karttikeya Mangalam | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | Map-aided annotation for pole base detection | 用于杆基检测的地图辅助注释 | Benjamin Missaoui, Maxime Noizet, Philippe Xu | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | FreeA: Human-object Interaction Detection using Free Annotation Labels | FreeA:使用免费注释标签进行人机交互检测 | Yuxiao Wang, Zhenao Wei, Xinyu Jiang, Yu Lei, Weiying Xue, Jinxiu Liu, Qi Liu | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation | AllSpark:从 Transformer 中未标记的特征中重生,用于半监督语义分割 | Haonan Wang, Qixiang Zhang, Yi Li, Xiaomeng Li | arxiv.org/pdf/2403.01… | link |
| 2024-03-04 | A Simple Baseline for Efficient Hand Mesh Reconstruction | 高效手部网格重建的简单基线 | Zhishan Zhou, Shihao Zhou, Zhi Lv, Minqiang Zou, Yao Tang, Jiajun Liang | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | PointCore: Efficient Unsupervised Point Cloud Anomaly Detector Using Local-Global Features | PointCore:使用局部-全局特征的高效无监督点云异常检测器 | Baozhu Zhao, Qiwei Xiong, Xiaohan Zhang, Jingfeng Guo, Qi Liu, Xiaofen Xing, Xiangmin Xu | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses | RankED:使用基于排名的损失解决边缘检测中的不平衡和不确定性 | Bedrettin Cetinkaya, Sinan Kalkan, Emre Akbas | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | Exposing the Deception: Uncovering More Forgery Clues for Deepfake Detection | 揭露欺骗:发现更多用于 Deepfake 检测的伪造线索 | Zhongjie Ba, Qingyu Liu, Zhenguang Liu, Shuang Wu, Feng Lin, Li Lu, Kui Ren | arxiv.org/pdf/2403.01… | link |
| 2024-03-04 | Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning | 集成高效的最优传输和功能图以实现无监督形状对应学习 | Tung Le, Khai Nguyen, Shanlin Sun, Nhat Ho, Xiaohui Xie | arxiv.org/pdf/2403.01… | null |
| 2024-03-05 | Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition | 手写数学表达式识别的注意力引导机制 | Yutian Liu, Wenjun Ke, Jianguo Wei | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | Training-Free Pretrained Model Merging | 免训练预训练模型合并 | Zhengqi Xu, Ke Yuan, Huiqiong Wang, Yong Wang, Mingli Song, Jie Song | arxiv.org/pdf/2403.01… | link |
| 2024-03-04 | Lightweight Object Detection: A Study Based on YOLOv7 Integrated with ShuffleNetv2 and Vision Transformer | 轻量级目标检测:基于YOLOv7结合ShuffleNetv2和Vision Transformer的研究 | Wenkai Gong | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | RISeg: Robot Interactive Object Segmentation via Body Frame-Invariant Features | RISeg:通过身体框架不变特征进行机器人交互式对象分割 | Howard H. Qian, Yangxiao Lu, Kejia Ren, Gaotian Wang, Ninad Khargonkar, Yu Xiang, Kaiyu Hang | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | MCA: Moment Channel Attention Networks | MCA:时刻通道注意力网络 | Yangbo Jiang, Zhiwei Jiang, Le Han, Zenan Huang, Nenggan Zheng | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | PI-AstroDeconv: A Physics-Informed Unsupervised Learning Method for Astronomical Image Deconvolution | PI-AstroDeconv:一种基于物理的天文图像反卷积无监督学习方法 | Shulei Ni, Yisheng Qiu, Yunchun Chen, Zihao Song, Hao Chen, Xuejian Jiang, Huaxi Chen | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | Zero-shot Generalizable Incremental Learning for Vision-Language Object Detection | 用于视觉语言目标检测的零样本可推广增量学习 | Jieren Deng, Haojian Zhang, Kun Ding, Jianhua Hu, Xingxuan Zhang, Yunkuan Wang | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | PillarGen: Enhancing Radar Point Cloud Density and Quality via Pillar-based Point Generation Network | PillarGen:通过基于 Pillar 的点生成网络增强雷达点云密度和质量 | Jisong Kim, Geonho Bang, Kwangjin Choi, Minjae Seong, Jaechang Yoo, Eunjong Pyo, Jun Won Choi | arxiv.org/pdf/2403.01… | null |
Image Understanding
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-04 | Iterative Occlusion-Aware Light Field Depth Estimation using 4D Geometrical Cues | 使用 4D 几何线索进行迭代遮挡感知光场深度估计 | Rui Lourenço, Lucas Thomaz, Eduardo A. B. Silva, Sergio M. M. Faria | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | DD-VNB: A Depth-based Dual-Loop Framework for Real-time Visually Navigated Bronchoscopy | DD-VNB:基于深度的实时视觉导航支气管镜双环框架 | Qingyao Tian, Huai Liao, Xinyan Huang, Jian Chen, Zihui Zhang, Bingyu Yang, Sebastien Ourselin, Hongbin Liu | arxiv.org/pdf/2403.01… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-04 | A Spatio-temporal Aligned SUNet Model for Low-light Video Enhancement | 用于低光视频增强的时空对齐 SUNet 模型 | Ruirui Lin, Nantheera Anantrasirichai, Alexandra Malyugina, David Bull | arxiv.org/pdf/2403.02… | null |
| 2024-03-05 | Neural Redshift: Random Networks are not Random Functions | 神经红移:随机网络不是随机函数 | Damien Teney, Armand Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | TripoSR: Fast 3D Object Reconstruction from a Single Image | TripoSR:从单个图像快速重建 3D 对象 | Dmitry Tochilkin, David Pankratz, Zexiang Liu, Zixuan Huang, Adam Letts, Yangguang Li, Ding Liang, Christian Laforte, Varun Jampani, Yan-Pei Cao | arxiv.org/pdf/2403.02… | link |
| 2024-03-04 | Position Paper: Towards Implicit Prompt For Text-To-Image Models | 立场文件:走向文本到图像模型的隐式提示 | Yue Yang, Yuqi Lin, Hong Liu, Wenqi Shao, Runjian Chen, Hailong Shang, Yu Wang, Yu Qiao, Kaipeng Zhang, Ping Luo | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | DEMOS: Dynamic Environment Motion Synthesis in 3D Scenes via Local Spherical-BEV Perception | 演示:通过局部球形 BEV 感知在 3D 场景中进行动态环境运动合成 | Jingyu Gong, Min Wang, Wentao Liu, Chen Qian, Zhizhong Zhang, Yuan Xie, Lizhuang Ma | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | 3D Hand Reconstruction via Aggregating Intra and Inter Graphs Guided by Prior Knowledge for Hand-Object Interaction Scenario | 在手-物体交互场景的先验知识的指导下,通过聚合帧内图和帧间图来重建 3D 手部 | Feng Shuang, Wenbo He, Shaodong Li | arxiv.org/pdf/2403.01… | null |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-04 | A dataset of over one thousand computed tomography scans of battery cells | 超过一千次电池计算机断层扫描的数据集 | Amariah Condon, Bailey Buscarino, Eric Moch, William J. Sehnert, Owen Miles, Patrick K. Herring, Peter M. Attia | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Twisting Lids Off with Two Hands | 用两只手拧开盖子 | Toru Lin, Zhao-Heng Yin, Haozhi Qi, Pieter Abbeel, Jitendra Malik | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Physics-Informed Learning for Time-Resolved Angiographic Contrast Agent Concentration Reconstruction | 用于时间分辨血管造影造影剂浓度重建的物理知情学习 | Noah Maul, Annette Birkhold, Fabian Wagner, Mareike Thies, Maximilian Rohleder, Philipp Berg, Markus Kowarschik, Andreas Maier | arxiv.org/pdf/2403.01… | null |
| 2024-03-05 | Tree Counting by Bridging 3D Point Clouds with Imagery | 通过将 3D 点云与图像桥接来进行树木计数 | Lei Li, Tianfang Zhang, Zhongyu Jiang, Cheng-Yen Yang, Jenq-Neng Hwang, Stefan Oehmcke, Dimitri Pierre Johannes Gominski, Fabian Gieseke, Christian Igel | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | AiSDF: Structure-aware Neural Signed Distance Fields in Indoor Scenes | AiSDF:室内场景中的结构感知神经符号距离场 | Jaehoon Jang, Inha Lee, Minje Kim, Kyungdon Joo | arxiv.org/pdf/2403.01… | null |
Learning Paradigms
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-04 | Superpixel Graph Contrastive Clustering with Semantic-Invariant Augmentations for Hyperspectral Images | 高光谱图像的超像素图对比聚类与语义不变增强 | Jianhan Qi, Yuheng Jia, Hui Liu, Junhui Hou | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | Open-world Machine Learning: A Review and New Outlooks | 开放世界机器学习:回顾与新展望 | Fei Zhu, Shijie Ma, Zhen Cheng, Xu-Yao Zhang, Zhaoxiang Zhang, Cheng-Lin Liu | arxiv.org/pdf/2403.01… | null |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-04 | Towards Calibrated Deep Clustering Network | 迈向校准深度聚类网络 | Yuheng Jia, Jianhong Cheng, Hui Liu, Junhui Hou | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Optimizing Illuminant Estimation in Dual-Exposure HDR Imaging | 优化双曝光 HDR 成像中的光源估计 | Mahmoud Afifi, Zhenhua Hu, Liang Liang | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Non-autoregressive Sequence-to-Sequence Vision-Language Models | 非自回归序列到序列视觉语言模型 | Kunyu Shi, Qi Dong, Luis Goncalves, Zhuowen Tu, Stefano Soatto | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Interpretable Models for Detecting and Monitoring Elevated Intracranial Pressure | 用于检测和监测颅内压升高的可解释模型 | Darryl Hannan, Steven C. Nesbit, Ximing Wen, Glen Smith, Qiao Zhang, Alberto Goffi, Vincent Chan, Michael J. Morris, John C. Hunninghake, Nicholas E. Villalobos, et al. | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Perceptive self-supervised learning network for noisy image watermark removal | 用于去除噪声图像水印的感知自监督学习网络 | Chunwei Tian, Menghua Zheng, Bo Li, Yanning Zhang, Shichao Zhang, David Zhang | arxiv.org/pdf/2403.02… | null |
| 2024-03-04 | Multi-Spectral Remote Sensing Image Retrieval Using Geospatial Foundation Models | 使用地理空间基础模型的多光谱遥感图像检索 | Benedikt Blumenstiel, Viktoria Moor, Romeo Kienzler, Thomas Brunschwiler | arxiv.org/pdf/2403.02… | link |
| 2024-03-04 | TTA-Nav: Test-time Adaptive Reconstruction for Point-Goal Navigation under Visual Corruptions | TTA-Nav:视觉损坏下点目标导航的测试时自适应重建 | Maytus Piriyajitakonkij, Mingfei Sun, Mengmi Zhang, Wei Pan | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | Advancing Gene Selection in Oncology: A Fusion of Deep Learning and Sparsity for Precision Gene Selection | 推进肿瘤学基因选择:深度学习和稀疏性的融合实现精准基因选择 | Akhila Krishna, Ravi Kant Gupta, Pranav Jeevan, Amit Sethi | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | Revisiting Learning-based Video Motion Magnification for Real-time Processing | 重新审视基于学习的视频运动放大以进行实时处理 | Hyunwoo Ha, Oh Hyun-Bin, Kim Jun-Seong, Kwon Byung-Ki, Kim Sung-Bin, Linh-Tam Tran, Ji-Yun Kim, Sung-Ho Bae, Tae-Hyun Oh | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis | PLACE:用于语义图像合成的自适应布局-语义融合 | Zhengyao Lv, Yuxiang Wei, Wangmeng Zuo, Kwan-Yee K. Wong | arxiv.org/pdf/2403.01… | link |
| 2024-03-04 | One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models | 一个提示词足以提高预训练视觉语言模型的对抗鲁棒性 | Lin Li, Haoyan Guan, Jianing Qiu, Michael Spratling | arxiv.org/pdf/2403.01… | link |
| 2024-03-05 | AtomoVideo: High Fidelity Image-to-Video Generation | AtomoVideo:高保真图像到视频生成 | Litong Gong, Yiran Zhu, Weijie Li, Xiaoyang Kang, Biao Wang, Tiezheng Ge, Bo Zheng | arxiv.org/pdf/2403.01… | null |
| 2024-03-04 | Improving Visual Perception of a Social Robot for Controlled and In-the-wild Human-robot Interaction | 改善社交机器人的视觉感知,以实现受控和野外人机交互 | Wangjie Zhong, Leimin Tian, Duy Tho Le, Hamid Rezatofighi | arxiv.org/pdf/2403.01… | null |