[UPDATED!] 2024-03-01 (Publish Time)
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-01 | Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks | Diff-Plugin:重振基于扩散的低级任务的细节 | Yuhao Liu, Fang Liu, Zhanghan Ke, Nanxuan Zhao, Rynson W. H. Lau | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Graph Theory and GNNs to Unravel the Topographical Organization of Brain Lesions in Variants of Alzheimer's Disease Progression | 图论和 GNN 揭示阿尔茨海默病进展变体中脑损伤的拓扑结构 | Leopold Hebert-Stevens, Gabriel Jimenez, Benoit Delatour, Lev Stimmer, Daniel Racoceanu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Rethinking Few-shot 3D Point Cloud Semantic Segmentation | 重新思考少样本 3D 点云语义分割 | Zhaochong An, Guolei Sun, Yun Liu, Fayao Liu, Zongwei Wu, Dan Wang, Luc Van Gool, Serge Belongie | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Improving Explicit Spatial Relationships in Text-to-Image Generation through an Automatically Derived Dataset | 通过自动导出的数据集改善文本到图像生成中的显式空间关系 | Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre, Frank Keller | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Rethinking cluster-conditioned diffusion models | 重新思考集群条件扩散模型 | Nikolas Adaloglou, Tim Kaiser, Felix Michels, Markus Kollmann | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Deformable One-shot Face Stylization via DINO Semantic Guidance | 通过 DINO 语义指导进行可变形的一次性面部风格化 | Yang Zhou, Zichong Chen, Hui Huang | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | An Ordinal Diffusion Model for Generating Medical Images with Different Severity Levels | 生成不同严重程度的医学图像的序数扩散模型 | Shumpei Takezaki, Seiichi Uchida | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | LoMOE: Localized Multi-Object Editing via Multi-Diffusion | LoMOE:通过多重扩散进行本地化多对象编辑 | Goirik Chakrabarty, Aditya Chandrasekar, Ramya Hebbalaguppe, Prathosh AP | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Abductive Ego-View Accident Video Understanding for Safe Driving Perception | 溯因式自我观看事故视频理解以实现安全驾驶感知 | Jianwu Fang, Lei-lei Li, Junfei Zhou, Junbin Xiao, Hongkai Yu, Chen Lv, Jianru Xue, Tat-Seng Chua | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation | HyperSDFusion:桥接语言和几何中的层次结构以增强 3D Text2Shape 生成 | Zhiying Leng, Tolga Birdal, Xiaohui Liang, Federico Tombari | arxiv.org/pdf/2403.00… | null |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-01 | Few-Shot Relation Extraction with Hybrid Visual Evidence | 具有混合视觉证据的少样本关系提取 | Jiaying Gong, Hoda Eldardiry | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding | HALC:通过自适应焦点对比度解码减少物体幻觉 | Zhaorun Chen, Zhuokai Zhao, Hongyin Luo, Huaxiu Yao, Bo Li, Jiawei Zhou | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Exploring the dynamic interplay of cognitive load and emotional arousal by using multimodal measurements: Correlation of pupil diameter and emotional arousal in emotionally engaging tasks | 通过使用多模态测量探索认知负荷和情绪唤醒的动态相互作用:情绪参与任务中瞳孔直径和情绪唤醒的相关性 | C. Kosel, S. Michel, T. Seidel, M. Foerster | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | MS-Net: A Multi-Path Sparse Model for Motion Prediction in Multi-Scenes | MS-Net:多场景运动预测的多路径稀疏模型 | Xiaqiang Tang, Weigao Sun, Siyuan Hu, Yiyang Sun, Yafeng Guo | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models | 多模态 ArXiv:用于提高大型视觉语言模型科学理解的数据集 | Lei Li, Yuqi Wang, Runxin Xu, Peiyi Wang, Xiachong Feng, Lingpeng Kong, Qi Liu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Multi-modal Attribute Prompting for Vision-Language Models | 视觉语言模型的多模态属性提示 | Xin Liu, Jiamin Wu, Tianzhu Zhang | arxiv.org/pdf/2403.00… | null |
NeRF
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-01 | DISORF: A Distributed Online NeRF Training and Rendering Framework for Mobile Robots | DISORF:用于移动机器人的分布式在线 NeRF 训练和渲染框架 | Chunlin Li, Ruofan Liang, Hanrui Fan, Zhengen Zhang, Sankeerth Durvasula, Nandita Vijaykumar | arxiv.org/pdf/2403.00… | null |
Model Compression/Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-01 | Learning Causal Features for Incremental Object Detection | 学习增量对象检测的因果特征 | Zhenwei He, Lei Zhang | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Data-efficient Event Camera Pre-training via Disentangled Masked Modeling | 通过解缠屏蔽建模进行数据高效的事件相机预训练 | Zhenpeng Huang, Chao Li, Hao Chen, Yongjian Deng, Yifeng Geng, Limin Wang | arxiv.org/pdf/2403.00… | null |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-01 | Joint Spatial-Temporal Calibration for Camera and Global Pose Sensor | 相机和全局位姿传感器的联合时空校准 | Junlin Song, Antoine Richard, Miguel Olivares-Mendez | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Can Transformers Capture Spatial Relations between Objects? | Transformer 能否捕捉物体之间的空间关系? | Chuan Wen, Dinesh Jayaraman, Yang Gao | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | COLON: The largest COlonoscopy LONg sequence public database | COLON:最大的结肠镜检查长序列公共数据库 | Lina Ruiz, Franklin Sierra-Jerez, Jair Ruiz, Fabio Martinez | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Region-Adaptive Transform with Segmentation Prior for Image Compression | 用于图像压缩的具有分割先验的区域自适应变换 | Yuxi Liu, Wenhan Yang, Huihui Bai, Yunchao Wei, Yao Zhao | arxiv.org/pdf/2403.00… | link |
| 2024-03-01 | IDTrust: Deep Identity Document Quality Detection with Bandpass Filtering | IDTrust:使用带通滤波进行深度身份文档质量检测 | Musab Al-Ghadi, Joris Voerman, Souhail Bakkali, Mickaël Coustaty, Nicolas Sidere, Xavier St-Georges | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Lincoln's Annotated Spatio-Temporal Strawberry Dataset (LAST-Straw) | 林肯带注释的时空草莓数据集 (LAST-Straw) | Katherine Margaret Frances James, Karoline Heiwolt, Daniel James Sargent, Grzegorz Cielniak | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | SURE: SUrvey REcipes for building reliable and robust deep networks | SURE:构建可靠且鲁棒的深度网络的调查秘诀 | Yuting Li, Yingyi Chen, Xuanlong Yu, Dexiong Chen, Xi Shen | arxiv.org/pdf/2403.00… | link |
| 2024-03-01 | Large Language Models for Simultaneous Named Entity Extraction and Spelling Correction | 用于同时命名实体提取和拼写纠正的大型语言模型 | Edward Whittaker, Ikuo Kitagishi | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | GLFNET: Global-Local (frequency) Filter Networks for efficient medical image segmentation | GLFNET:用于高效医学图像分割的全局-局部(频率)滤波器网络 | Athanasios Tragakis, Qianying Liu, Chaitanya Kaul, Swalpa Kumar Roy, Hang Dai, Fani Deligianni, Roderick Murray-Smith, Daniele Faccio | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Invariant Test-Time Adaptation for Vision-Language Model Generalization | 视觉语言模型泛化的不变测试时间适应 | Huan Ma, Yan Zhu, Changqing Zhang, Peilin Zhao, Baoyuan Wu, Long-Kai Huang, Qinghua Hu, Bingzhe Wu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | DAMS-DETR: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion | DAMS-DETR:具有竞争性查询选择和自适应特征融合的动态自适应多光谱检测 Transformer | Guo Junjie, Gao Chenqiang, Liu Fangcen, Meng Deyu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Small, Versatile and Mighty: A Range-View Perception Framework | 小巧、多功能、强大:范围-视图感知框架 | Qiang Meng, Xiao Wang, JiaBao Wang, Liujiang Yan, Ke Wang | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Embedded Multi-label Feature Selection via Orthogonal Regression | 通过正交回归进行嵌入式多标签特征选择 | Xueyuan Xu, Fulin Wei, Tianyuan Jia, Li Zhuo, Feiping Nie, Xia Wu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting | ODM:一种用于场景文本检测和定位的文本-图像进一步对齐预训练方法 | Chen Duan, Pei Fu, Shan Guo, Qianyi Jiang, Xiaoming Wei | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation | CustomListener:文本引导的响应式交互,用于用户友好的聆听头生成 | Xi Liu, Ying Guo, Cheng Zhen, Tong Li, Yingying Ao, Pengfei Yan | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval | 双姿势不变嵌入:用于识别和检索的学习类别和特定于对象的判别表示 | Rohan Sarkar, Avinash Kak | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Robust deep labeling of radiological emphysema subtypes using squeeze and excitation convolutional neural networks: The MESA Lung and SPIROMICS Studies | 使用挤压和激励卷积神经网络对放射性肺气肿亚型进行稳健深度标记:MESA Lung 和 SPIROMICS 研究 | Artur Wysoczanski, Nabil Ettehadi, Soroush Arabshahi, Yifei Sun, Karen Hinkley Stukovsky, Karol E. Watson, MeiLan K. Han, Erin D Michos, Alejandro P. Comellas, Eric A. Hoffman, et al. | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Cloud-based Federated Learning Framework for MRI Segmentation | 基于云的 MRI 分割联合学习框架 | Rukesh Prajapati, Amr S. El-Wakeel | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach | 重新思考长尾识别中的分类器重新训练:一种简单的 Logits 重定向方法 | Han Lu, Siyu Sun, Yichen Xie, Liqing Zhang, Xiaokang Yang, Junchi Yan | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | YOLO-MED : Multi-Task Interaction Network for Biomedical Images | YOLO-MED:生物医学图像的多任务交互网络 | Suizhi Huang, Shalayiding Sirejiding, Yuxiang Lu, Yue Ding, Leheng Liu, Hui Zhou, Hongtao Lu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Trustworthy Self-Attention: Enabling the Network to Focus Only on the Most Relevant References | 值得信赖的自我关注:使网络仅关注最相关的参考文献 | Yu Jing, Tan Yujuan, Ren Ao, Liu Duo | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | MaskLRF: Self-supervised Pretraining via Masked Autoencoding of Local Reference Frames for Rotation-invariant 3D Point Set Analysis | MaskLRF:通过局部参考系的屏蔽自动编码进行自监督预训练,用于旋转不变的 3D 点集分析 | Takahiko Furuya | arxiv.org/pdf/2403.00… | null |
Image Understanding
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-01 | Rethinking Inductive Biases for Surface Normal Estimation | 重新思考表面法线估计的归纳偏差 | Gwangbin Bae, Andrew J. Davison | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Rethinking The Uniformity Metric in Self-Supervised Learning | 重新思考自我监督学习中的均匀性指标 | Xianghong Fang, Jian Li, Qiang Sun, Benyou Wang | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Improving Acne Image Grading with Label Distribution Smoothing | 通过标签分布平滑改进痤疮图像分级 | Kirill Prokhorov, Alexandr A. Kalinin | arxiv.org/pdf/2403.00… | null |
LLM
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-01 | TempCompass: Do Video LLMs Really Understand Videos? | TempCompass:视频 LLM 真的理解视频吗? | Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou | arxiv.org/pdf/2403.00… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-01 | Tri-Modal Motion Retrieval by Learning a Joint Embedding Space | 通过学习联合嵌入空间进行三模态运动检索 | Kangning Yin, Shihao Zou, Yuxuan Ge, Zheng Tian | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | VisionLLaMA: A Unified LLaMA Interface for Vision Tasks | VisionLLaMA:用于视觉任务的统一 LLaMA 接口 | Xiangxiang Chu, Jianlin Su, Bo Zhang, Chunhua Shen | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching | Selective-Stereo:用于立体匹配的自适应频率信息选择 | Xianqi Wang, Gangwei Xu, Hao Jia, Xin Yang | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization | RealCustom:缩小真实文本字的范围,实现实时开放域文本到图像的定制 | Mengqi Huang, Zhendong Mao, Mingcong Liu, Qian He, Yongdong Zhang | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Task Indicating Transformer for Task-conditional Dense Predictions | 用于任务条件密集预测的任务指示 Transformer | Yuxiang Lu, Shalayiding Sirejiding, Bayram Bayramli, Suizhi Huang, Yue Ding, Hongtao Lu | arxiv.org/pdf/2403.00… | null |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-01 | G3DR: Generative 3D Reconstruction in ImageNet | G3DR:ImageNet 中的生成 3D 重建 | Pradyumna Reddy, Ismail Elezi, Jiankang Deng | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Point Could Mamba: Point Cloud Learning via State Space Model | Point Could Mamba:通过状态空间模型进行点云学习 | Tao Zhang, Xiangtai Li, Haobo Yuan, Shunping Ji, Shuicheng Yan | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Learning and Leveraging World Models in Visual Representation Learning | 在视觉表示学习中学习和利用世界模型 | Quentin Garrido, Mahmoud Assran, Nicolas Ballas, Adrien Bardes, Laurent Najman, Yann LeCun | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training | 用于视觉语言预训练的语义增强跨模态掩模图像建模 | Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, et al. | arxiv.org/pdf/2403.00… | null |
Learning Paradigms
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-01 | VisRec: A Semi-Supervised Approach to Radio Interferometric Data Reconstruction | VisRec:无线电干涉数据重建的半监督方法 | Ruoqi Wang, Haitao Wang, Qiong Luo, Feng Wang, Hejun Wu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning | 展平远程损失景观以实现跨域小样本学习 | Yixiong Zou, Yicong Liu, Yiman Hu, Yuhua Li, Ruixuan Li | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Spatial Cascaded Clustering and Weighted Memory for Unsupervised Person Re-identification | 用于无监督人员重新识别的空间级联聚类和加权记忆 | Jiahao Hong, Jialong Zuo, Chuchu Han, Ruochen Zheng, Ming Tian, Changxin Gao, Nong Sang | arxiv.org/pdf/2403.00… | null |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-01 | SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation | SELFI:通过强化学习进行社交导航的自主自我提升 | Noriaki Hirose, Dhruv Shah, Kyle Stachowicz, Ajay Sridhar, Sergey Levine | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Fine-tuning with Very Large Dropout | 使用非常大的 Dropout 进行微调 | Jianyu Zhang, Léon Bottou | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Fast and Efficient Local Search for Genetic Programming Based Loss Function Learning | 基于遗传编程的损失函数学习的快速高效的局部搜索 | Christian Raymond, Qi Chen, Bing Xue, Mengjie Zhang | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Hydra: Computer Vision for Data Quality Monitoring | Hydra:用于数据质量监控的计算机视觉 | Thomas Britton, Torri Jeske, David Lawrence, Kishansingh Rajput | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Advancing dermatological diagnosis: Development of a hyperspectral dermatoscope for enhanced skin imaging | 推进皮肤病诊断:开发用于增强皮肤成像的高光谱皮肤镜 | Martin J. Hetz, Carina Nogueira Garcia, Sarah Haggenmüller, Titus J. Brinker | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Flattening Singular Values of Factorized Convolution for Medical Images | 医学图像因式分解卷积的奇异值展平 | Zexin Feng, Na Zeng, Jiansheng Fang, Xingyue Wang, Xiaoxi Lu, Heng Meng, Jiang Liu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Multi-Task Learning Using Uncertainty to Weigh Losses for Heterogeneous Face Attribute Estimation | 多任务学习利用不确定性来权衡异质人脸属性估计的损失 | Huaqing Yuan, Yi He, Peng Du, Lu Song | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Relaxometry Guided Quantitative Cardiac Magnetic Resonance Image Reconstruction | 松弛测量引导定量心脏磁共振图像重建 | Yidong Zhao, Yi Zhang, Qian Tao | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | When ControlNet Meets Inexplicit Masks: A Case Study of ControlNet on its Contour-following Ability | 当 ControlNet 遇到不明确的掩模时:ControlNet 轮廓跟踪能力案例研究 | Wenjie Xuan, Yufei Xu, Shanshan Zhao, Chaoyue Wang, Juhua Liu, Bo Du, Dacheng Tao | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Deep Learning Computed Tomography based on the Defrise and Clack Algorithm | 基于 Defrise 和 Clack 算法的深度学习计算机断层扫描 | Chengze Ye, Linda-Sophie Schneider, Yipeng Sun, Andreas Maier | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Spatio-temporal reconstruction of substance dynamics using compressed sensing in multi-spectral magnetic resonance spectroscopic imaging | 多光谱磁共振波谱成像中利用压缩感知对物质动力学进行时空重建 | Utako Yamamoto, Hirohiko Imai, Kei Sano, Masayuki Ohzeki, Tetsuya Matsuda, Toshiyuki Tanaka | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | List-Mode PET Image Reconstruction Using Dykstra-Like Splitting | 使用 Dykstra 类分割的列表模式 PET 图像重建 | Kibo Ote, Fumio Hashimoto, Yuya Onishi, Yasuomi Ouchi | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning | 重新审视下游任务中的解缠结:对其抽象视觉推理必要性的研究 | Ruiqian Nai, Zixin Wen, Ji Li, Yuanzhi Li, Yang Gao | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Event-Driven Learning for Spiking Neural Networks | 脉冲神经网络的事件驱动学习 | Wenjie Wei, Malu Zhang, Jilin Zhang, Ammar Belatreche, Jibin Wu, Zijing Xu, Xuerui Qiu, Hong Chen, Yang Yang, Haizhou Li | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Parameter-Efficient Tuning of Large Convolutional Models | 大型卷积模型的参数高效调整 | Wei Chen, Zichen Miao, Qiang Qiu | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | Transcription and translation of videos using fine-tuned XLSR Wav2Vec2 on custom dataset and mBART | 在自定义数据集和 mBART 上使用微调的 XLSR Wav2Vec2 进行视频转录和翻译 | Aniket Tathe, Anand Kamble, Suyash Kumbharkar, Atharva Bhandare, Anirban C. Mitra | arxiv.org/pdf/2403.00… | null |
| 2024-03-01 | ChartReformer: Natural Language-Driven Chart Image Editing | ChartReformer:自然语言驱动的图表图像编辑 | Pengyu Yan, Mahesh Bhosale, Jay Lal, Bikhyat Adhikari, David Doermann | arxiv.org/pdf/2403.00… | null |