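Acoustic-event classification
- FedRPO: Federated relaxed Pareto optimization for acoustic event classification
  Meng Feng, Chieh-Chi Kao, Qingming Tang, Amit Solomon, Viktor Rozgic, Chao Wang
- Multiscale audio spectrogram transformer for efficient audio classification
  Wentao Zhu, Mohamed Omar
- Transformer-based bioacoustic sound event detection on few-shot learning tasks
  Liwen You, Erika Pelaez Coyotl, Suren Gunturu, Maarten Van Segbroeck
- Weight-sharing supernet for searching specialized acoustic event classification networks across device constraints
  Guan-Ting Lin, Qingming Tang, Chieh-Chi Kao, Viktor Rozgic, Chao Wang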
Automatic speech recognition
- Cross-utterance ASR rescoring with graph-based label propagation
  Srinath Tankasala, Long Chen, Andreas Stolcke, Anirudh Raju, Shally Deng, Chander Chandak, Aparna Khare, Roland Maas, Venkatesh Ravichandran
- Dynamic chunk convolution for unified streaming and non-streaming Conformer ASR
  Xilai Li, Goeric Huybrechts, Srikanth Ronanki, Jeff Farris, Sravan Bodapati
- Domain adaptation with external off-policy acoustic catalogs for scalable contextual end-to-end automated speech recognition
  David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister
- Gated contextual adapters for selective contextual biasing in neural transducers
  Anastasios Alexandridis, Kanthashree Mysore Sathyendra, Grant Strimel, Feng-Ju (Claire) Chang, Ariya Rastrow, Nathan Susanj, Athanasios Mouchtaris
- Mask the bias: Improving domain-adaptive generalization of CTC-based ASR with internal language model estimation
  Nilaksh Das, Monica Sunkara, Sravan Bodapati, Jason Cai, Devang Kulshreshtha, Jeff Farris, Katrin Kirchhoff
- On-the-fly text retrieval for end-to-end ASR adaptation
  Bolaji Yusuf, Aditya Gourav, Ankur Gandhe, Ivan Bulyko
- Robust acoustic and semantic contextual biasing in neural transducers for speech recognition
  Xuandi Fu, Kanthashree Mysore Sathyendra, Ankur Gandhe, Jing Liu, Grant Strimel, Ross McGowan, Athanasios Mouchtaris
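Code generation
- Conversational text-to-SQL: An odyssey into state-of-the-art and challenges ahead
  Sree Hari Krishnan Parthasarathi, Lu Zeng, Dilek Hakkani-Tür
A proposed text-to-SQL system has three parts: (a) multitasking on related tasks with discrete prompts; (b) constrained decoding; and (c) N-best-list reranking that combines a query plan model with a schema-linking algorithm.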
Commonsense reasoning
- CLICKER: Attention-based cross-lingual commonsense knowledge transfer
  Ruolin Su, Zhongkai Sun, Sixing Lu, Chengyuan Ma, Chenlei Guo
Continual learning
- Quantifying catastrophic forgetting in continual federated learning
  Christophe Dupuy, Jimit Majmudar, Jixuan Wang, Tanya Roosta, Rahul Gupta, Clement Chung, Jie Ding, Salman Avestimehr
Endpoint detection
- Adaptive endpointing with deep contextual multi-armed bandits
  Do June Min, Andreas Stolcke, Anirudh Raju, Colin Vaz, Di He, Venkatesh Ravichandran, Viet Anh Trinh
- Towards accurate and real-time end-of-speech estimation
  Yifeng Fan, Colin Vaz, Di He, Jahn Heymann, Viet Anh Trinh, Zhe Zhang, Venkatesh Ravichandran
Keyword spotting
- Dual-attention neural transducers for efficient wake word spotting in speech recognition
  Saumya Sahai, Jing Liu, Thejaswi Muniyappa, Kanthashree Mysore Sathyendra, Anastasios Alexandridis, Grant Strimel, Ross McGowan, Ariya Rastrow, Feng-Ju Chang, Athanasios Mouchtaris, Siegfried Kunzmann
- Fixed-point quantization aware training for on-device keyword spotting
  Sashank Macha, Om Oza, Alex Escott, Francesco Caliva, Robbie Armitano, Santosh Kumar Cheekatmalla, Sree Hari Krishnan Parthasarathi, Yuzong Liu
- Self-supervised speech representation learning for keyword spotting with light-weight transformers
  Chenyang Gao, Yue Gu, Francesco Caliva, Yuzong Liu
- Small-footprint slimmable networks for keyword spotting
  Zuhaib Akhtar, Mohammad Omar Khursheed, Dongsu Du, Yuzong Liu
Language learning
- Phonetic RNN-transducer for mispronunciation diagnosis
  Daniel Zhang, Soumya Saha, Sarah Campbell
Machine learning
- Prune then distill: Dataset distillation with importance sampling
  Anirudh Sundar, Gokce Keskin, Chander Chandak, I-Fan Chen, Pegah Ghahremani, Shalini Ghosh
- Role of bias terms in dot-product attention
  Mahdi Namazifar, Devamanyu Hazarika, Dilek Hakkani-Tür
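Natural-language understanding
- Distill-quantize-tune: Leveraging large teachers for low-footprint efficient multilingual NLU on edge
  Pegah Kharazmi, Zhewei Zhao, Clement Chung, Samridhi Choudhary
- Pyramid dynamic inference: Encouraging faster inference via early exit boosting
  Ershad Banijamali, Pegah Kharazmi, Sepehr Eghbali, Jixuan Wang, Clement Chung, Samridhi Choudhary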
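Personalized speech recognition
- Dialog act guided contextual adapter for personalized speech recognition
  Feng-Ju (Claire) Chang, Thejaswi Muniyappa, Kanthashree Mysore Sathyendra, Kai Wei, Grant Strimel, Ross McGowan
- PROCTER: Pronunciation-aware contextual adapter for personalized speech recognition in neural transducers
  Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant Strimel, Andreas Stolcke, Ivan Bulyko
- Slot-triggered contextual biasing for personalized speech recognition using neural transducers
  Sibo Tong, Philip Harding, Simon Wiesler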
Query rewriting
- KG-ECO: Knowledge graph enhanced entity correction for query rewriting
  Jason Cai, Mingda Li, Ziyan Jiang, Eunah Cho, Zheng Chen, Yang Liu, Xing Fan, Chenlei Guo
Self-learning
- Federated self-learning with weak supervision for speech recognition
  Milind Rao, Gopinath Chennupati, Gautam Tiwari, Anit Kumar Sahu, Anirudh Raju, Ariya Rastrow, Jasha Droppo
- Self-healing through error detection, attribution, and retraining
  Ansel MacLaughlin, Anna Rumshisky, Rinat Khaziev, Anil Ramakrishna, Yuval Merhav, Rahul Gupta
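Signal processing
- A framework for unified real-time personalized and non-personalized speech enhancement
  Zhepei Wang, Ritwik Giri, Devansh Shah, Jean-Marc Valin, Michael M. Goodwin, Paris Smaragdis
- Augmentation robust self-supervised learning for human activity recognition
  Cong Xu, Yuhang Li, Dae Lee, Andrew Park, Hongda Mao, Huyen Do, Jonathan Chung, Dinesh Nair
- Generative modeling based manifold learning for adaptive filtering guidance
  Karim Helwani, Paris Smaragdis, Michael M. Goodwin
- SPADE: Self-supervised pretraining for acoustic disentanglement
  John Harvill, Jarred Barber, Arun Nair, Ramin Pishehvar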
Spoken-language understanding
- End-to-end spoken language understanding using joint CTC loss and self-supervised, pretrained acoustic encoders
  Jixuan Wang, Martin Radfar, Kai Wei, Clement Chung
- Exploring subgroup performance in end-to-end speech models
  Alkis Koudounas, Eliana Pastor, Giuseppe Attanasio, Vittorio Mazzia, Manuel Giollo, Thomas Gueudre, Luca Cagliero, Luca de Alfaro, Elena Baralis, Daniele Amberti
- Multilingual end-to-end spoken language understanding for ultra-low footprint applications
  Markus Mueller, Anastasios Alexandridis, Zach Trozenski, Joel Whiteman, Grant Strimel, Nathan Susanj, Athanasios Mouchtaris, Siegfried Kunzmann
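Text-to-speech
- Framewise WaveGAN: High speed adversarial vocoder in time domain with very low computational complexity
  Ahmed Mustafa, Jean-Marc Valin, Jan Buethe, Paris Smaragdis, Mike Goodwin
- Modelling low-resource accents without accent-specific TTS frontend
  Georgi Tinchev, Marta Czarnowska, Kamil Deja, Kayoko Yanagisawa, Marius Cotescu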
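Video
- VideoModEFormer: Modality-preserving embedding for audio-video synchronization using transformers
  Akash Gupta, Rohun Tripathi, Wondong Jang
- Multi-scale compositional constraints for representation learning on videos
  Georgios Paraskevopoulos, Chandrashekhar Lavania, Lovish Chum, Shiva Sundaram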
Speech communication
- Low-bitrate redundancy coding of speech using a rate-distortion-optimized variational autoencoder
  Jean-Marc Valin, Jan Buethe, Ahmed Mustafa