The Key Value of Knowledge Enhancement
Current bottlenecks of pre-trained language models (PLMs):
- Not interpretable: they are black-box models
- Downstream tasks require large amounts of labeled data. Hand-writing chain-of-thought exemplars for few-shot prompting is cheap, but annotation at fine-tuning scale remains prohibitively expensive (synthetic data generation or zero-shot generalization can also mitigate this).
- Weak reasoning ability
Innovations of ChatGPT over earlier PLMs:
- Few-shot prompt learning and instruction learning
- Chain of Thought (CoT) supplies the model with the logical reasoning process (a form of knowledge)
- Reinforcement Learning from Human Feedback (RLHF)
- Procedural knowledge (code) added to the training data
ChatGPT "talks nonsense with a straight face" (hallucinates) on highly specialized questions. Knowledge can:
- Improve the credibility of its outputs [knowledge]
- Improve reasoning ability and mitigate the poor-interpretability problem [knowledge]
1. Technical Research: Knowledge Enhancement
[What] What is knowledge?
- From a business perspective: factual knowledge (declarative knowledge), mechanistic knowledge (procedural knowledge), and data knowledge
- From a research perspective, "knowledge" has two different classification schemes [gray blocks mark the data types that are commonly used and implementable in research]:
2.1. By source: internal knowledge vs. external knowledge
2.2. By nature [per Peking University's Principles of Artificial Intelligence]:
- By variability: static vs. dynamic knowledge;
- By comprehensibility: surface vs. deep knowledge;
- By nature of content: declarative vs. procedural knowledge;
...
[How] How is knowledge represented?
Following the classification above, internal or external knowledge is usually represented as: (1) entity dictionaries (entities from triples); (2) knowledge graphs; (3) plain text used directly as supplementary knowledge; (4) images related to the context.
Certain vs. uncertain knowledge
Certain knowledge:
Usually represented with (1) semantic networks; (2) knowledge graphs; (3) frame languages; (4) first-order logic; (5) propositional logic; (6) modal logic; (7) description logic; (8) ontologies, etc. Each has its own use cases and limitations.
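As a minimal illustration (the geography fact below is a made-up example, not from the source), the same piece of knowledge can be written in several of these formalisms:

```latex
% The same fact in three formalisms (toy example):
\text{triple: } (\textit{Paris},\ \textit{capitalOf},\ \textit{France})
\qquad
\text{FOL: } \mathrm{CapitalOf}(\mathrm{Paris},\mathrm{France}),\quad
\forall x\,\forall y\,\bigl(\mathrm{CapitalOf}(x,y)\rightarrow \mathrm{City}(x)\land \mathrm{Country}(y)\bigr)
\qquad
\text{DL: } \mathrm{Capital} \sqsubseteq \mathrm{City} \sqcap \exists\,\mathrm{capitalOf}.\mathrm{Country}
```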
Uncertain knowledge:
Uncertain knowledge is imprecise, incomplete, or stochastic. Typical representations include probability theory (e.g., Bayesian networks), fuzzy logic, certainty factors, and Dempster-Shafer evidence theory.
Procedural vs. declarative knowledge
Procedural knowledge describes "how to do": the process of solving a problem, usually rules distilled from existing knowledge; it is dynamic, varying across situations and tasks. Declarative knowledge describes "what is": mostly factual knowledge about things, events, process descriptions, attributes, and relations.
| 💡 ChatGPT introduced code data into its pre-training data |
--- KEPLM: Introducing Knowledge into PLMs ---
[How] Methods of knowledge enhancement/injection
A PLM's architecture consists of the Input, the Embedding layer, and the Encoder layers. Its pre-training tasks are Masked Language Modeling (MLM) and Next Sentence Prediction (NSP).
💡 Each of these can serve as an entry point for knowledge injection.
M1: Modify the Input
Approach 1:
- Look up the triples of matched entities in a knowledge graph and insert them into the input text
- Insert the matched entities' descriptions into the input text
Examples: [2 examples] (a minimal sketch follows below)
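A minimal sketch of Approach 1, assuming a toy knowledge graph and naive string-match entity linking (the dictionaries, the [SEP] convention, and the function name are all illustrative assumptions, not any specific paper's pipeline):

```python
# Approach 1 sketch: splice matched triples / entity descriptions
# into the raw input text before tokenization.

TRIPLES = {  # hypothetical entity -> (relation, object) facts
    "Tim Cook": [("is the CEO of", "Apple")],
    "Apple": [("is headquartered in", "Cupertino")],
}
DESCRIPTIONS = {  # hypothetical entity -> one-line description
    "Tim Cook": "Tim Cook is an American business executive.",
}

def augment_input(text: str) -> str:
    """Append triples and descriptions for every entity found in `text`."""
    extras = []
    for entity, facts in TRIPLES.items():
        if entity in text:  # naive entity linking by substring match
            extras += [f"{entity} {rel} {obj}." for rel, obj in facts]
    for entity, desc in DESCRIPTIONS.items():
        if entity in text:
            extras.append(desc)
    # [SEP] keeps the original sentence apart from the injected knowledge
    return text if not extras else text + " [SEP] " + " ".join(extras)

print(augment_input("Tim Cook announced a new iPhone."))
# Tim Cook announced a new iPhone. [SEP] Tim Cook is the CEO of Apple.
# Tim Cook is an American business executive.
```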
Approach 2:
First organize the original input text into a graph structure, then splice it with subgraphs from a knowledge graph (matched via the entities in the input, or otherwise) into an augmented graph, and finally flatten that graph back into a text sequence as the PLM's input.
[2 examples] (a minimal sketch follows below)
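A minimal sketch of Approach 2 in the spirit of CoLAKE, heavily simplified: the token sequence is treated as graph nodes, a hypothetical KG subgraph is attached at the linked entity, and the merged graph is flattened back into one sequence (real systems also adjust position ids and attention masks, which is omitted here):

```python
# Toy word-knowledge graph flattening (node layout is assumed).
input_tokens = ["Harry", "Potter", "is", "a", "wizard"]

# Hypothetical KG subgraph anchored on the linked entity "Harry Potter":
kg_edges = [("Harry Potter", "author", "J. K. Rowling"),
            ("Harry Potter", "genre", "fantasy")]

def flatten(tokens, edges):
    """Append each relation/tail node of the subgraph after the word nodes."""
    sequence = list(tokens)
    for _head, rel, tail in edges:
        sequence += [rel, tail]  # knowledge nodes follow the text nodes
    return sequence

print(flatten(input_tokens, kg_edges))
# ['Harry', 'Potter', 'is', 'a', 'wizard',
#  'author', 'J. K. Rowling', 'genre', 'fantasy']
```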
M2: Add a knowledge-fusion module to the Encoder
(1) On top of the entire PLM: add one knowledge-fusion module after all n Encoder layers.
(2) Between the Transformer layers of the PLM: add a fusion module after each single Encoder layer, so it repeats n times across the n layers.
(3) Inside the Transformer layers of the PLM: an Encoder layer contains sublayers such as multi-head self-attention and feed-forward; a fusion module can also be inserted among these.
[3 examples] (a minimal sketch follows below)
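A minimal PyTorch sketch of placement (2), between the Transformer layers. The fusion block here is a simple cross-attention from token states to pre-computed entity embeddings; all module names, sizes, and the fusion design are illustrative assumptions, not any specific model's architecture:

```python
import torch
import torch.nn as nn

class KnowledgeFusion(nn.Module):
    """Tokens cross-attend to knowledge (entity) vectors, residual + norm."""
    def __init__(self, hidden=768, n_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden)

    def forward(self, token_states, entity_embs):
        fused, _ = self.cross_attn(token_states, entity_embs, entity_embs)
        return self.norm(token_states + fused)

class KnowledgeEnhancedEncoder(nn.Module):
    def __init__(self, n_layers=12, hidden=768):
        super().__init__()
        layer = lambda: nn.TransformerEncoderLayer(hidden, 8, batch_first=True)
        self.blocks = nn.ModuleList(
            nn.ModuleList([layer(), KnowledgeFusion(hidden)])
            for _ in range(n_layers)
        )

    def forward(self, x, entity_embs):
        for encoder_layer, fusion in self.blocks:
            x = encoder_layer(x)         # ordinary Transformer layer
            x = fusion(x, entity_embs)   # knowledge injected between layers
        return x

enc = KnowledgeEnhancedEncoder(n_layers=2)
tokens = torch.randn(1, 16, 768)    # fake token states
entities = torch.randn(1, 4, 768)   # fake entity embeddings from a KG
print(enc(tokens, entities).shape)  # torch.Size([1, 16, 768])
```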
[Example: fusion module added after all n Encoders]
M3: Add or modify pre-training tasks
(1) Modify the masking task
[1 example] (a minimal sketch follows below)
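A minimal sketch of entity-level masking in the spirit of ERNIE, simplified: whole linked entity spans are masked together instead of random subwords. Span positions are assumed to come from an upstream entity linker:

```python
import random

def entity_level_mask(tokens, entity_spans, mask_token="[MASK]", p=0.5):
    """Mask each entity span as a whole unit with probability p."""
    masked = list(tokens)
    for start, end in entity_spans:      # [start, end) token indices
        if random.random() < p:
            for i in range(start, end):
                masked[i] = mask_token   # the whole entity disappears together
    return masked

tokens = ["J", "##K", "Rowling", "wrote", "Harry", "Potter"]
spans = [(0, 3), (4, 6)]                 # "J K Rowling", "Harry Potter"
print(entity_level_mask(tokens, spans, p=1.0))
# ['[MASK]', '[MASK]', '[MASK]', 'wrote', '[MASK]', '[MASK]']
```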
(2) Add knowledge-related pre-training tasks
[1 example] (a minimal sketch follows below)
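A minimal sketch of (2) in the spirit of KEPLER, which jointly optimizes MLM with a knowledge-embedding objective. The TransE-style margin loss, the fake tensors, and the 0.3 weight below are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def transe_loss(head, rel, tail, neg_tail, margin=1.0):
    """Margin ranking loss on ||h + r - t|| (TransE scoring)."""
    pos = (head + rel - tail).norm(dim=-1)
    neg = (head + rel - neg_tail).norm(dim=-1)
    return F.relu(margin + pos - neg).mean()

# Entity/relation embeddings would come from the same PLM (faked here).
h, r, t, t_neg = (torch.randn(8, 768) for _ in range(4))
mlm_loss = torch.tensor(2.31)  # stand-in for the usual MLM loss
total = mlm_loss + 0.3 * transe_loss(h, r, t, t_neg)
print(total)  # joint objective: language modeling + knowledge embedding
```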
[Evaluation] How to assess a KEPLM
In the past, a KEPLM's quality was usually judged by how well the representations it produces support language understanding.
Now and going forward, a KEPLM's reasoning ability will be the greater focus, since these models already perform well enough on GLUE/CLUE tasks.
| 💡 ChatGPT uses Chain of Thought (CoT) to strengthen the PLM's reasoning ability |
References
- A Survey on Knowledge-Enhanced Pre-trained Language Models IEEE TRANS 2023.01
- A Survey of Knowledge Enhanced Pre-trained Models 2022.06
- A Survey of Knowledge-Intensive NLP with Pre-Trained Language Models 2022.02
- 知识图谱构建技术综述 计算机工程 2022
- 新一代知识图谱关键技术综述 计算机研究与发展 2022
- A Survey on Knowledge Graphs: Representation, Acquisition and Applications IEEE transactions 2021
- CoLAKE: Contextualized Language and Knowledge Embedding COLING 2020
- KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation Transactions of ACL 2021
- Reasoning About Knowledge 2003.01
- On Commonsense Cues in BERT for Solving Commonsense Tasks ACL-IJCNLP 2021 2021.08
- What does BERT learn about the structure of language? ACL 2019.07
- A Structural Probe for Finding Syntax in Word Representations NAACL-HLT 2019 2019.06
- A Closer Look at How Fine-tuning Changes BERT ACL 2022.03
- DirectProbe: Studying Representations without Classifiers ACL 2021.04
- Enhancing Self-Attention with Knowledge-Assisted Attention Maps NAACL 2022
- SKILL: Structured Knowledge Infusion for Large Language Models NAACL 2022
- KroneckerBERT: Significant Compression of Pre-trained Language Models Through Kronecker Decomposition and Knowledge Distillation NAACL 2022
- Modularized Transfer Learning with Multiple Knowledge Graphs for Zero-shot Commonsense Reasoning NAACL 2022
- KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning AAAI 2021.01
- Chain of Thought Prompting Elicits Reasoning in Large Language Models NeurIPS 2022
- Training Verifiers to Solve Math Word Problems 2021.09
- JAKET: Joint Pre-training of Knowledge Graph and Language Understanding AAAI 2021.03
- Memory and Knowledge Augmented Language Models for Inferring Salience in Long-Form Stories EMNLP 2021.08
- Benchmarking Knowledge-Enhanced Commonsense Question Answering via Knowledge-to-Text Transformation AAAI 2021.03
- Entities as Experts: Sparse Memory Access with Entity Supervision EMNLP 2020
- ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation 2021
- AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts EMNLP 2020.08
- Semantics-aware BERT for Language Understanding AAAI 2020.05
- KALA: Knowledge-Augmented Language Model Adaptation 2022.04
- Knowledge-driven Natural Language Understanding of English Text and its Applications AAAI 2021
- Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers 2020
- KgPLM: Knowledge-guided Language Model Pre-training via Generative and Discriminative Learning 2020
- SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis ACL 2020
- DKPLM: Decomposable Knowledge-enhanced Pre-trained Language Model for Natural Language Understanding AAAI 2022
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing ACM 2021.07
- LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention EMNLP 2020