NLP Project: Comparing Text Summarization Models

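Introduction

In this article, we cover the basics of text summarization, the different approaches to generating automatic summaries, and some real-world applications of text summarization. Finally, we use Python to compare several text summarization models with the help of ROUGE, a set of metrics for evaluating automatically generated summaries.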

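What is text summarization?

Text summarization is the process of shortening a long piece of text, such as an article, essay or research paper, into a summary that conveys the overall meaning of the text by retaining the key information and leaving out the less important parts. There are two broad approaches to generating automatic summaries:

  1. Extractive summarization
  2. Abstractive summarization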

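Extractive summarization

Extractive summarization models build a short summary by concatenating several relevant, informative sentences exactly as they appear in the source material. These models do not generate any text that is not already present in the source. Most summarization systems in use today are extractive.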

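Abstractive summarization

Abstractive summarization models create summaries that convey the main information in the source material. Some phrases and sentences may be reused from the source, but the summary as a whole is usually rephrased and written in different words. The sentences in the summary do not necessarily appear in the original text.

Abstractive summarization models usually require more computing power, because they have to generate sentences that are grammatically and contextually sound and relevant to the domain at hand. Such a model must first understand the source material thoroughly before it can summarize it effectively and meaningfully. This is why most summarization systems in use today are extractive: an extractive model only has to focus on one goal, identifying the important sentences that belong in the summary, whereas an abstractive model has to take many more details into account to produce an adequate summary.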

Before we begin, let's look at how we will evaluate our automatically generated summaries.

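Evaluation using ROUGE metrics

ROUGE, short for Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics that measure how similar a candidate summary (the automatically generated summary) is to a target summary (a handwritten reference summary). In this article we use the ROUGE-1, ROUGE-2 and ROUGE-L scores.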

ROUGE-1

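ROUGE-1 measures the overlap of unigrams between the automatically generated summary and the handwritten summary. A unigram here is a single word. Precision and recall are therefore computed by counting how many individual words of the reference summary are captured in the automatically generated summary.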

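Suppose we have the candidate sentence: "He went to the park"

This sentence can be expressed as a list of unigram tokens:
'He', 'went', 'to', 'the', 'park'

And suppose we have the reference sentence: "He went to the park yesterday"

This sentence contains the following unigram tokens:
'He', 'went', 'to', 'the', 'park', 'yesterday'

Now, let's look at all the unigram tokens of the reference that the candidate sentence captures:
'He', 'went', 'to', 'the', 'park'

ROUGE-1 precision can therefore be computed as:
(number of captured unigram tokens) ÷ (number of candidate unigram tokens)

This gives us 5 ÷ 5 = 1

ROUGE-1 recall can be computed as:
(number of captured unigram tokens) ÷ (number of reference unigram tokens)

This gives us 5 ÷ 6 = 0.83

The ROUGE-1 F-score can be computed as:
2 x (precision x recall) ÷ (precision + recall)

This gives us 2 x (1 x 0.83) ÷ (1 + 0.83) = 0.907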
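
The ROUGE-1 arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not the implementation used by the `rouge` package: the function name `rouge_1` is hypothetical, tokenization is a plain whitespace split, and the example uses the English sentences "He went to the park" and "He went to the park yesterday".

```python
from collections import Counter

def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram precision, recall and F-score (hypothetical helper)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: a word counts at most as often as it occurs in both texts.
    overlap = sum((cand & ref).values())
    p = overlap / sum(cand.values())
    r = overlap / sum(ref.values())
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}

scores = rouge_1("He went to the park", "He went to the park yesterday")
print(scores)
```

Without intermediate rounding the F-score is 10/11, about 0.909. The value 0.907 in the text comes from rounding recall to 0.83 before computing the F-score.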

ROUGE-2

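ROUGE-2 measures the overlap of bigrams between the automatically generated summary and the handwritten summary. A bigram here is a pair of consecutive words. Precision and recall are therefore computed by counting how many bigrams of the reference summary are captured in the automatically generated summary.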

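Suppose we have the candidate sentence: "He likes going to the park"

This sentence can be expressed as a list of bigram tokens:
'He likes', 'likes going', 'going to', 'to the', 'the park'

And suppose we have the reference sentence: "He really likes going to the park"

This sentence contains the following bigram tokens:
'He really', 'really likes', 'likes going', 'going to', 'to the', 'the park'

Now, let's look at all the bigram tokens of the reference that the candidate sentence captures:
'likes going', 'going to', 'to the', 'the park'

ROUGE-2 precision can therefore be computed as:
(number of captured bigram tokens) ÷ (number of candidate bigram tokens)

This gives us 4 ÷ 5 = 0.8

ROUGE-2 recall can be computed as:
(number of captured bigram tokens) ÷ (number of reference bigram tokens)

This gives us 4 ÷ 6 = 0.66

The ROUGE-2 F-score can be computed as:
2 x (precision x recall) ÷ (precision + recall)

This gives us 2 x (0.8 x 0.66) ÷ (0.8 + 0.66) = 0.723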
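
The same calculation generalizes to n-grams of any size. The sketch below assumes whitespace tokenization and uses the hypothetical helper names `ngrams` and `rouge_n`; it is applied to the English sentences "He likes going to the park" and "He really likes going to the park" with n = 2.

```python
from collections import Counter

def ngrams(tokens, n):
    """All runs of n consecutive tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate, reference, n=2):
    """Precision, recall and F-score over n-grams (hypothetical helper)."""
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    p = overlap / max(sum(cand.values()), 1)
    r = overlap / max(sum(ref.values()), 1)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

p2, r2, f2 = rouge_n("He likes going to the park",
                     "He really likes going to the park", n=2)
print(p2, r2, f2)
```

Four of the five candidate bigrams appear among the six reference bigrams, so precision is 4/5 and recall is 4/6. The unrounded F-score is 8/11, about 0.727; the 0.723 in the text results from rounding recall to 0.66 first.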

ROUGE-L

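ROUGE-L is based on the longest common subsequence (LCS): the longest sequence of words that appear in the same order in both the candidate and the reference sentence, regardless of any other words that sit between them.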

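For example, suppose we have the candidate sentence: "I took an umbrella to the zoo in case it rained"

This sentence contains the following tokens:
'I', 'took', 'an', 'umbrella', 'to', 'the', 'zoo', 'in', 'case', 'it', 'rained'

And suppose we have the reference sentence: "I carried an umbrella to the zoo because it might have rained"

This sentence contains the following tokens:
'I', 'carried', 'an', 'umbrella', 'to', 'the', 'zoo', 'because', 'it', 'might', 'have', 'rained'

Now, let's look at all the captured tokens, which form the longest common subsequence:
'I', 'an', 'umbrella', 'to', 'the', 'zoo', 'it', 'rained'

ROUGE-L precision can therefore be computed as:
(number of captured tokens) ÷ (number of candidate tokens)

This gives us 8 ÷ 11 = 0.72

ROUGE-L recall can be computed as:
(number of captured tokens) ÷ (number of reference tokens)

This gives us 8 ÷ 12 = 0.66

The ROUGE-L F-score can be computed as:
2 x (precision x recall) ÷ (precision + recall)

This gives us 2 x (0.72 x 0.66) ÷ (0.72 + 0.66) = 0.688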
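
The longest common subsequence can be found with a standard dynamic-programming table. The following is a sketch under simple assumptions (whitespace tokenization, a plain harmonic-mean F-score, and the hypothetical names `lcs_length` and `rouge_l`), applied to the English sentences "I took an umbrella to the zoo in case it rained" and "I carried an umbrella to the zoo because it might have rained".

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    """LCS-based precision, recall and F-score (hypothetical helper)."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    p, r = lcs / len(cand), lcs / len(ref)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return lcs, p, r, f

lcs, pl, rl, fl = rouge_l(
    "I took an umbrella to the zoo in case it rained",
    "I carried an umbrella to the zoo because it might have rained",
)
print(lcs, pl, rl, fl)
```

The subsequence has 8 tokens, the candidate 11 and the reference 12, so the unrounded F-score is 16/23, about 0.696. The 0.688 in the text comes from the rounded intermediate values 0.72 and 0.66.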

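List of text summarization models

Now that we know the two broad categories of summarization models and the evaluation metrics we will use to score our automatically generated summaries, let's look at the models we will compare in this article.

  1. Luhn's Heuristic Method
  2. TextRank
  3. Latent Semantic Analysis (LSA)
  4. Kullback-Leibler Sum (KL-Sum)
  5. T5 Transformer model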

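Luhn's Heuristic Method

Luhn's Heuristic Method for text summarization, published in 1958, is one of the earliest text summarization algorithms. It selects words of high importance according to how frequently they occur, an idea closely related to the later TF-IDF (term frequency-inverse document frequency) weighting, and scores sentences by how densely those words appear in them. In addition, words that occur at the beginning of the document are given a higher weight.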
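
To make the idea concrete, here is a simplified sketch of Luhn-style sentence scoring. It is not Sumy's implementation: the tiny stop-word list, the frequency threshold `min_count`, the gap limit `max_gap=4` and the function names are all assumptions chosen for illustration, and the positional weighting mentioned above is left out. A sentence is scored by its densest cluster of significant words, as (significant words in the cluster)² ÷ (cluster length).

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use a longer one).
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "is", "it", "to", "in"}

def significant_words(text, min_count=2):
    """Non-stop words that occur at least `min_count` times in the text."""
    words = [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]
    return {w for w, c in Counter(words).items() if c >= min_count}

def luhn_score(words, significant, max_gap=4):
    """Best cluster score: (significant words in cluster)^2 / cluster length."""
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best, start, count = 0.0, positions[0], 1
    for prev, cur in zip(positions, positions[1:]):
        if cur - prev - 1 > max_gap:  # gap too large: close the current cluster
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = cur, 0
        count += 1
    return max(best, count ** 2 / (positions[-1] - start + 1))

text = ("Consciousness is awareness. "
        "Awareness of awareness is part of consciousness. "
        "Bananas are yellow.")
sig = significant_words(text)
sentences = re.split(r"(?<=\.)\s+", text)
scores = [luhn_score(re.findall(r"\w+", s.lower()), sig) for s in sentences]
for sentence, score in zip(sentences, scores):
    print(round(score, 3), sentence)
```

In this toy text only "consciousness" and "awareness" are repeated, so the third sentence contains no significant words and receives a score of zero.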

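Now let's look at the document that we will summarize with all of the text summarization models listed above.

"Consciousness, at its simplest, is sentience or awareness of internal and external existence. Despite millennia of analyses, definitions, explanations and debates by philosophers and scientists, consciousness remains puzzling and controversial, being 'at once, the most familiar and also the most mysterious aspect of our lives'. Perhaps the only widely agreed notion about the topic is the intuition that consciousness exists. Opinions differ about what exactly needs to be studied and explained as consciousness. Sometimes, it is synonymous with the mind, and at other times, an aspect of mind. In the past, it was one's 'inner life', the world of introspection, of private thought, imagination and volition. Today, it often includes any kind of cognition, experience, feeling or perception. It may be awareness, awareness of awareness, or self-awareness either continuously changing or not. There might be different levels or orders of consciousness, or different kinds of consciousness, or just one kind with different features. Other questions include whether only humans are conscious, all animals, or even the whole universe. The disparate range of research, notions and speculations raises doubts about whether the right questions are being asked. Examples of the range of descriptions, definitions or explanations are: simple wakefulness, one's sense of selfhood or soul explored by 'looking within'; being a metaphorical 'stream' of contents, or being a mental state, mental event or mental process of the brain; having phanera or qualia and subjectivity; being the 'something that it is like' to 'have' or 'be' it; being the 'inner theatre' or the executive control system of the mind."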

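As we can see, this document contains many sentences, and the "important" sentences are not necessarily easy to tell apart from those that merely give a brief overview of the history and ideas surrounding consciousness. We will now use Python to generate an automatic summary of this document with Luhn's Heuristic Method, as implemented in the Sumy library.

First, we install the Sumy library.

!pip install sumy

Next, we import the necessary packages.

import sumy
import nltk
nltk.download('punkt')
from sumy.summarizers.luhn import LuhnSummarizer
from sumy.nlp.tokenizers import Tokenizer
from sumy.parsers.plaintext import PlaintextParser

Then, we define our source material (the document about consciousness shown above).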

source_material = "Consciousness, at its simplest, is sentience or awareness of internal and external existence. Despite millennia of analyses, definitions, explanations and debates by philosophers and scientists, consciousness remains puzzling and controversial, being 'at once, the most familiar and also the most mysterious aspect of our lives'. Perhaps the only widely agreed notion about the topic is the intuition that consciousness exists. Opinions differ about what exactly needs to be studied and explained as consciousness. Sometimes, it is synonymous with the mind, and at other times, an aspect of mind. In the past, it was one's 'inner life', the world of introspection, of private thought, imagination and volition. Today, it often includes any kind of cognition, experience, feeling or perception. It may be awareness, awareness of awareness, or self-awareness either continuously changing or not. There might be different levels or orders of consciousness, or different kinds of consciousness, or just one kind with different features. Other questions include whether only humans are conscious, all animals, or even the whole universe. The disparate range of research, notions and speculations raises doubts about whether the right questions are being asked. Examples of the range of descriptions, definitions or explanations are: simple wakefulness, one's sense of selfhood or soul explored by 'looking within'; being a metaphorical 'stream' of contents, or being a mental state, mental event or mental process of the brain; having phanera or qualia and subjectivity; being the 'something that it is like' to 'have' or 'be' it; being the 'inner theatre' or the executive control system of the mind."

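Then, we define the language that we will pass to the Tokenizer and create a parser.

LANGUAGE = "english"
parser = PlaintextParser.from_string(source_material,Tokenizer(LANGUAGE))

Then, we create the summarizer.

summarizer = LuhnSummarizer()

Now, we use our summarizer to generate an automatic summary. We limit the number of sentences so that we obtain a summary of roughly 100 words.

testsummary = summarizer(parser.document,sentences_count=3)

Since 'testsummary' is a tuple containing several sentences, we have to join these sentences into a single string. This makes it possible to evaluate the string with ROUGE.

summary = ""
for sentence in testsummary:
  summary+=str(sentence)
print(summary)

We get the following output.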
Despite millennia of analyses, definitions, explanations and debates by philosophers and scientists, consciousness remains puzzling and controversial, being 'at once, the most familiar and also the most mysterious aspect of our lives'. There might be different levels or orders of consciousness, or different kinds of consciousness, or just one kind with different features. Examples of the range of descriptions, definitions or explanations are: simple wakefulness, one's sense of selfhood or soul explored by 'looking within'; being a metaphorical 'stream' of contents, or being a mental state, mental event or mental process of the brain; having phanera or qualia and subjectivity; being the 'something that it is like' to 'have' or 'be' it; being the 'inner theatre' or the executive control system of the mind.

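As we can see, the output consists of the three sentences that our summarizer considered most important. However, as the handwritten summary below shows (we will use it as the reference for all of our automatically generated summaries), this summary leaves out some of the key information conveyed by the source material.

Now let's evaluate our automatically generated summary using the ROUGE metrics.

We install the "Rouge" library.

!pip install rouge

Then we import the necessary package.

from rouge import Rouge

We define the handwritten reference summary.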

reference = "Consciousness is essentially the awareness of one's internal and external existence. Despite seeming like a fairly trivial concept, the only notion that seems to be widely agreed upon after millenia of theorizing and debating is the fact that consciousness exists. In the past, consciousness was perceived as one's inner life, the world of introspection, of private thought, imagination and volition. Today, this definition includes any kind of cognition, experience, feeling or perception. The disparate range of research, notions and speculations raises doubts about whether the right questions are being asked."

Then, we compare the automatically generated summary with the reference summary.

ROUGE = Rouge()
ROUGE.get_scores(summary,reference)

Executing the statement above gives us the following output.
[{'rouge-1': {'r': 0.2054794520547945,
'p': 0.18292682926829268,
'f': 0.19354838211363173},
'rouge-2': {'r': 0.011235955056179775,
'p': 0.008620689655172414,
'f': 0.00975609264771217},
'rouge-l': {'r': 0.1780821917808219,
'p': 0.15853658536585366,
'f': 0.16774193050072858}}]

Our ROUGE-1 scores are therefore:
F1 - 0.19354838211363173
Precision - 0.18292682926829268
Recall - 0.2054794520547945

Our ROUGE-2 scores are:
F1 - 0.00975609264771217
Precision - 0.008620689655172414
Recall - 0.011235955056179775

And our ROUGE-L scores are:
F1 - 0.16774193050072858
Precision - 0.15853658536585366
Recall - 0.1780821917808219
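
Instead of copying the numbers out by hand, the dictionary returned by `get_scores` can be formatted with a small helper. The sketch below does not call the `rouge` package: it reuses the output shown above as a literal, and `format_scores` is a hypothetical function name.

```python
# Output of ROUGE.get_scores(summary, reference), copied from the run above.
scores = [{
    'rouge-1': {'r': 0.2054794520547945, 'p': 0.18292682926829268, 'f': 0.19354838211363173},
    'rouge-2': {'r': 0.011235955056179775, 'p': 0.008620689655172414, 'f': 0.00975609264771217},
    'rouge-l': {'r': 0.1780821917808219, 'p': 0.15853658536585366, 'f': 0.16774193050072858},
}]

def format_scores(result, digits=4):
    """Return one readable line per ROUGE metric (hypothetical helper)."""
    return [
        f"{metric.upper()}: F1={v['f']:.{digits}f} "
        f"precision={v['p']:.{digits}f} recall={v['r']:.{digits}f}"
        for metric, v in result.items()
    ]

lines = format_scores(scores[0])
for line in lines:
    print(line)
```

The same helper can be reused for every model evaluated below, since `get_scores` returns a list with one such dictionary per candidate and reference pair.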

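As our ROUGE scores show, our automatically generated summary does not score well against the handwritten reference summary. Let's look at the next text summarization model, TextRank.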

TextRank

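TextRank is a graph-based extractive text summarization technique. It is used to find the most relevant sentences (as well as keywords) in a piece of text. Each sentence becomes a node in a graph, edges are weighted by how similar two sentences are, and a PageRank-style algorithm assigns a score to every sentence, so that sentences sharing many words with the rest of the text are considered important. The sentences are then sorted in descending order of score, and the highest-scoring sentences are included in the summary. Now let's generate an automatic summary using TextRank.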
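
Before turning to the library, the ranking idea can be illustrated in plain Python. This is a simplified sketch rather than Gensim's implementation: the function names, the damping factor of 0.85, the fixed 50 iterations and the word-overlap similarity (computed here on sets of distinct words) are assumptions made for illustration.

```python
import math
import re

def similarity(words_a, words_b):
    """Shared words, normalized by the log of each sentence's distinct-word count."""
    a, b = set(words_a), set(words_b)
    if len(a) < 2 or len(b) < 2:  # avoids a zero denominator (log 1 = 0)
        return 0.0
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))

def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with a PageRank-style iteration over a similarity graph."""
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)
    w = [[similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    strength = [sum(row) for row in w]  # total edge weight per sentence
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                w[j][i] / strength[j] * scores[j]
                for j in range(n) if strength[j] > 0
            )
            for i in range(n)
        ]
    return scores

sentences = [
    "Consciousness is awareness of internal and external existence.",
    "Opinions differ about what consciousness is.",
    "Awareness of existence is part of consciousness.",
    "Bananas are yellow.",
]
scores = textrank(sentences)
for score, sentence in sorted(zip(scores, sentences), reverse=True):
    print(round(score, 3), sentence)
```

The last sentence shares no words with the others, so it has no edges and ends up with the lowest score, while the sentences that overlap most with the rest of the text rank highest.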

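First, we install the Gensim library. The gensim.summarization module was removed in Gensim 4.0, so a 3.x release is required.

!pip install "gensim<4.0"

Next, we import the necessary packages.

import gensim
from gensim.summarization import summarize

Now, we define our source material.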

source_material = "Consciousness, at its simplest, is sentience or awareness of internal and external existence. Despite millennia of analyses, definitions, explanations and debates by philosophers and scientists, consciousness remains puzzling and controversial, being 'at once, the most familiar and also the most mysterious aspect of our lives'. Perhaps the only widely agreed notion about the topic is the intuition that consciousness exists. Opinions differ about what exactly needs to be studied and explained as consciousness. Sometimes, it is synonymous with the mind, and at other times, an aspect of mind. In the past, it was one's 'inner life', the world of introspection, of private thought, imagination and volition. Today, it often includes any kind of cognition, experience, feeling or perception. It may be awareness, awareness of awareness, or self-awareness either continuously changing or not. There might be different levels or orders of consciousness, or different kinds of consciousness, or just one kind with different features. Other questions include whether only humans are conscious, all animals, or even the whole universe. The disparate range of research, notions and speculations raises doubts about whether the right questions are being asked. Examples of the range of descriptions, definitions or explanations are: simple wakefulness, one's sense of selfhood or soul explored by 'looking within'; being a metaphorical 'stream' of contents, or being a mental state, mental event or mental process of the brain; having phanera or qualia and subjectivity; being the 'something that it is like' to 'have' or 'be' it; being the 'inner theatre' or the executive control system of the mind."

Now, we pass the source material to the 'summarize' function.

summary = summarize(source_material,word_count=100)

Finally, we print the automatically generated summary.

print(summary)

We get the following output.
Consciousness, at its simplest, is sentience or awareness of internal and external existence. Despite millennia of analyses, definitions, explanations and debates by philosophers and scientists, consciousness remains puzzling and controversial, being 'at once, the most familiar and also the most mysterious aspect of our lives'. Perhaps the only widely agreed notion about the topic is the intuition that consciousness exists. Other questions include whether only humans are conscious, all animals, or even the whole universe. The disparate range of research, notions and speculations raises doubts about whether the right questions are being asked.

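As we can see, the summary contains ideas from across the source material, including the basic definition of consciousness, the observation that consciousness seems like a trivial concept yet remains vast and largely unexplored, and finally the questions being asked about consciousness today.

Now let's evaluate our automatically generated summary using the ROUGE metrics.

We install the "Rouge" library.

!pip install rouge

Then we import the necessary package.

from rouge import Rouge

We define the handwritten reference summary.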

reference = "Consciousness is essentially the awareness of one's internal and external existence. Despite seeming like a fairly trivial concept, the only notion that seems to be widely agreed upon after millenia of theorizing and debating is the fact that consciousness exists. In the past, consciousness was perceived as one's inner life, the world of introspection, of private thought, imagination and volition. Today, this definition includes any kind of cognition, experience, feeling or perception. The disparate range of research, notions and speculations raises doubts about whether the right questions are being asked."

Then we compare our automatically generated summary with the reference summary.

ROUGE = Rouge()
ROUGE.get_scores(summary,reference)

Executing the statement above gives us the following output.
[{'rouge-1': {'f': 0.45070422035608015,
'p': 0.463768115942029,
'r': 0.4383561643835616},
'rouge-2': {'f': 0.29999999500061736,
'p': 0.2967032967032967,
'r': 0.30337078651685395},
'rouge-l': {'f': 0.43661971331382665,
'p': 0.4492753623188406,
'r': 0.4246575342465753}}]

Our ROUGE-1 scores are therefore:
F1 - 0.45070422035608015
Precision - 0.463768115942029
Recall - 0.4383561643835616

Our ROUGE-2 scores are:
F1 - 0.29999999500061736
Precision - 0.2967032967032967
Recall - 0.30337078651685395

And our ROUGE-L scores are:
F1 - 0.43661971331382665
Precision - 0.4492753623188406
Recall - 0.4246575342465753

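In the summary above we limited the word count to 100, but this can be changed if needed. Now let's look at the next text summarization model, Latent Semantic Analysis (LSA).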

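Latent Semantic Analysis (LSA)

Latent Semantic Analysis is an unsupervised natural language processing (NLP) technique that uses statistical methods to extract the relationships between the words in a document based on the contexts in which they are used. The goal is to identify the most important topics in the source material and then select the sentences with the greatest combined weight across those topics. Singular value decomposition (SVD) is the statistical technique used to uncover the hidden semantic structure of the words in the source material. Now let's generate an automatic summary using Latent Semantic Analysis.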

First, we install the Sumy library.

!pip install sumy

Next, we import the necessary packages.

import sumy
import nltk
nltk.download('punkt')
from sumy.summarizers.lsa import LsaSummarizer
from sumy.nlp.tokenizers import Tokenizer
from sumy.parsers.plaintext import PlaintextParser

Now, we define our source material.

source_material = "Consciousness, at its simplest, is sentience or awareness of internal and external existence. Despite millennia of analyses, definitions, explanations and debates by philosophers and scientists, consciousness remains puzzling and controversial, being 'at once, the most familiar and also the most mysterious aspect of our lives'. Perhaps the only widely agreed notion about the topic is the intuition that consciousness exists. Opinions differ about what exactly needs to be studied and explained as consciousness. Sometimes, it is synonymous with the mind, and at other times, an aspect of mind. In the past, it was one's 'inner life', the world of introspection, of private thought, imagination and volition. Today, it often includes any kind of cognition, experience, feeling or perception. It may be awareness, awareness of awareness, or self-awareness either continuously changing or not. There might be different levels or orders of consciousness, or different kinds of consciousness, or just one kind with different features. Other questions include whether only humans are conscious, all animals, or even the whole universe. The disparate range of research, notions and speculations raises doubts about whether the right questions are being asked. Examples of the range of descriptions, definitions or explanations are: simple wakefulness, one's sense of selfhood or soul explored by 'looking within'; being a metaphorical 'stream' of contents, or being a mental state, mental event or mental process of the brain; having phanera or qualia and subjectivity; being the 'something that it is like' to 'have' or 'be' it; being the 'inner theatre' or the executive control system of the mind."

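Then, we define the language that we will pass to the Tokenizer and create a parser.

LANGUAGE = "english"
parser = PlaintextParser.from_string(source_material,Tokenizer(LANGUAGE))

Then, we create the summarizer.

summarizer = LsaSummarizer()

Now, we use our summarizer to generate an automatic summary. We limit the number of sentences so that we obtain a summary of roughly 100 words.

testsummary = summarizer(parser.document,sentences_count=4)

Since 'testsummary' is a tuple containing several sentences, we have to join these sentences into a single string. This makes it possible to evaluate the string with ROUGE.

summary = ""
for sentence in testsummary:
  summary+=str(sentence)
print(summary)

We get the following output.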
Consciousness, at its simplest, is sentience or awareness of internal and external existence. Opinions differ about what exactly needs to be studied and explained as consciousness. In the past, it was one's 'inner life', the world of introspection, of private thought, imagination and volition. Today, it often includes any kind of cognition, experience, feeling or perception. Other questions include whether only humans are conscious, all animals, or even the whole universe. The disparate range of research, notions and speculations raises doubts about whether the right questions are being asked.

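As we can see, the summary contains a variety of key points from the source material, including how consciousness was viewed in the past and how that differs from more recent definitions, the kinds of questions being asked about consciousness, and whether the right questions are being asked.

Now let's evaluate our automatically generated summary using the ROUGE metrics.

We install the "Rouge" library.

!pip install rouge

Then we import the necessary package.

from rouge import Rouge

We define the handwritten reference summary.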

reference = "Consciousness is essentially the awareness of one's internal and external existence. Despite seeming like a fairly trivial concept, the only notion that seems to be widely agreed upon after millenia of theorizing and debating is the fact that consciousness exists. In the past, consciousness was perceived as one's inner life, the world of introspection, of private thought, imagination and volition. Today, this definition includes any kind of cognition, experience, feeling or perception. The disparate range of research, notions and speculations raises doubts about whether the right questions are being asked."

Then, we compare the automatically generated summary with the reference summary.

ROUGE = Rouge()
ROUGE.get_scores(summary,reference)

Executing the statement above gives us the following output.
[{'rouge-1': {'r': 0.6438356164383562,
'p': 0.6527777777777778,
'f': 0.6482758570692034},
'rouge-2': {'r': 0.47191011235955055,
'p': 0.4772727272727273,
'f': 0.4745762661866003},
'rouge-l': {'r': 0.6301369863013698,
'p': 0.6388888888888888,
'f': 0.6344827536209275}}]

Our ROUGE-1 scores are therefore:
F1 - 0.6482758570692034
Precision - 0.6527777777777778
Recall - 0.6438356164383562

Our ROUGE-2 scores are:
F1 - 0.4745762661866003
Precision - 0.4772727272727273
Recall - 0.47191011235955055

And our ROUGE-L scores are:
F1 - 0.6344827536209275
Precision - 0.6388888888888888
Recall - 0.6301369863013698

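As our ROUGE scores show, this automatically generated summary scores highest so far against our handwritten reference summary. Now let's look at the next text summarization model, Kullback-Leibler Sum (KL-Sum).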

Kullback-Leibler Sum

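In mathematical statistics, the Kullback-Leibler divergence, often called relative entropy, is a statistical distance that measures how much a probability distribution P differs from a reference probability distribution Q. In summarization it is inversely related to how similar the automatically generated summary is to the source material in terms of the information conveyed: the lower the divergence, the more closely the summary reflects the source. The Kullback-Leibler Sum algorithm is a greedy method that builds a summary by adding sentences as long as the Kullback-Leibler divergence keeps decreasing. This ensures that the summary consists of a set of sentences whose unigram distribution is similar to that of the document. Now let's generate an automatic summary based on these principles.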
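
The greedy procedure described above can be sketched in plain Python. This is an illustration under stated assumptions, not Sumy's KLSummarizer: the function names, the additive smoothing constant of 0.01 (needed so that words missing from the summary do not produce a division by zero) and the regular-expression tokenizer are all choices made for this example.

```python
import math
import re
from collections import Counter

def distribution(words, vocabulary, smoothing=0.01):
    """Smoothed unigram distribution of `words` over a fixed vocabulary."""
    counts = Counter(words)
    total = len(words) + smoothing * len(vocabulary)
    return {w: (counts[w] + smoothing) / total for w in vocabulary}

def kl_divergence(p, q):
    """KL(P || Q) for two distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)

def kl_sum(sentences, max_sentences=2):
    """Greedily add the sentence that lowers KL(document || summary) the most."""
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    doc_words = [w for t in tokens for w in t]
    vocabulary = set(doc_words)
    doc_dist = distribution(doc_words, vocabulary, smoothing=0.0)
    chosen, summary_words, best_kl = [], [], float("inf")
    while len(chosen) < max_sentences:
        candidates = [
            (kl_divergence(doc_dist, distribution(summary_words + tokens[i], vocabulary)), i)
            for i in range(len(sentences)) if i not in chosen
        ]
        if not candidates:
            break
        kl, index = min(candidates)
        if kl >= best_kl:  # stop once the divergence no longer decreases
            break
        best_kl = kl
        chosen.append(index)
        summary_words += tokens[index]
    return [sentences[i] for i in sorted(chosen)]

document = [
    "Consciousness is awareness of existence.",
    "Opinions differ about consciousness.",
    "Awareness of existence is debated.",
]
summary = kl_sum(document, max_sentences=2)
print(summary)
```

The selected sentences are returned in their original document order, which keeps the summary readable even though they were chosen greedily.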

First, we install the Sumy library.

!pip install sumy

Next, we import the necessary packages.

import sumy
import nltk
nltk.download('punkt')
from sumy.summarizers.kl import KLSummarizer
from sumy.nlp.tokenizers import Tokenizer
from sumy.parsers.plaintext import PlaintextParser

Now, we define our source material.

source_material = "Consciousness, at its simplest, is sentience or awareness of internal and external existence. Despite millennia of analyses, definitions, explanations and debates by philosophers and scientists, consciousness remains puzzling and controversial, being 'at once, the most familiar and also the most mysterious aspect of our lives'. Perhaps the only widely agreed notion about the topic is the intuition that consciousness exists. Opinions differ about what exactly needs to be studied and explained as consciousness. Sometimes, it is synonymous with the mind, and at other times, an aspect of mind. In the past, it was one's 'inner life', the world of introspection, of private thought, imagination and volition. Today, it often includes any kind of cognition, experience, feeling or perception. It may be awareness, awareness of awareness, or self-awareness either continuously changing or not. There might be different levels or orders of consciousness, or different kinds of consciousness, or just one kind with different features. Other questions include whether only humans are conscious, all animals, or even the whole universe. The disparate range of research, notions and speculations raises doubts about whether the right questions are being asked. Examples of the range of descriptions, definitions or explanations are: simple wakefulness, one's sense of selfhood or soul explored by 'looking within'; being a metaphorical 'stream' of contents, or being a mental state, mental event or mental process of the brain; having phanera or qualia and subjectivity; being the 'something that it is like' to 'have' or 'be' it; being the 'inner theatre' or the executive control system of the mind."

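Then, we define the language that we will pass to the Tokenizer and create a parser.

LANGUAGE = 'english'
parser = PlaintextParser.from_string(source_material,Tokenizer(LANGUAGE))

Then, we create the summarizer.

summarizer = KLSummarizer()

Now, we use our summarizer to generate an automatic summary. We limit the number of sentences so that we obtain a summary of roughly 100 words.

testsummary = summarizer(parser.document,sentences_count=6)

Since 'testsummary' is a tuple containing several sentences, we have to join these sentences into a single string. This makes it possible to evaluate the string with ROUGE.

summary = ""
for sentence in testsummary:
  summary+=str(sentence)
print(summary)

We get the following output.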
Consciousness, at its simplest, is sentience or awareness of internal and external existence. Opinions differ about what exactly needs to be studied and explained as consciousness. In the past, it was one's 'inner life', the world of introspection, of private thought, imagination and volition. Today, it often includes any kind of cognition, experience, feeling or perception. It may be awareness, awareness of awareness, or self-awareness either continuously changing or not. There might be different levels or orders of consciousness, or different kinds of consciousness, or just one kind with different features.

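As we can see, although the summary does cover key points conveyed in the source material, it does not mention the questions being asked about consciousness today or whether they are the right questions. We are therefore likely to see the ROUGE scores drop when we evaluate it against the handwritten reference summary.

Now let's evaluate our automatically generated summary using the ROUGE metrics.

We install the "Rouge" library.

!pip install rouge

Then we import the necessary package.

from rouge import Rouge

We define the handwritten reference summary.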

reference = "Consciousness is essentially the awareness of one's internal and external existence. Despite seeming like a fairly trivial concept, the only notion that seems to be widely agreed upon after millenia of theorizing and debating is the fact that consciousness exists. In the past, consciousness was perceived as one's inner life, the world of introspection, of private thought, imagination and volition. Today, this definition includes any kind of cognition, experience, feeling or perception. The disparate range of research, notions and speculations raises doubts about whether the right questions are being asked."

Then, we compare the automatically generated summary with the reference summary.

ROUGE = Rouge()
ROUGE.get_scores(summary,reference)

Executing the statement above gives us the following output.
[{'rouge-1': {'r': 0.4383561643835616,
'p': 0.47761194029850745,
'f': 0.4571428521520408},
'rouge-2': {'r': 0.2808988764044944,
'p': 0.28735632183908044,
'f': 0.2840909040915548},
'rouge-l': {'r': 0.4246575342465753,
'p': 0.4626865671641791,
'f': 0.4428571378663266}}]

Our ROUGE-1 scores are therefore:
F1 - 0.4571428521520408
Precision - 0.47761194029850745
Recall - 0.4383561643835616

Our ROUGE-2 scores are:
F1 - 0.2840909040915548
Precision - 0.28735632183908044
Recall - 0.2808988764044944

And our ROUGE-L scores are:
F1 - 0.4428571378663266
Precision - 0.4626865671641791
Recall - 0.4246575342465753

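As our ROUGE scores show, this summary does not score as highly as the one produced by our Latent Semantic Analysis model. It misses one of the key points in the handwritten reference summary. Now let's look at the next text summarization model, which happens to be an abstractive one: the T5 Transformer model.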

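T5 Transformer model

The Transformer is a neural network architecture developed in 2017 by a group of researchers at Google (and the University of Toronto). Transformers avoid recurrence and rely entirely on attention mechanisms to draw global dependencies between input and output. They allow far more parallelization than sequential models and can reach very high translation quality even after a short training time. They can also be trained on very large amounts of data without much difficulty. More on this topic can be found in the article "Text Summarization using Transformers".

The T5 Transformer model (developed by Google AI in 2020) is an encoder-decoder model that achieves state-of-the-art results on natural language processing (NLP) tasks while remaining flexible enough to be fine-tuned for more specific problems. It frames all such tasks in a text-to-text format, where the input and output are always strings. Now let's use HuggingFace's T5 Transformer model to generate an abstractive summary.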

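First, we install the necessary libraries.

!pip install transformers
!pip install sentencepiece

Note: once SentencePiece has been installed, the kernel needs to be restarted so that the following lines of code run successfully.

Next, we import the necessary packages.

import torch
import json
from transformers import T5Tokenizer, T5Config, T5ForConditionalGeneration

Now, we define our source material.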

source_material = "Consciousness, at its simplest, is sentience or awareness of internal and external existence. Despite millennia of analyses, definitions, explanations and debates by philosophers and scientists, consciousness remains puzzling and controversial, being 'at once, the most familiar and also the most mysterious aspect of our lives'. Perhaps the only widely agreed notion about the topic is the intuition that consciousness exists. Opinions differ about what exactly needs to be studied and explained as consciousness. Sometimes, it is synonymous with the mind, and at other times, an aspect of mind. In the past, it was one's 'inner life', the world of introspection, of private thought, imagination and volition. Today, it often includes any kind of cognition, experience, feeling or perception. It may be awareness, awareness of awareness, or self-awareness either continuously changing or not. There might be different levels or orders of consciousness, or different kinds of consciousness, or just one kind with different features. Other questions include whether only humans are conscious, all animals, or even the whole universe. The disparate range of research, notions and speculations raises doubts about whether the right questions are being asked. Examples of the range of descriptions, definitions or explanations are: simple wakefulness, one's sense of selfhood or soul explored by 'looking within'; being a metaphorical 'stream' of contents, or being a mental state, mental event or mental process of the brain; having phanera or qualia and subjectivity; being the 'something that it is like' to 'have' or 'be' it; being the 'inner theatre' or the executive control system of the mind."

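We make sure that the CPU is used to run our code.

device = torch.device('cpu')

Now, we define our model and create a tokenizer. In our case, we use a pretrained model to summarize our source material abstractively.

summarizer = T5ForConditionalGeneration.from_pretrained('t5-base')
tokenizer = T5Tokenizer.from_pretrained('t5-base')

To tell the model that it should summarize our source material, we add the keyword "summarize:" to the beginning of the text.

updated_material = "summarize:" + source_material

We now encode the updated material using our tokenizer.

tokenized_material = tokenizer.encode(updated_material, return_tensors="pt").to(device)

When printed, this is what our encoded material looks like.

print(tokenized_material)

Output: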
tensor([[21603, 10, 4302, 7, 75, 2936, 655, 6, 44, 165,
3, 21120, 6, 19, 1622, 23, 1433, 42, 4349, 13,
3224, 11, 3866, 6831, 5, 3, 4868, 3293, 35, 29,
23, 9, 13, 15282, 6, 4903, 7, 6, 7295, 7,
11, 5054, 7, 57, 25857, 7, 11, 7004, 6, 13645,
3048, 4353, 5271, 697, 11, 15202, 6, 271, 3, 31,
144, 728, 6, 8, 167, 3324, 11, 92, 8, 167,
15124, 2663, 13, 69, 1342, 31, 5, 5632, 8, 163,
5456, 4686, 9347, 81, 8, 2859, 19, 8, 26207, 24,
13645, 8085, 5, 411, 22441, 7, 7641, 81, 125, 1776,
523, 12, 36, 7463, 11, 5243, 38, 13645, 5, 3921,
6, 34, 19, 30141, 28, 8, 809, 6, 11, 44,
119, 648, 6, 46, 2663, 13, 809, 5, 86, 8,
657, 6, 34, 47, 80, 31, 7, 3, 31, 77,
687, 280, 31, 6, 8, 296, 13, 16, 30113, 106,
6, 13, 1045, 816, 6, 9675, 11, 5063, 4749, 5,
1960, 6, 34, 557, 963, 136, 773, 13, 23179, 4749,
6, 351, 6, 1829, 42, 8136, 5, 94, 164, 36,
4349, 6, 4349, 13, 4349, 6, 42, 1044, 18, 9,
3404, 655, 893, 11721, 2839, 42, 59, 5, 290, 429,
36, 315, 1425, 42, 5022, 13, 13645, 6, 42, 315,
4217, 13, 13645, 6, 42, 131, 80, 773, 28, 315,
753, 5, 2502, 746, 560, 823, 163, 6917, 33, 13381,
6, 66, 3127, 6, 42, 237, 8, 829, 8084, 5,
37, 8378, 342, 620, 13, 585, 6, 9347, 7, 11,
22547, 7, 3033, 7, 3228, 7, 81, 823, 8, 269,
746, 33, 271, 1380, 5, 19119, 13, 8, 620, 13,
15293, 6, 4903, 7, 42, 7295, 7, 33, 10, 650,
7178, 18154, 6, 80, 31, 7, 1254, 13, 1044, 4500,
42, 3668, 15883, 57, 3, 31, 10119, 441, 31, 117,
271, 3, 9, 21253, 1950, 3, 31, 8103, 31, 13,
10223, 6, 42, 271, 3, 9, 2550, 538, 6, 2550,
605, 42, 2550, 433, 13, 8, 2241, 117, 578, 3,
8237, 1498, 42, 546, 5434, 11, 1426, 10696, 117, 271,
8, 3, 31, 23180, 24, 34, 19, 114, 31, 12,
3, 31, 7965, 31, 42, 3, 31, 346, 31, 34,
117, 271, 8, 3, 31, 77, 687, 8516, 31, 42,
8, 4297, 610, 358, 13, 8, 809, 5, 1]])

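We now generate our summary.

tokenized_summary = summarizer.generate(tokenized_material,
    num_beams=5,
    no_repeat_ngram_size=2,
    min_length=100,
    max_length=120,
    early_stopping=True)

Here we impose constraints on the summary: we do not want any n-gram of size 2 to be repeated, and we want a summary between 100 and 120 tokens long. Note that min_length and max_length count tokens rather than words, so the decoded summary can contain fewer than 100 words.

The summary we obtain from the code above is still tokenized, or encoded. It looks like this:

print(tokenized_summary)

Output: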
tensor([[ 0, 13645, 6, 44, 165, 3, 21120, 6, 19, 1622,
23, 1433, 42, 4349, 13, 3224, 11, 3866, 6831, 3,
5, 3, 3565, 3293, 35, 29, 23, 9, 13, 15282,
6, 4903, 7, 11, 5054, 7, 6, 13645, 3048, 4353,
5271, 697, 11, 15202, 6, 271, 3, 31, 532, 167,
3324, 11, 92, 8, 167, 15124, 2663, 13, 69, 1342,
31, 16, 8, 657, 6, 34, 47, 80, 31, 7,
4723, 280, 6, 8, 296, 13, 16, 30113, 106, 6,
13, 1045, 816, 6, 9675, 11, 5063, 4749, 5, 469,
34, 557, 963, 136, 773, 13, 23179, 4749, 6, 351,
6, 1829, 42, 8136, 5, 132, 429, 36, 315, 1425,
42, 5022, 13, 13645, 1]])

Now, we decode the tokenized summary above to obtain a text summary.

summary = tokenizer.decode(tokenized_summary[0], skip_special_tokens=True)

Finally, we print our automatically generated summary.

print(summary)

We get the following output.
Consciousness, at its simplest, is sentience or awareness of internal and external existence. Despite millennia of analyses, definitions and debates, consciousness remains puzzling and controversial, being 'the most familiar and also the most mysterious aspect of our lives'. In the past, it was one's inner life, the world of introspection, of private thought, imagination and volition. Today it often includes any kind of cognition, experience, feeling or perception. There might be different levels or orders of consciousness.

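We can see that the summary is not a verbatim copy of sentences from our source material: several sentences have been shortened or reworded. This is because the T5 Transformer model is an abstractive model.

Now let's evaluate our automatically generated summary using the ROUGE metrics.

We install the "Rouge" library.

!pip install rouge

Then we import the necessary package.

from rouge import Rouge

We define the handwritten reference summary.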

reference = "Consciousness is essentially the awareness of one's internal and external existence. Despite seeming like a fairly trivial concept, the only notion that seems to be widely agreed upon after millenia of theorizing and debating is the fact that consciousness exists. In the past, consciousness was perceived as one's inner life, the world of introspection, of private thought, imagination and volition. Today, this definition includes any kind of cognition, experience, feeling or perception. The disparate range of research, notions and speculations raises doubts about whether the right questions are being asked."

Then, we compare our automatically generated summary with the reference summary.

ROUGE = Rouge()
ROUGE.get_scores(summary,reference)

Executing the statement above gives us the following output.
[{'rouge-1': {'r': 0.410958904109589,
'p': 0.46153846153846156,
'f': 0.43478260371245536},
'rouge-2': {'r': 0.2808988764044944,
'p': 0.30864197530864196,
'f': 0.2941176420698962},
'rouge-l': {'r': 0.410958904109589,
'p': 0.46153846153846156,
'f': 0.43478260371245536}}]

Our ROUGE-1 scores are therefore:
F1 - 0.43478260371245536
Precision - 0.46153846153846156
Recall - 0.410958904109589

Our ROUGE-2 scores are:
F1 - 0.2941176420698962
Precision - 0.30864197530864196
Recall - 0.2808988764044944

And our ROUGE-L scores are:
F1 - 0.43478260371245536
Precision - 0.46153846153846156
Recall - 0.410958904109589

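Now that we have evaluated all of the automatically generated summaries, let's see how the models we tested performed relative to one another when compared against our handwritten reference summary.

Conclusion

  1. Latent Semantic Analysis (LSA) performed best on our source material, scoring highest on every metric when evaluated against our reference summary.
  2. TextRank has the second-highest ROUGE-2 F-score (0.2999) and the third-highest ROUGE-1 and ROUGE-L F-scores.
  3. The Kullback-Leibler Sum model has the second-highest ROUGE-1 and ROUGE-L F-scores (0.4571 and 0.4428) but only the fourth-highest ROUGE-2 F-score.
  4. The T5 Transformer model has the second-lowest ROUGE-1 and ROUGE-L F-scores and the third-highest ROUGE-2 F-score (0.2941).
  5. Luhn's Heuristic Method scored lowest on every metric.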
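
| Metric | LSA | TextRank | T5 | KL-Sum | Luhn's method |
| --- | --- | --- | --- | --- | --- |
| ROUGE-1 precision | 0.6527 | 0.4637 | 0.4615 | 0.4776 | 0.1829 |
| ROUGE-1 recall | 0.6438 | 0.4383 | 0.4109 | 0.4383 | 0.2054 |
| ROUGE-1 F-score | 0.6482 | 0.4507 | 0.4347 | 0.4571 | 0.1935 |
| ROUGE-2 precision | 0.4772 | 0.2967 | 0.3086 | 0.2873 | 0.0086 |
| ROUGE-2 recall | 0.4719 | 0.3033 | 0.2808 | 0.2808 | 0.0112 |
| ROUGE-2 F-score | 0.4745 | 0.2999 | 0.2941 | 0.2840 | 0.0097 |
| ROUGE-L precision | 0.6388 | 0.4492 | 0.4615 | 0.4626 | 0.1585 |
| ROUGE-L recall | 0.6301 | 0.4246 | 0.4109 | 0.4246 | 0.1780 |
| ROUGE-L F-score | 0.6344 | 0.4366 | 0.4347 | 0.4428 | 0.1677 |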
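
To see how the models rank on each metric, the F-scores from the comparison can be sorted programmatically. The snippet below is a small sketch that copies the F-score figures reported in this article into a dictionary; the variable names are illustrative.

```python
# F-scores copied from the comparison in this article.
f_scores = {
    "ROUGE-1": {"LSA": 0.6482, "TextRank": 0.4507, "T5": 0.4347, "KL-Sum": 0.4571, "Luhn": 0.1935},
    "ROUGE-2": {"LSA": 0.4745, "TextRank": 0.2999, "T5": 0.2941, "KL-Sum": 0.2840, "Luhn": 0.0097},
    "ROUGE-L": {"LSA": 0.6344, "TextRank": 0.4366, "T5": 0.4347, "KL-Sum": 0.4428, "Luhn": 0.1677},
}

# Sort the model names by F-score, best first, separately for each metric.
rankings = {
    metric: sorted(model_scores, key=model_scores.get, reverse=True)
    for metric, model_scores in f_scores.items()
}

for metric, order in rankings.items():
    print(metric, " > ".join(order))
```

LSA comes first and Luhn's method last on all three metrics, while the order of TextRank, KL-Sum and T5 depends on which metric is used.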

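The TextRank, Kullback-Leibler Sum and T5 Transformer models all score very similarly and may perform differently on different input documents. Luhn's Heuristic Method, however, scores markedly worse, which is perhaps unsurprising given that it is one of the earliest text summarization models.

In terms of readability, however, the T5 Transformer model produced the best summary, as it comes closest to a summary written by a human. While there is still much work to be done in the field of abstractive summarization, it is impressive to see a pretrained model summarize our source material so well. Had our source material contained more sentences, it might well have produced a summary with a higher ROUGE score than the extractive models we evaluated above.

Thanks for reading!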