1. Project motivation: find a method for extracting keywords from text.
pip install summa -i https://pypi.douban.com/simple
pypi.org/project/sum…
Official examples:
Text summarization:
>>> text = """Automatic summarization is the process of reducing a text document with a \
computer program in order to create a summary that retains the most important points \
of the original document. As the problem of information overload has grown, and as \
the quantity of data has increased, so has interest in automatic summarization. \
Technologies that can make a coherent summary take into account variables such as \
length, writing style and syntax. An example of the use of summarization technology \
is search engines such as Google. Document summarization is another."""
>>> from summa import summarizer
>>> print(summarizer.summarize(text))
Automatic summarization is the process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document.
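summa is described as a TextRank implementation. The following is a minimal, stdlib-only sketch of TextRank-style extractive summarization, not summa's actual code. Sentences become graph nodes, edges are weighted by word overlap using the similarity from the TextRank paper, and weighted PageRank ranks them. The helper names (`split_sentences`, `similarity`, `pagerank`, `textrank_summary`) are hypothetical. The naive regex sentence splitter and newline-joined output are assumptions.

```python
import math
import re


def split_sentences(text):
    # Naive splitter (assumption): break after . ! or ? followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    # TextRank word-overlap similarity: |common words| / (log|a| + log|b|)
    wa = re.findall(r"[a-z]+", a.lower())
    wb = re.findall(r"[a-z]+", b.lower())
    if not wa or not wb:
        return 0.0
    denom = math.log(len(wa)) + math.log(len(wb))
    if denom == 0:
        return 0.0  # two one-word sentences: avoid division by zero
    return len(set(wa) & set(wb)) / denom


def pagerank(weights, d=0.85, iterations=50):
    # Weighted PageRank over a symmetric similarity matrix (d = damping factor)
    n = len(weights)
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # The new list is built entirely from the previous iteration's scores
        scores = [
            (1 - d) + d * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n) if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores


def textrank_summary(text, ratio=0.2):
    sentences = split_sentences(text)
    n = len(sentences)
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    keep = max(1, round(n * ratio))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    # Emit the selected sentences in original order, one per line (assumption)
    return "\n".join(sentences[i] for i in sorted(top))
```

With the five-sentence example text and `ratio=0.2`, this sketch keeps exactly one sentence. Which sentence wins depends on the similarity function, so it may differ from summa's output.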
Keyword extraction:
>>> from summa import keywords
>>> print(keywords.keywords(text))
document
summarization
writing
account
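Keyword extraction in TextRank runs the same ranking idea over words instead of sentences. The following is a minimal sketch under simplifying assumptions, not summa's implementation. The stop-word list is a tiny hypothetical one (the TextRank paper uses part-of-speech filtering instead). Co-occurrence is counted on the filtered word sequence, and the graph is unweighted. `textrank_keywords` and its parameters are hypothetical names.

```python
import re
from collections import defaultdict

# Hypothetical, tiny stop-word list; a real extractor would use a full list or POS filtering
STOPWORDS = {"a", "an", "and", "as", "by", "can", "for", "has", "in", "into",
             "is", "of", "such", "so", "that", "the", "to", "with"}


def textrank_keywords(text, window=2, d=0.85, iterations=50, top_n=5):
    # Candidate words: lower-cased alphabetic tokens that are not stop words
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    # Undirected co-occurrence graph: link each word to the next (window - 1) words
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    vertices = set(words)
    scores = {v: 1.0 for v in vertices}
    for _ in range(iterations):
        # Unweighted PageRank update computed from the previous iteration's scores
        scores = {
            v: (1 - d) + d * sum(scores[u] / len(neighbors[u]) for u in neighbors[v])
            for v in vertices
        }
    # Highest score first; ties broken alphabetically so the output is deterministic
    return sorted(vertices, key=lambda v: (-scores[v], v))[:top_n]
```

For example, `textrank_keywords("apple banana apple cherry apple", top_n=1)` returns `['apple']`, because "apple" is the only word linked to both of the others.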
- More examples
from summa.summarizer import summarize
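# Set the summary length as a fraction of the text (also works for keywords)
summarize(text, ratio=0.2)
# Set the summary length by an approximate number of words (also works for keywords)
summarize(text, words=50)
# Specify the language of the input text (also works for keywords)
summarize(text, language='spanish')
# Get the result as a list (also works for keywords)
summarize(text, split=True)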
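The `ratio`, `words` and `split` options all control how many ranked sentences are returned and in what form. The following is a hypothetical sketch of one way such a selection step can work once each sentence has a score. It is not summa's code, and summa's exact word-budget rule may differ. Here the `words` budget greedily adds the best sentences until the next one would overflow it. `ratio` keeps at least one sentence, and the non-list output joins sentences with newlines, both as assumptions. `select_summary` is a hypothetical name.

```python
def select_summary(sentences, scores, ratio=0.2, words=None, split=False):
    # Rank sentence indices by score, best first
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    if words is not None:
        # Word budget: take the best sentences until the next one would overflow it
        chosen, count = [], 0
        for i in order:
            n = len(sentences[i].split())
            if chosen and count + n > words:
                break
            chosen.append(i)
            count += n
    else:
        # Ratio: keep roughly ratio * len(sentences) sentences (at least one)
        chosen = order[:max(1, round(len(sentences) * ratio))]
    # Restore original document order before returning
    result = [sentences[i] for i in sorted(chosen)]
    return result if split else "\n".join(result)
```

For example, with sentences `["a b c.", "d e.", "f g h i."]`, scores `[0.5, 0.9, 0.1]` and `words=5, split=True`, the sketch returns `['a b c.', 'd e.']`. The two best sentences fit the five-word budget exactly and come back in document order.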
Keyword usage:
from summa.keywords import keywords
keywords(text, split=True)
keywords(text, split=True, ratio=0.3)
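The TextRank paper (Mihalcea and Tarau, 2004) describes a post-processing step that collapses sequences of adjacent keywords in the text into multi-word keywords. This relates to the single-word results noted below. The following is a hypothetical sketch of that step on a plain list of keywords, such as the list returned with `split=True`. Whether summa applies this step internally, and how, is not covered here. `combine_adjacent` is a hypothetical name, and the lower-case alphabetic tokenizer is an assumption.

```python
import re


def combine_adjacent(text, keywords):
    # Collapse runs of adjacent keywords in the text into multi-word phrases
    selected = set(keywords)
    phrases, run = [], []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in selected:
            run.append(token)
        elif run:
            phrases.append(" ".join(run))
            run = []
    if run:
        phrases.append(" ".join(run))
    # Deduplicate while keeping first-appearance order
    return list(dict.fromkeys(phrases))
```

For example, `combine_adjacent("automatic summarization is the process of automatic summarization", ["automatic", "summarization", "process"])` returns `['automatic summarization', 'process']`.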
Conclusion: on short texts, the output is still mostly single words, which is not the kind of keyword classification I was hoping for.