#openGauss #getting-started #installation #database #open-source
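Source: docs-opengauss.osinfra.cn/zh/
Parsers
A text search parser splits the raw document text into tokens and identifies the type of each token. The set of possible types is defined by the parser itself. Note that a parser does not modify the text; it only identifies plausible word boundaries. Because of this limited scope, there is less need for application-specific parsers than for custom dictionaries.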
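The division of labor described above (find boundaries, label each piece, leave the text untouched) can be illustrated with a small Python sketch. This is a toy, not the grammar used by any openGauss parser: the function names `classify_run` and `tokenize` are hypothetical, and only five of the token type names are borrowed for illustration.

```python
import re

# Alternating runs of alphanumeric characters and of everything else.
# Toy boundary rule only; the real parsers use a much richer grammar.
_RUN = re.compile(r"[^\W_]+|[\W_]+")


def classify_run(run):
    """Label one run with a token type name (simplified rules)."""
    if not run[0].isalnum():
        return "blank"      # whitespace and punctuation
    if run.isdigit():
        return "uint"       # digits only
    if run.isalpha():
        # letters only: ASCII-only versus any letters
        return "asciiword" if run.isascii() else "word"
    return "numword"        # mix of letters and digits


def tokenize(text):
    """Return (type, token) pairs that cover the whole input in order."""
    return [(classify_run(m.group()), m.group()) for m in _RUN.finditer(text)]


if __name__ == "__main__":
    sample = "openGauss beta1, 1234"
    tokens = tokenize(sample)
    for token_type, token in tokens:
        print(f"{token_type:10} {token!r}")
    # The tokens are slices of the input: joining them restores it exactly.
    print("".join(token for _, token in tokens) == sample)
```

Because every token is a slice of the input, the parser stage never normalizes anything. In this model, lowercasing, stemming, and stop-word removal belong to the dictionaries that process the tokens afterwards.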
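openGauss currently provides three built-in parsers: pg_catalog.default, pg_catalog.ngram, and pg_catalog.pound. pg_catalog.default is suited to English text segmentation. pg_catalog.ngram and pg_catalog.pound were added to support Chinese full-text search and are suited to Chinese text and mixed Chinese-English text.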
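Chinese text has no spaces between words, so whitespace-based boundary detection does not apply to it. As the name of pg_catalog.ngram suggests, one general way around this is to index overlapping character n-grams. The Python sketch below shows that general technique only. The function `char_ngrams` and its `gram_size` parameter are illustrative assumptions, and the sketch makes no claim about how the actual parser treats punctuation, whitespace, or embedded English words.

```python
def char_ngrams(text, gram_size=2):
    """Return every overlapping run of gram_size characters in text.

    Illustrative only: a plain sliding window over the characters.
    """
    if gram_size < 1:
        raise ValueError("gram_size must be at least 1")
    if len(text) <= gram_size:
        # Text shorter than one window is kept as a single token.
        return [text] if text else []
    return [text[i:i + gram_size] for i in range(len(text) - gram_size + 1)]


if __name__ == "__main__":
    # A four-character phrase yields three overlapping bigrams.
    print(char_ngrams("全文检索"))
```

The trade-off of this technique is that it needs no dictionary of Chinese words, but it produces more tokens than word-based segmentation.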
The built-in parser pg_catalog.default recognizes 23 token types, listed in Table 1.
Table 1 Token types of the default parser
| Alias | Description | Example |
| --- | --- | --- |
| asciiword | Word, all ASCII letters | elephant |
| word | Word, all letters | mañana |
| numword | Word, letters and digits | beta1 |
| asciihword | Hyphenated word, all ASCII | up-to-date |
| hword | Hyphenated word, all letters | lógico-matemática |
| numhword | Hyphenated word, letters and digits | openGauss-beta1 |
| hword_asciipart | Hyphenated word part, all ASCII | openGauss in the context openGauss-beta1 |
| hword_part | Hyphenated word part, all letters | lógico or matemática in the context lógico-matemática |
| hword_numpart | Hyphenated word part, letters and digits | beta1 in the context openGauss-beta1 |
| email | Email address | |
| protocol | Protocol head | http:// |
| url | URL | example.com/stuff/index.html |
| host | Host | example.com |
| url_path | URL path | /stuff/index.html, in the context of a URL |
| file | File or path name | /usr/local/foo.txt, if not within a URL |
| sfloat | Scientific notation | -1.23E+56 |
| float | Decimal notation | -1.234 |
| int | Signed integer | -1234 |
| uint | Unsigned integer | 1234 |
| version | Version number | 8.3.0 |
| tag | XML tag | |
| entity | XML entity | &amp; |
| blank | Space symbols | (any whitespace or punctuation not otherwise recognized) |
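The hyphenated-word rows of Table 1 are easiest to read together: a hyphenated token has a type of its own, and each of its parts has a part type. The Python sketch below derives both from the table descriptions alone. The helper names `classify_part` and `classify_hyphenated` are hypothetical, and the handling of cases the table does not show (for example a part made only of digits) is an assumption of this sketch, not documented parser behavior.

```python
def classify_part(part):
    """Label one part of a hyphenated word, following Table 1."""
    if part.isalpha():
        # letters only: ASCII-only versus any letters
        return "hword_asciipart" if part.isascii() else "hword_part"
    if part.isalnum():
        # Assumption: any part containing a digit counts as a numpart.
        return "hword_numpart"
    raise ValueError(f"not a word part: {part!r}")


def classify_hyphenated(token):
    """Return (whole_type, [(part_type, part), ...]) for a hyphenated word."""
    parts = token.split("-")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not a hyphenated word: {token!r}")
    part_types = [classify_part(p) for p in parts]
    if "hword_numpart" in part_types:
        whole = "numhword"       # letters and digits
    elif "hword_part" in part_types:
        whole = "hword"          # all letters, some non-ASCII
    else:
        whole = "asciihword"     # all ASCII letters
    return whole, list(zip(part_types, parts))


if __name__ == "__main__":
    for example in ("up-to-date", "lógico-matemática", "openGauss-beta1"):
        print(example, classify_hyphenated(example))
```

For the three examples taken from Table 1, the sketch reproduces the types given there: openGauss-beta1 is a numhword whose parts are an hword_asciipart (openGauss) and an hword_numpart (beta1).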