Commonly Used Elasticsearch Analysis Token Filters
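1. Length Token Filter
The length filter removes tokens that are too long or too short. The min parameter sets the minimum token length and max sets the maximum, both measured in characters and both inclusive; they default to 0 and Integer.MAX_VALUE respectively.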
# share_analyzer is a custom analyzer whose token filter is of type length
$ curl -XGET 'http://localhost:9200/xinxin/_analyze' -H 'Content-Type: application/json' -d '
{
"analyzer": "share_analyzer",
"text" : "this is a test"
}'
# Response
{
"tokens": [
{
"token": "is",
"start_offset": 5,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "a",
"start_offset": 8,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 2
}
]
}
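The following is a minimal plain-Python sketch of what the length filter does to a token stream; it is not Elasticsearch's (Lucene's) implementation. The names Token, tokenize and length_filter are hypothetical, the regex tokenizer is only a rough stand-in for the standard tokenizer, and min_len=1, max_len=2 are assumed values chosen because they reproduce the response above (the definition of share_analyzer is not shown).

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    start_offset: int
    end_offset: int
    position: int


def tokenize(text):
    # Rough stand-in for the standard tokenizer: each run of word characters is a token.
    return [Token(m.group(), m.start(), m.end(), pos)
            for pos, m in enumerate(re.finditer(r"\w+", text))]


def length_filter(tokens, min_len=0, max_len=2**31 - 1):
    # Keep tokens whose length lies in [min_len, max_len], both bounds inclusive.
    # Positions are not renumbered, so removed tokens leave gaps.
    return [t for t in tokens if min_len <= len(t.text) <= max_len]


# min_len=1 and max_len=2 are assumed values that reproduce the response above.
for t in length_filter(tokenize("this is a test"), min_len=1, max_len=2):
    print(t.text, t.start_offset, t.end_offset, t.position)
```

Note that the surviving tokens keep positions 1 and 2, exactly as in the response: the filter only drops tokens, it does not renumber the rest.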
2. Lowercase Token Filter
The lowercase filter normalizes token text to lower case.
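As a rough illustration, the effect on a list of token strings can be sketched in plain Python; lowercase_filter is a hypothetical name, and Python's str.lower stands in for Lucene's own lower-casing.

```python
def lowercase_filter(tokens):
    # Lower-case each token's text; the number and order of tokens stay the same.
    return [t.lower() for t in tokens]


print(lowercase_filter(["Quick", "BROWN", "fox"]))  # ['quick', 'brown', 'fox']
```

Applying the same normalization at index time and at search time is what lets a query for "quick" match a document containing "Quick".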
3. Uppercase Token Filter
The uppercase filter normalizes token text to upper case.
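A matching plain-Python sketch for the uppercase filter; uppercase_filter is a hypothetical name. Python's str.upper may differ from Lucene for a few special characters (for example, Python turns ß into SS), so treat this only as an illustration of the idea.

```python
def uppercase_filter(tokens):
    # Upper-case each token's text; the number and order of tokens stay the same.
    return [t.upper() for t in tokens]


print(uppercase_filter(["Quick", "brown", "fox"]))  # ['QUICK', 'BROWN', 'FOX']
```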
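4. Shingle Token Filter
The shingle filter builds shingles, that is, combinations of adjacent tokens emitted as single tokens. By default it produces two-token shingles (min_shingle_size and max_shingle_size are both 2) and also keeps the original single tokens (output_unigrams defaults to true).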
# share_analyzer is a custom analyzer whose token filter is of type shingle
$ curl -XGET 'http://localhost:9200/xinxin/_analyze' -H 'Content-Type: application/json' -d '
{
"analyzer": "share_analyzer",
"text" : "this is a test"
}'
# Response
{
"tokens": [
{
"token": "this is",
"start_offset": 0,
"end_offset": 7,
"type": "shingle",
"position": 0
},
{
"token": "is a",
"start_offset": 5,
"end_offset": 9,
"type": "shingle",
"position": 1
},
{
"token": "a test",
"start_offset": 8,
"end_offset": 14,
"type": "shingle",
"position": 2
}
]
}
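The response contains only two-token shingles, which suggests (an assumption, since the definition of share_analyzer is not shown) that output_unigrams was set to false. The sketch below reproduces that output in plain Python; tokenize and shingle_filter are hypothetical helpers, the regex tokenizer only approximates the standard tokenizer, and unlike Lucene the sketch does not handle position gaps left by earlier filters.

```python
import re


def tokenize(text):
    # Rough stand-in for the standard tokenizer: (text, start, end) per word.
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def shingle_filter(tokens, min_size=2, max_size=2, output_unigrams=True, sep=" "):
    # At each position, optionally emit the original token, then every shingle of
    # min_size..max_size consecutive tokens that starts at that position.
    out = []
    for pos, (word, start, end) in enumerate(tokens):
        if output_unigrams:
            out.append((word, start, end, pos, "<ALPHANUM>"))
        for n in range(min_size, max_size + 1):
            window = tokens[pos:pos + n]
            if len(window) == n:  # skip shingles that would run past the last token
                text = sep.join(w[0] for w in window)
                out.append((text, window[0][1], window[-1][2], pos, "shingle"))
    return out


for tok in shingle_filter(tokenize("this is a test"), output_unigrams=False):
    print(tok)
```

Each shingle takes its start offset from its first token and its end offset from its last token, which is why "is a" spans offsets 5 to 9 in the response.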
5. Stop Token Filter
The stop filter removes the words listed in stopwords from the token stream. If stopwords is not set, the predefined _english_ list is used.
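A plain-Python sketch of the stop filter's effect, assuming a simple regex tokenizer; tokenize and stop_filter are hypothetical names. The ignore_case flag mirrors the filter's ignore_case option (false by default), under which matching is case-sensitive.

```python
import re


def tokenize(text):
    # Rough stand-in for the standard tokenizer: (text, position) per word.
    return [(m.group(), pos) for pos, m in enumerate(re.finditer(r"\w+", text))]


def stop_filter(tokens, stopwords, ignore_case=False):
    # Drop every token found in the stopword set. Remaining tokens keep their
    # original positions, so each removed word leaves a position gap.
    if ignore_case:
        stop = {w.lower() for w in stopwords}
        return [(w, p) for w, p in tokens if w.lower() not in stop]
    stop = set(stopwords)
    return [(w, p) for w, p in tokens if w not in stop]


print(stop_filter(tokenize("This is a test"), {"this", "is", "a"}))
# [('This', 0), ('test', 3)]
print(stop_filter(tokenize("This is a test"), {"this", "is", "a"}, ignore_case=True))
# [('test', 3)]
```

The first call keeps "This" because matching is case-sensitive; in practice the stop filter is usually placed after a lowercase filter for that reason.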
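6. Synonym Token Filter
The synonym filter handles synonyms during analysis: a token that matches a synonym rule is expanded into, or replaced by, its synonyms. Rules can be given inline with synonyms or loaded from a file with synonyms_path.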
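The sketch below illustrates the basic idea for the simplest rule style, a comma-separated list of equivalent single words such as "quick, fast", where every word in the group is emitted at the same position. It is a deliberately simplified assumption: parse_rules and synonym_filter are hypothetical, and explicit "=>" mappings, multi-word synonyms and rule files are not handled.

```python
def parse_rules(rules):
    # Each rule lists equivalent single words separated by commas, e.g. "quick, fast".
    groups = {}
    for rule in rules:
        words = [w.strip() for w in rule.split(",")]
        for w in words:
            groups[w] = words
    return groups


def synonym_filter(tokens, rules):
    # Replace each token by all members of its synonym group, stacked at the
    # token's position; tokens without a rule pass through unchanged.
    groups = parse_rules(rules)
    out = []
    for pos, word in enumerate(tokens):
        for syn in groups.get(word, [word]):
            out.append((syn, pos))
    return out


print(synonym_filter(["a", "fast", "car"], ["quick, fast", "car, automobile"]))
# [('a', 0), ('quick', 1), ('fast', 1), ('car', 2), ('automobile', 2)]
```

Stacking synonyms at one position is what allows a phrase query such as "quick car" to match text that says "fast automobile".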
7. Reverse Token Filter
The reverse filter simply reverses the characters of each token, for example quick becomes kciuq.
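A minimal plain-Python sketch of the reverse filter; reverse_filter is a hypothetical name. The example words show why reversal is useful for suffix searches: words sharing the suffix "ing" all start with "gni" once reversed, so a suffix match becomes a prefix match.

```python
def reverse_filter(tokens):
    # Reverse the characters of each token independently; token order is unchanged.
    return [t[::-1] for t in tokens]


print(reverse_filter(["running", "jumping"]))  # ['gninnur', 'gnipmuj']
```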
8. Truncate Token Filter
The truncate filter shortens tokens to a given length: any token longer than length characters is cut off after its first length characters. length defaults to 10.
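A plain-Python sketch of the truncation rule; truncate_filter is a hypothetical name whose length default of 10 mirrors the filter's default.

```python
def truncate_filter(tokens, length=10):
    # Cut every token longer than `length` characters down to its first `length` characters.
    return [t[:length] for t in tokens]


print(truncate_filter(["windows", "elasticsearch"], length=5))  # ['windo', 'elast']
print(truncate_filter(["elasticsearch", "fox"]))  # ['elasticsea', 'fox']
```

Tokens already within the limit, such as "fox", pass through unchanged.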
9. Trim Token Filter
The trim filter removes leading and trailing whitespace from each token.
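A plain-Python sketch of the trim filter; trim_filter is a hypothetical name. Tokens produced by the standard tokenizer never carry surrounding spaces, so trim mainly matters with tokenizers that keep whitespace, such as the keyword tokenizer, which emits the whole input as one token.

```python
def trim_filter(tokens):
    # Strip leading and trailing whitespace from each token; inner spaces are kept.
    return [t.strip() for t in tokens]


print(trim_filter(["  fox ", " quick brown "]))  # ['fox', 'quick brown']
```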