Elasticsearch Analysis Token Filters

Commonly used Elasticsearch analysis token filters

1、Length Token Filter
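The length filter removes tokens that are too long or too short. min sets the minimum token length and max sets the maximum token length.
# share_analyzer is a custom analyzer whose token filter uses type=length
$ curl -XGET 'http://localhost:9200/xinxin/_analyze' -H 'Content-Type: application/json' -d '
{
  "analyzer": "share_analyzer",
  "text" : "this is a test"
}'
# Response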
{
    "tokens": [
        {
            "token": "is",
            "start_offset": 5,
            "end_offset": 7,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "a",
            "start_offset": 8,
            "end_offset": 9,
            "type": "<ALPHANUM>",
            "position": 2
        }
    ]
}
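
The definition of share_analyzer is not shown above, so the following is only a sketch of one configuration that would be consistent with that response. The filter name my_length, the values min=1 and max=2, and the standard tokenizer are all assumptions. The small length_filter function imitates the filtering rule offline; it is not Elasticsearch's implementation.

```python
import json

# Hypothetical index settings: the filter name "my_length", min=1, max=2 and
# the standard tokenizer are assumptions chosen to match the response above.
settings = {
    "analysis": {
        "filter": {
            "my_length": {"type": "length", "min": 1, "max": 2}
        },
        "analyzer": {
            "share_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["my_length"],
            }
        },
    }
}


def length_filter(tokens, min_len=0, max_len=2**31 - 1):
    """Keep only tokens whose length lies in [min_len, max_len]."""
    return [t for t in tokens if min_len <= len(t) <= max_len]


if __name__ == "__main__":
    # Body that could be sent when creating the index (PUT /xinxin).
    print(json.dumps({"settings": settings}, indent=2))
    print(length_filter("this is a test".split(), min_len=1, max_len=2))
```

With these assumed bounds, "this" and "test" (four characters each) are dropped and only "is" and "a" survive, which matches the tokens in the response.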

2、Lowercase Token Filter
Normalizes token text to lowercase.

3、Uppercase Token Filter
Normalizes token text to uppercase.
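
As a rough illustration of the two case filters (lowercase and uppercase), the sketch below applies the same normalization to a plain list of tokens. The function names are hypothetical, and Python's str.lower/str.upper may differ from Elasticsearch for some language-specific characters.

```python
def lowercase_filter(tokens):
    """Return every token converted to lowercase."""
    return [t.lower() for t in tokens]


def uppercase_filter(tokens):
    """Return every token converted to uppercase."""
    return [t.upper() for t in tokens]


if __name__ == "__main__":
    print(lowercase_filter(["Quick", "FOX"]))
    print(uppercase_filter(["Quick", "FOX"]))
```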

4、Shingle Token Filter
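The shingle filter combines adjacent tokens into a single token. The response below contains only two-word shingles, which suggests the filter was configured with output_unigrams set to false; by default the original single-word tokens are emitted as well.
# share_analyzer is a custom analyzer whose token filter uses type=shingle
$ curl -XGET 'http://localhost:9200/xinxin/_analyze' -H 'Content-Type: application/json' -d '
{
  "analyzer": "share_analyzer",
  "text" : "this is a test"
}'
# Response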
{
    "tokens": [
        {
            "token": "this is",
            "start_offset": 0,
            "end_offset": 7,
            "type": "shingle",
            "position": 0
        },
        {
            "token": "is a",
            "start_offset": 5,
            "end_offset": 9,
            "type": "shingle",
            "position": 1
        },
        {
            "token": "a test",
            "start_offset": 8,
            "end_offset": 14,
            "type": "shingle",
            "position": 2
        }
    ]
}
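
To make the shingling rule concrete, here is a minimal sketch that builds word n-grams from a token list. The parameter names mirror the filter's min_shingle_size, max_shingle_size, output_unigrams and token_separator options, but the function itself is an illustration and ignores offsets and positions.

```python
def shingle_filter(tokens, min_size=2, max_size=2,
                   output_unigrams=True, separator=" "):
    """Emit word n-grams (shingles) of min_size..max_size adjacent tokens."""
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            # The original single token is kept alongside its shingles.
            out.append(tokens[i])
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(separator.join(tokens[i:i + size]))
    return out


if __name__ == "__main__":
    words = "this is a test".split()
    print(shingle_filter(words, output_unigrams=False))
    print(shingle_filter(words))
```

Calling it with output_unigrams=False yields the three shingles shown in the response; with the default, the single words are interleaved with them.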

5、Stop Token Filter
The stop filter removes the words listed in stopwords from the token stream.
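
A small sketch of stop-word removal, assuming a hand-picked stopword list rather than one of the predefined language lists. The ignore_case argument mirrors the filter option of the same name; the function name is hypothetical.

```python
def stop_filter(tokens, stopwords, ignore_case=False):
    """Drop every token that appears in the stopword list."""
    if ignore_case:
        stop = {w.lower() for w in stopwords}
        return [t for t in tokens if t.lower() not in stop]
    stop = set(stopwords)
    return [t for t in tokens if t not in stop]


if __name__ == "__main__":
    print(stop_filter("this is a test".split(), ["is", "a"]))
    print(stop_filter(["The", "fox"], ["the"], ignore_case=True))
```

Matching is case-sensitive unless ignore_case is set, which is why a lowercase filter is commonly placed before the stop filter in an analyzer.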

6、Synonym Token Filter
Handles synonyms during analysis.
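
Synonym rules are usually written in the Solr style, either as an equivalence group ("quick, fast") or as an explicit mapping ("laptop => notebook"). The sketch below handles only a small subset of that syntax: single-word terms, no multi-word synonyms, and a flat output list instead of stacked positions. The rule strings and function names are made up for illustration.

```python
def parse_rules(rules):
    """Parse a tiny subset of Solr-style synonym rules into a dict."""
    mapping = {}
    for rule in rules:
        if "=>" in rule:
            # Explicit mapping: every term on the left is replaced by the right side.
            left, right = rule.split("=>")
            targets = [w.strip() for w in right.split(",")]
            for src in left.split(","):
                mapping[src.strip()] = targets
        else:
            # Equivalence group: each term expands to the whole group.
            group = [w.strip() for w in rule.split(",")]
            for word in group:
                mapping[word] = group
    return mapping


def synonym_filter(tokens, mapping):
    """Replace or expand each token according to the parsed rules."""
    out = []
    for t in tokens:
        out.extend(mapping.get(t, [t]))
    return out


if __name__ == "__main__":
    rules = parse_rules(["quick, fast", "laptop => notebook"])
    print(synonym_filter(["quick", "laptop", "fox"], rules))
```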

7、Reverse Token Filter
The reverse filter simply reverses the characters of each token.
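
In Python terms the effect amounts to reversing each string. This is a sketch with a hypothetical function name, not the filter's actual code.

```python
def reverse_filter(tokens):
    """Reverse the characters of every token."""
    return [t[::-1] for t in tokens]


if __name__ == "__main__":
    print(reverse_filter(["this", "test"]))
```

Reversed tokens are typically useful for suffix matching, since a suffix search on the original text becomes a prefix search on the reversed form.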

8、Truncate Token Filter
The truncate filter shortens tokens to a given length. You specify a length, and any token longer than that is cut off after length characters.
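
The truncation rule can be sketched as a slice. The default of 10 used here matches the filter's documented default for length; the function name is hypothetical.

```python
def truncate_filter(tokens, length=10):
    """Cut every token down to at most `length` characters."""
    return [t[:length] for t in tokens]


if __name__ == "__main__":
    print(truncate_filter(["elasticsearch", "test"], length=5))
```

Tokens that are already short enough pass through unchanged; unlike the length filter, nothing is removed from the stream.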

9、Trim Token Filter
The trim filter removes leading and trailing whitespace from each token.
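
Trimming mostly matters with tokenizers that keep surrounding whitespace, such as keyword. A minimal sketch, assuming Python's str.strip is a close enough stand-in for the filter's behaviour:

```python
def trim_filter(tokens):
    """Strip leading and trailing whitespace from every token."""
    return [t.strip() for t in tokens]


if __name__ == "__main__":
    print(trim_filter(["  fox ", "quick"]))
```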