ES Highlighting

Official documentation: https://www.elastic.co/guide/en/elasticsearch/ref

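Highlighters let you get highlighted snippets from one or more fields in your search results. When the request DSL explicitly asks for highlight, each hit in the response contains an additional highlight element with the highlighted fields and the highlighted fragments.
When extracting the terms to highlight, highlighters do not reflect the boolean logic of the query. For some complex boolean queries (for example nested boolean queries or queries using minimum_should_match), parts of a document may therefore be highlighted that do not correspond to query matches.
Highlighting requires the actual content of a field. If the field is not stored (the mapping does not set store to true), the actual _source is loaded and the relevant field is extracted from _source.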

Basic usage:

GET /_search
{
  "query": {
    "match": { "content": "kimchy" }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  }
}

Elasticsearch supports three highlighters: unified, plain, and fvh (fast vector highlighter). You can specify the highlighter type for each field.

unified

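This is the default highlighter. It uses the Lucene Unified Highlighter, which breaks the text into sentences and scores each sentence with the BM25 algorithm as if it were a document in the corpus. It also supports accurate phrase and multi-term (fuzzy, prefix, regex) highlighting.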

plain

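The plain highlighter uses the standard Lucene highlighter. It tries to reflect the query matching logic in terms of word importance and any word positioning criteria in phrase queries. It works best for highlighting simple query matches in a single field. To reflect the query logic accurately, it creates a tiny in-memory index and re-runs the original query criteria through Lucene's query execution planner to get low-level match information for the current document. This is repeated for every field and every document that needs highlighting. If you want to highlight many fields in many documents with complex queries, use the unified highlighter on postings or term_vector fields instead.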

fvh

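The fvh highlighter uses the Lucene Fast Vector Highlighter. It can be used on fields whose mapping sets term_vector to with_positions_offsets. The fast vector highlighter:

1. Can be customized with a boundary_scanner.
1. Requires term_vector to be set to with_positions_offsets, which increases the size of the index.
1. Can combine matches from multiple fields into one result (see matched_fields).
1. Can assign different weights to matches at different positions, so that, for example, phrase matches are sorted above term matches when highlighting a boosting query that boosts phrase matches over term matches.
1. Does not support span queries.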

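Offsets strategy

To create meaningful search snippets from the terms being queried, the highlighter needs to know the start and end character offsets of each word in the original text. These offsets can be obtained from:

The postings list. If index_options is set to offsets in the mapping, the unified highlighter uses this information to highlight documents without re-analyzing the text. It re-runs the original query directly on the postings and extracts the matching offsets from the index, limiting the collection to the highlighted documents. This matters for large fields, because the text to be highlighted does not have to be re-analyzed. It also requires less disk space than using term_vectors.
Term vectors. If term_vector information is provided by setting term_vector to with_positions_offsets in the mapping, the unified highlighter automatically uses the term_vector to highlight the field. It is fast, especially for large fields (> 1MB) and for highlighting multi-term queries such as prefix or wildcard, because it can access the dictionary of terms for each document. The fvh highlighter always uses term vectors.
Plain highlighting. The unified highlighter uses this mode when there is no other alternative. It creates a tiny in-memory index and re-runs the original query criteria through Lucene's query execution planner to get low-level match information for the current document. This is repeated for every field and every document that needs highlighting. The plain highlighter always uses plain highlighting.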

Highlighting settings

Highlighting settings can be set at the global level and overridden at the field level.

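boundary_chars: A string that contains each boundary character. Defaults to .,!? \t\n.
boundary_max_scan: How far to scan for boundary characters. Defaults to 20.
boundary_scanner: Specifies how to break the highlighted fragments: chars, sentence, or word. Only valid for the unified and fvh highlighters. Defaults to sentence for the unified highlighter and to chars for the fvh highlighter.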

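chars: Uses the characters specified by boundary_chars as highlighting boundaries. boundary_max_scan controls how far to scan for boundary characters. Only valid for the fvh highlighter.
sentence: Breaks highlighted fragments at the next sentence boundary, as determined by Java's BreakIterator. The locale can be set with boundary_scanner_locale. Note: when used with the unified highlighter, the sentence scanner splits sentences larger than fragment_size at the first word boundary next to fragment_size. Set fragment_size to 0 to never split any sentence.
word: Breaks highlighted fragments at the next word boundary, as determined by Java's BreakIterator. The locale can be set with boundary_scanner_locale.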

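boundary_scanner_locale: Controls which locale is used to search for sentence and word boundaries. The parameter takes the form of a language tag, e.g. "en-US", "fr-FR", "ja-JP".
encoder: Indicates whether the snippet should be HTML encoded: default (no encoding) or html (HTML-escape the snippet text and then insert the highlighting tags).
fields: Specifies the fields to retrieve highlights for. Wildcards are allowed; for example, comment_* retrieves highlights for all text and keyword fields whose names start with comment_. Only text and keyword fields are highlighted when wildcards are used. If you use a custom mapper and want to highlight on its field, you must name that field explicitly.
force_source: Highlight based on the source even if the field is stored separately. Defaults to false.
fragmenter: Specifies how text should be broken up in highlight snippets: simple or span. Only valid for the plain highlighter. Defaults to span.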

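simple: Breaks up text into same-sized fragments.
span: Breaks up text into same-sized fragments, but tries to avoid breaking up text between highlighted terms. This is helpful when querying for phrases.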

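fragment_offset: Controls the margin from which you want to start highlighting. Only valid when using the fvh highlighter.
fragment_size: The size of the highlighted fragment in characters. Defaults to 100.
highlight_query: Highlights matches for a query other than the search query. Elasticsearch does not validate that highlight_query contains the search query in any way, so it is possible to define it such that legitimate query results are not highlighted. Generally, you should include the search query as part of the highlight_query.
matched_fields: Combines matches on multiple fields to highlight a single field. This is most intuitive for multi-fields that analyze the same string in different ways. All matched_fields must have term_vector set to with_positions_offsets, but only the field into which the matches are combined is loaded, so only that field benefits from having store set to yes. Only valid for the fvh highlighter.
no_match_size: The amount of text to return from the beginning of the field if there are no matching fragments to highlight. Defaults to 0 (nothing is returned).
number_of_fragments: The maximum number of fragments to return. If set to 0, no fragments are returned; instead, the entire field content is highlighted and returned. This is handy for short texts such as a title or address, where fragmentation is not needed. If number_of_fragments is 0, fragment_size is ignored. Defaults to 5.
order: Sorts highlighted fragments by score when set to score. By default, fragments are output in the order they appear in the field (order: none). Setting this option to score outputs the most relevant fragments first. Each highlighter applies its own logic to compute relevancy scores.
phrase_limit: Controls the number of matching phrases in a document that are considered. Prevents the fvh highlighter from analyzing too many phrases and consuming too much memory. When using matched_fields, phrase_limit phrases per matched field are considered. Raising the limit increases query time and consumes more memory. Only supported by the fvh highlighter. Defaults to 256.
pre_tags: Used together with post_tags to define the HTML tags that wrap the highlighted text. By default, highlighted text is wrapped in <em> and </em> tags. Specify as an array of strings.
post_tags: Used together with pre_tags. Specify as an array of strings.
require_field_match: By default, only fields that contain a query match are highlighted. Set require_field_match to false to highlight all fields. Defaults to true.
max_analyzed_offset: By default, the maximum number of characters analyzed for a highlight request is bounded by the value defined in the index.highlight.max_analyzed_offset index setting, and an error is returned when the number of characters exceeds this limit. If this request setting is set to a non-negative value, highlighting stops at that limit; the rest of the text is not processed, so it is not highlighted and no error is returned. The max_analyzed_offset request setting does not override index.highlight.max_analyzed_offset, which prevails when it is set to a lower value than the request setting.
tags_schema: Set to styled to use the built-in tag schema. The styled schema defines the following pre_tags and defines post_tags as </em>.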

<em class="hlt1">, <em class="hlt2">, <em class="hlt3">,
<em class="hlt4">, <em class="hlt5">, <em class="hlt6">,
<em class="hlt7">, <em class="hlt8">, <em class="hlt9">,
<em class="hlt10">

type: The highlighter to use: unified, plain, or fvh. Defaults to unified.
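
As one illustration of the settings above, the following is a minimal sketch of capping the analyzed text per request with max_analyzed_offset. It assumes an Elasticsearch version that accepts max_analyzed_offset as a request parameter; the comment field and the value 100000 are placeholders chosen for illustration, not recommended values.

```console
GET /_search
{
  "query": {
    "match": { "comment": "kimchy" }
  },
  "highlight": {
    "max_analyzed_offset": 100000,
    "fields": {
      "comment": {}
    }
  }
}
```

With this setting, text beyond the first 100000 characters of comment is simply left unhighlighted instead of causing an error, provided the index-level limit is not lower.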

Highlighting examples

Override global settings

GET /_search
{
  "query" : {
    "match": { "user.id": "kimchy" }
  },
  "highlight" : {
    "number_of_fragments" : 3,
    "fragment_size" : 150,
    "fields" : {
      "body" : { "pre_tags" : ["<em>"], "post_tags" : ["</em>"] },
      "blog.title" : { "number_of_fragments" : 0 },
      "blog.author" : { "number_of_fragments" : 0 },
      "blog.comment" : { "number_of_fragments" : 5, "order" : "score" }
    }
  }
}

Specify a highlight query

GET /_search
{
  "query": {
    "match": {
      "comment": {
        "query": "foo bar"
      }
    }
  },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "match_phrase": {
          "comment": {
            "query": "foo bar",
            "slop": 1
          }
        }
      },
      "rescore_query_weight": 10
    }
  },
  "_source": false,
  "highlight": {
    "order": "score",
    "fields": {
      "comment": {
        "fragment_size": 150,
        "number_of_fragments": 3,
        "highlight_query": {
          "bool": {
            "must": {
              "match": {
                "comment": {
                  "query": "foo bar"
                }
              }
            },
            "should": {
              "match_phrase": {
                "comment": {
                  "query": "foo bar",
                  "slop": 1,
                  "boost": 10.0
                }
              }
            },
            "minimum_should_match": 0
          }
        }
      }
    }
  }
}

Set the highlighter type

GET /_search
{
  "query": {
    "match": { "user.id": "kimchy" }
  },
  "highlight": {
    "fields": {
      "comment": { "type": "plain" }
    }
  }
}

Configure highlighting tags

GET /_search
{
  "query" : {
    "match": { "user.id": "kimchy" }
  },
  "highlight" : {
    "pre_tags" : ["<tag1>"],
    "post_tags" : ["</tag1>"],
    "fields" : {
      "body" : {}
    }
  }
}

When using the fvh highlighter, you can specify additional tags; their "importance" follows the order in which they are listed.

GET /_search
{
  "query" : {
    "match": { "user.id": "kimchy" }
  },
  "highlight" : {
    "pre_tags" : ["<tag1>", "<tag2>"],
    "post_tags" : ["</tag1>", "</tag2>"],
    "fields" : {
      "body" : {}
    }
  }
}

You can also use the built-in styled tag schema:

GET /_search
{
  "query" : {
    "match": { "user.id": "kimchy" }
  },
  "highlight" : {
    "tags_schema" : "styled",
    "fields" : {
      "comment" : {}
    }
  }
}
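
Highlighting tags are inserted into the snippet as-is, so field content that itself contains markup can be a problem when the result is rendered as HTML. A minimal sketch using the encoder setting described under the highlighting settings; the body field and the query term are placeholders for illustration.

```console
GET /_search
{
  "query": {
    "match": { "body": "kimchy" }
  },
  "highlight": {
    "encoder": "html",
    "fields": {
      "body": {}
    }
  }
}
```

With encoder set to html, the snippet text is HTML-escaped first and the highlighting tags are inserted afterwards, so only the tags you configured remain as live markup.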

Highlight on source

Forces highlighting to be based on the source, even if the field is stored separately. Defaults to false.

GET /_search
{
  "query" : {
    "match": { "user.id": "kimchy" }
  },
  "highlight" : {
    "fields" : {
      "comment" : {"force_source" : true}
    }
  }
}
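
force_source only makes a difference when the field is stored separately from _source. As a rough sketch of what such a mapping might look like, assuming the same placeholder index name example used elsewhere in this post:

```console
PUT /example
{
  "mappings": {
    "properties": {
      "comment": {
        "type": "text",
        "store": true
      }
    }
  }
}
```

Without "store": true, the highlighter already falls back to loading _source and extracting the field from it.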

Highlight in all fields

By default, only fields that contain a query match are highlighted. Set require_field_match to false to highlight all fields.

GET /_search
{
  "query" : {
    "match": { "user.id": "kimchy" }
  },
  "highlight" : {
    "require_field_match": false,
    "fields": {
      "body" : { "pre_tags" : ["<em>"], "post_tags" : ["</em>"] }
    }
  }
}
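
require_field_match combines naturally with wildcard field names. A minimal sketch, assuming hypothetical text fields named comment_title and comment_body (these names are invented for illustration):

```console
GET /_search
{
  "query": {
    "match": { "comment_body": "kimchy" }
  },
  "highlight": {
    "require_field_match": false,
    "fields": {
      "comment_*": {}
    }
  }
}
```

Here the query only targets comment_body, but because require_field_match is false, occurrences of the query term in any text or keyword field matching comment_* are eligible for highlighting.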

Combine matches on multiple fields

Only supported by the fvh highlighter.

GET /_search
{
  "query": {
    "query_string": {
      "query": "comment.plain:running scissors",
      "fields": [ "comment" ]
    }
  },
  "highlight": {
    "order": "score",
    "fields": {
      "comment": {
        "matched_fields": [ "comment", "comment.plain" ],
        "type": "fvh"
      }
    }
  }
}
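
The request above only works if comment and comment.plain both carry term vectors. A minimal mapping sketch under these assumptions: comment is analyzed by the english analyzer and its plain sub-field by the standard analyzer. The index name example and the analyzer choice are illustrative, not taken from the request itself.

```console
PUT /example
{
  "mappings": {
    "properties": {
      "comment": {
        "type": "text",
        "analyzer": "english",
        "term_vector": "with_positions_offsets",
        "fields": {
          "plain": {
            "type": "text",
            "analyzer": "standard",
            "term_vector": "with_positions_offsets"
          }
        }
      }
    }
  }
}
```

Because the two fields analyze the same string differently (stemmed versus unstemmed), matched_fields lets matches from either analysis be highlighted in the single comment field.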

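Explicitly order highlighted fields

JSON objects are unordered, so if the order in which fields are highlighted matters, specify fields as an array: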

GET /_search
{
  "highlight": {
    "fields": [
      { "title": {} },
      { "text": {} }
    ]
  }
}

Control highlighted fragments

GET /_search
{
  "query" : {
    "match": { "user.id": "kimchy" }
  },
  "highlight" : {
    "order" : "score",
    "fields" : {
      "comment" : {"fragment_size" : 150, "number_of_fragments" : 3}
    }
  }
}

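If number_of_fragments is set to 0, no fragments are produced; instead the whole content of the field is returned, highlighted:

GET /_search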
{
  "query" : {
    "match": { "user.id": "kimchy" }
  },
  "highlight" : {
    "fields" : {
      "body" : {},
      "blog.title" : {"number_of_fragments" : 0}
    }
  }
}

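If there is no matching fragment to highlight, nothing is returned by default. To return a snippet from the beginning of the field instead, set no_match_size to the length of text you want returned:

GET /_search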
{
  "query": {
    "match": { "user.id": "kimchy" }
  },
  "highlight": {
    "fields": {
      "comment": {
        "fragment_size": 150,
        "number_of_fragments": 3,
        "no_match_size": 150
      }
    }
  }
}
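
Fragment boundaries can also be steered with boundary_scanner. The following sketch asks the unified highlighter to break on sentence boundaries and sets fragment_size to 0 so that no sentence is split; the comment field and the en-US locale are placeholders for illustration.

```console
GET /_search
{
  "query": {
    "match": { "comment": "kimchy" }
  },
  "highlight": {
    "fields": {
      "comment": {
        "type": "unified",
        "boundary_scanner": "sentence",
        "boundary_scanner_locale": "en-US",
        "fragment_size": 0
      }
    }
  }
}
```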

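Highlight using the postings list

The following mapping sets index_options to offsets on the comment field so that it can be highlighted from the postings list: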

PUT /example
{
  "mappings": {
    "properties": {
      "comment" : {
        "type": "text",
        "index_options" : "offsets"
      }
    }
  }
}

The following example sets up the comment field to allow highlighting with term_vectors (this makes the index bigger):

PUT /example
{
  "mappings": {
    "properties": {
      "comment" : {
        "type": "text",
        "term_vector" : "with_positions_offsets"
      }
    }
  }
}
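
With term vectors in place, the field can be highlighted with fvh. A minimal sketch against the example index defined above, using the chars boundary scanner with its documented default values written out explicitly; the query term and fragment_size are placeholders.

```console
GET /example/_search
{
  "query": {
    "match": { "comment": "kimchy" }
  },
  "highlight": {
    "fields": {
      "comment": {
        "type": "fvh",
        "boundary_scanner": "chars",
        "boundary_chars": ".,!? \t\n",
        "boundary_max_scan": 20,
        "fragment_size": 150
      }
    }
  }
}
```

Spelling out boundary_chars and boundary_max_scan is optional here; it is done only to show where these settings go when you need to change them.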

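Specify a fragmenter for the plain highlighter

With the plain highlighter you can choose between the simple and span fragmenters. With simple: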

GET my-index-000001/_search
{
  "query": {
    "match_phrase": { "message": "number 1" }
  },
  "highlight": {
    "fields": {
      "message": {
        "type": "plain",
        "fragment_size": 15,
        "number_of_fragments": 3,
        "fragmenter": "simple"
      }
    }
  }
}

Response:

{
  ...
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.6011951,
    "hits": [
      {
        "_index": "my-index-000001",
        "_id": "1",
        "_score": 1.6011951,
        "_source": {
          "message": "some message with the number 1",
          "context": "bar"
        },
        "highlight": {
          "message": [
            " with the <em>number</em>",
            " <em>1</em>"
          ]
        }
      }
    ]
  }
}

With the span fragmenter, the two highlighted terms stay in one fragment:

GET my-index-000001/_search
{
  "query": {
    "match_phrase": { "message": "number 1" }
  },
  "highlight": {
    "fields": {
      "message": {
        "type": "plain",
        "fragment_size": 15,
        "number_of_fragments": 3,
        "fragmenter": "span"
      }
    }
  }
}

Response:

{
  ...
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.6011951,
    "hits": [
      {
        "_index": "my-index-000001",
        "_id": "1",
        "_score": 1.6011951,
        "_source": {
          "message": "some message with the number 1",
          "context": "bar"
        },
        "highlight": {
          "message": [
            " with the <em>number</em> <em>1</em>"
          ]
        }
      }
    ]
  }
}