Previous: Elasticsearch 8.5.3 Source Code Analysis (1) - 掘金 (juejin.cn)

Next: Elasticsearch 8.5.3 Source Code Analysis (3) - The Get Data Read Path - 掘金 (juejin.cn)

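Flow Overview

The previous article analyzed how a REST API request is forwarded to NodeClient; this article continues with the processing that follows. It uses the GET /{index}/_doc/{id} API as the example, whose RestHandler implementation is RestGetAction.

After the client sends a Get request, the node that receives it acts directly as the coordinating node: it dispatches the request to the node that holds the relevant shard, then collects the result and returns it to the client. The main call flow is shown in the figure below:

RestGetAction forwards the Get request to NodeClient.
NodeClient looks up the corresponding TransportAction through the mapping registered when the node was initialized.
The TransportAction collects the index shard information, then dispatches the request to the relevant nodes through TransportService.

Forwarding of requests inside the cluster is ultimately handled by the various TransportAction implementations. These actions live under the org.elasticsearch.action package.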

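Mapping RestHandler to TransportAction

Requests forwarded by each RestHandler are handled by a corresponding TransportAction.

For example: GET /{index}/_doc/{id} -> RestGetAction -> TransportGetAction

The association between a RestHandler and a TransportAction goes through an intermediate type, ActionType. When a RestHandler calls NodeClient.get, the matching ActionType is passed along: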

public void get(final GetRequest request, final ActionListener<GetResponse> listener) {
    execute(GetAction.INSTANCE, request, listener);
}

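NodeClient.transportAction performs the conversion: given an ActionType, it returns the TransportAction implementation responsible for the internal communication.

The implementation of ActionType is shown in the figure below:

The mapping between ActionType and TransportAction is registered at node startup by the ActionModule.setupActions method.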
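The lookup described above can be pictured as a plain map from an action key to a handler. The sketch below is only an illustration of that idea: the class ActionRegistrySketch, its methods, and the use of strings for both the key and the request are hypothetical simplifications, not the actual NodeClient or ActionModule code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified stand-in; this is NOT the real NodeClient or ActionModule.
final class ActionRegistrySketch {
    // Key: action name (role of ActionType). Value: handler (role of TransportAction).
    private final Map<String, Function<String, String>> actions = new HashMap<>();

    // Analogous to registering an ActionType/TransportAction pair at node startup.
    public void register(String actionName, Function<String, String> handler) {
        actions.put(actionName, handler);
    }

    // Analogous to resolving the handler from the action type, then executing it.
    public String execute(String actionName, String request) {
        Function<String, String> handler = actions.get(actionName);
        if (handler == null) {
            throw new IllegalStateException("no handler registered for [" + actionName + "]");
        }
        return handler.apply(request);
    }

    public static void main(String[] args) {
        ActionRegistrySketch registry = new ActionRegistrySketch();
        registry.register("get", request -> "handled " + request);
        System.out.println(registry.execute("get", "GET /user/_doc/1"));
    }
}
```

The point of the indirection is that the REST layer only needs to know the key, while the handler that does the cluster-internal work can be registered separately.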

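TransportAction Processing Flow

TransportGetAction.execute is inherited from the parent class TransportAction, and it ultimately calls TransportSingleShardAction.doExecute. That method creates an AsyncSingleAction instance and calls its start method. AsyncSingleAction is an inner class of TransportSingleShardAction.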

protected void doExecute(Task task, Request request, ActionListener<Response> listener) {
    new AsyncSingleAction(request, listener).start();
}

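The class hierarchy of TransportGetAction, TransportSingleShardAction, and TransportAction is shown below:

The core processing flow of TransportGetAction.execute is shown in the figure below:

Compute the primary and replica shard information from the routing rules and assemble it into a shard iterator.
Call TransportService to dispatch the data request.

The rest of this article focuses on how the shard information is computed from the routing rules.

The TransportSingleShardAction.AsyncSingleAction constructor: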

private AsyncSingleAction(Request request, ActionListener<Response> listener) {
    ...
    // Get the shards this request needs to work on
    this.shardIt = shards(clusterState, internalRequest);
}

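The core logic of the AsyncSingleAction constructor is obtaining the shard iterator, ShardsIterator. The iterator contains the available copies (primary and replicas) of the shard that the document routes to.

Getting Shard Information

To fetch a document, the coordinating node first needs to know which nodes hold the index's shards, then picks the most suitable node to send the request to. The code that obtains the shards is in OperationRouting.getShards: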

public ShardIterator getShards(ClusterState clusterState, String index, String id,
    @Nullable String routing, @Nullable String preference) {
    // 1. Determine the index routing type: TSDS (routing from _source), partitioned index, or regular index
    IndexRouting indexRouting = IndexRouting.fromIndexMetadata(indexMetadata(clusterState, index));
    // 2. Use the IndexRouting algorithm to get the shard table, including the primary and all replicas
    IndexShardRoutingTable shards = clusterState.getRoutingTable().shardRoutingTable(index, indexRouting.getShard(id, routing));
    // 3. Select the active shard copies and assemble them into an iterator
    return preferenceActiveShardIterator(shards, clusterState.nodes().getLocalNodeId(), clusterState.nodes(), preference, null, null);
}

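Getting the IndexRouting

IndexRouting is the abstract base class for index routing; its subclasses implement the different shard routing algorithms. The class hierarchy of IndexRouting is as follows:

ExtractFromSource: extracts the routing key from _source; used for TSDS indices (index_mode: time_series).
IdAndRoutingOnly: uses the document's _id and _routing value as the routing key. It has two concrete implementations: Unpartitioned (regular, non-partitioned indices) and Partitioned (partitioned indices).

The code that selects the IndexRouting is in IndexRouting.fromIndexMetadata: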

public static IndexRouting fromIndexMetadata(IndexMetadata metadata) {
    if (false == metadata.getRoutingPaths().isEmpty()) {
        return new ExtractFromSource(metadata);
    }
    if (metadata.isRoutingPartitionedIndex()) {
        return new Partitioned(metadata);
    }
    return new Unpartitioned(metadata);
}

Shard Routing Algorithm

The template method for the shard routing calculation lives in the IndexRouting class. Its base algorithm is:

protected final int hashToShardId(int hash) {
    return Math.floorMod(hash, routingNumShards) / routingFactor;
}

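Subclasses compute the hash in different ways, but the final step from hash to shard is the same.

routingNumShards: the total number of virtual slots; it determines the maximum number of shards the index can be split into. By default it is derived from the number of primary shards so that the index can be split by factors of 2 up to at most 1024 shards.
routingFactor: the number of slots each shard occupies, i.e. total slots / number of shards.

The calculation is defined in the IndexMetadata constructor; see the figure below:

Translated into formulas, the code reads:

routing_factor = num_routing_shards / num_primary_shards

slots per shard = total slots / number of primary shards

shard_num = (hash(_routing) % num_routing_shards) / routing_factor

shard number = (hash of routing key % total slots) / slots per shard

The following example helps make the formula concrete:

numPrimaryShards = 4 (4 primary shards)
routingNumShards = 1024 (total number of slots)
routingFactor = routingNumShards / numPrimaryShards = 256 (slots assigned to each shard)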
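To see these numbers at work, here is a small sketch that applies the same arithmetic as hashToShardId to a few hand-picked hash values. The class name SlotRoutingExample and the sample hashes are made up for illustration; real hashes come from the routing key.

```java
final class SlotRoutingExample {
    // Same arithmetic as hashToShardId: map the hash to a slot, then the slot to a shard.
    public static int hashToShardId(int hash, int routingNumShards, int routingFactor) {
        return Math.floorMod(hash, routingNumShards) / routingFactor;
    }

    public static void main(String[] args) {
        int routingNumShards = 1024;
        int numPrimaryShards = 4;
        int routingFactor = routingNumShards / numPrimaryShards; // 256 slots per shard

        // Hypothetical hash values, chosen to hit the slot range boundaries.
        int[] sampleHashes = {0, 255, 256, 1023, 1024, -1};
        for (int hash : sampleHashes) {
            int shard = hashToShardId(hash, routingNumShards, routingFactor);
            System.out.println("hash " + hash + " -> shard " + shard);
        }
        // Slots 0..255 map to shard 0, 256..511 to shard 1, and so on.
        // floorMod keeps negative hashes in range: -1 maps to slot 1023, shard 3.
    }
}
```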

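In earlier versions (7.13 and before), the shard routing calculation was:

shard = hash(_routing) % number_of_primary_shards

So why the change?

The main reason is that, when shards are split or shrunk, data can move by slot, which makes the migration far more efficient; splitting and merging shards by slot is also easier to reason about. For example, to merge 8 shards into 4 under the modulo scheme, shard 0 has to merge with shard 4, shard 1 with shard 5, and so on. With hash slots, contiguous slot ranges are simply merged: shard 0 with shard 1, shard 2 with shard 3, and so on.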
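The difference between the two schemes can be checked numerically. The following sketch compares where a hash lands with 8 shards and with 4 shards under each scheme; the class SplitLocalityExample and the range of test hashes are hypothetical and only serve to demonstrate the arithmetic.

```java
final class SplitLocalityExample {
    // Slot-based routing: hash -> slot -> shard.
    public static int slotShard(int hash, int routingNumShards, int numShards) {
        int routingFactor = routingNumShards / numShards;
        return Math.floorMod(hash, routingNumShards) / routingFactor;
    }

    // Plain modulo routing, as in the older formula.
    public static int moduloShard(int hash, int numShards) {
        return Math.floorMod(hash, numShards);
    }

    // True if, for every tested hash, slot-based shard s of 8 merges into shard s / 2 of 4.
    public static boolean slotsMergeNeighbours() {
        for (int hash = -5000; hash <= 5000; hash++) {
            if (slotShard(hash, 1024, 8) / 2 != slotShard(hash, 1024, 4)) {
                return false;
            }
        }
        return true;
    }

    // True if, for every tested hash, modulo shard s of 8 merges into shard s % 4 of 4.
    public static boolean moduloMergesStride() {
        for (int hash = -5000; hash <= 5000; hash++) {
            if (moduloShard(hash, 8) % 4 != moduloShard(hash, 4)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("slot scheme merges neighbours (0+1, 2+3, ...): " + slotsMergeNeighbours());
        System.out.println("modulo scheme merges by stride (0+4, 1+5, ...): " + moduloMergesStride());
    }
}
```

Both checks hold, which matches the description: under slots the merged shards are adjacent, under modulo they are four apart.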

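With the default number_of_routing_shards, an Elasticsearch index can only be split by factors of 2 (doubling), up to a maximum of 1024 shards. If the index was created with 5 primary shards, it can only be split into 10, 20, 40, 80, 160, 320, or at most 640 shards (640 * 2 would exceed 1024).

See the official documentation: Index modules, Split index.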
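The list of possible shard counts follows from repeated doubling under the 1024 cap. The sketch below reproduces that enumeration; it assumes the default routing shard setting described above, and the class SplitTargetsExample is a hypothetical helper, not Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitTargetsExample {
    static final int MAX_SHARDS = 1024; // cap taken from the text above

    // Enumerate the shard counts reachable by doubling, without exceeding the cap.
    public static List<Integer> splitTargets(int numPrimaryShards) {
        List<Integer> targets = new ArrayList<>();
        for (int n = numPrimaryShards * 2; n <= MAX_SHARDS; n *= 2) {
            targets.add(n);
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(splitTargets(5)); // [10, 20, 40, 80, 160, 320, 640]
        System.out.println(splitTargets(4)); // [8, 16, 32, 64, 128, 256, 512, 1024]
    }
}
```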

Regular Non-Partitioned Index - Unpartitioned

The shard routing algorithm of IndexRouting.Unpartitioned, used by regular indices, is:

int hash = Murmur3HashFunction.hash(effectiveRouting);
return Math.floorMod(hash, routingNumShards) / routingFactor;

If the request carries a _routing parameter, its value is hashed; otherwise the document's _id is hashed.
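As a rough illustration of that fallback, the sketch below picks the effective routing key and maps it to a shard. Note the assumptions: String.hashCode is used as a stand-in for Murmur3HashFunction, so the shard numbers will not match a real cluster, and the class EffectiveRoutingExample and its method names are hypothetical.

```java
final class EffectiveRoutingExample {
    // Use _routing when present, otherwise fall back to _id.
    public static String effectiveRouting(String id, String routing) {
        return routing == null ? id : routing;
    }

    // Stand-in hash for illustration only; Elasticsearch uses Murmur3 here.
    public static int standInHash(String key) {
        return key.hashCode();
    }

    public static int shardId(String id, String routing, int routingNumShards, int routingFactor) {
        int hash = standInHash(effectiveRouting(id, routing));
        return Math.floorMod(hash, routingNumShards) / routingFactor;
    }

    public static void main(String[] args) {
        // Same routing value, different ids: always the same shard.
        System.out.println(shardId("1", "A", 1024, 256) == shardId("2", "A", 1024, 256));
        // No routing value: the id itself decides the shard.
        System.out.println(shardId("A", null, 1024, 256) == shardId("1", "A", 1024, 256));
    }
}
```

Both comparisons print true, since in each case the same key ends up being hashed.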

Partitioned Index - Partitioned

The shard routing algorithm of IndexRouting.Partitioned is:

int offset = Math.floorMod(effectiveRoutingToHash(id), routingPartitionSize);
int hash = effectiveRoutingToHash(routing) + offset;
return Math.floorMod(hash, routingNumShards) / routingFactor;

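For a partitioned index, the _routing parameter is used to compute the hash and is required. The _id is used to compute the offset: an offset is added to the routing hash before the modulo step, so that documents with the same _routing value no longer all land on a single shard.

A partitioned index mitigates data skew to some degree and spreads data more evenly across shards. The cost is that searches need more shard requests to gather the data. Why? Consider the following example: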

PUT user
{
  "settings": {
    "index": {
      "number_of_shards": 4,
      "number_of_replicas": 0,
      "number_of_routing_shards": 4
    }
  }
}

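numPrimaryShards = 4 (4 primary shards)
routingNumShards = 4 (number of slots; 4 slots allow at most 4 primary shards)
routingFactor = routingNumShards / numPrimaryShards = 1 (slots assigned to each shard)

The routing rules now look like the figure below:

Suppose the following four documents are inserted: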

POST /user/_bulk?refresh
{"index":{"_id":1,"routing": "A"}}
{"name": "001"}
{"index":{"_id":2,"routing": "A"}}
{"name": "002"}
{"index":{"_id":3,"routing": "A"}}
{"name": "003"}
{"index":{"_id":4,"routing": "A"}}
{"name": "004"}

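All four documents necessarily land on the same shard, because they share the same routing key, A.

Next, set the index partition size to 2 slots.

routing_partition_size = 2: the partition size is 2 slots. Each shard currently occupies one slot, so with this setting a partition can span 2 shards.

Delete the index and recreate it: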

PUT user
{
  "settings": {
    "index": {
      "number_of_shards": 4,
      "number_of_replicas": 0,
      "number_of_routing_shards": 4,
      "routing_partition_size": 2
    }
  },
  "mappings": {
    "_routing": {
      "required": true
    }
  }
}

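Re-insert the four documents. Under the partitioned routing algorithm a partition covers two shards, so documents that used to land on one shard may now be spread over two consecutive shards: the algorithm adds an offset to the original hash, and offset = hash(_id) % 2 is either 0 or 1. With 0 the document stays on the original shard; with 1 it goes to the next shard.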
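The spreading effect can be sketched with the same three lines of arithmetic as the Partitioned algorithm. As before, String.hashCode stands in for the real Murmur3-based hash, so the concrete shard numbers are illustrative only and will differ from a real cluster; the class PartitionedRoutingExample is hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

final class PartitionedRoutingExample {
    // Stand-in hash for illustration only; Elasticsearch uses Murmur3 here.
    public static int standInHash(String key) {
        return key.hashCode();
    }

    // Mirrors the Partitioned arithmetic: routing hash plus an id-derived offset.
    public static int shardId(String id, String routing, int routingPartitionSize,
                              int routingNumShards, int routingFactor) {
        int offset = Math.floorMod(standInHash(id), routingPartitionSize);
        int hash = standInHash(routing) + offset;
        return Math.floorMod(hash, routingNumShards) / routingFactor;
    }

    // Collect the shards hit by ids "1".."docCount" under one routing value.
    public static Set<Integer> shardsFor(String routing, int docCount, int routingPartitionSize,
                                         int routingNumShards, int routingFactor) {
        Set<Integer> shards = new TreeSet<>();
        for (int id = 1; id <= docCount; id++) {
            shards.add(shardId(String.valueOf(id), routing, routingPartitionSize,
                    routingNumShards, routingFactor));
        }
        return shards;
    }

    public static void main(String[] args) {
        // 4 shards, 4 slots, factor 1, partition size 2, routing key "A".
        System.out.println(shardsFor("A", 4, 2, 4, 1));
    }
}
```

With the stand-in hash, the four documents end up on two neighbouring shards, and no choice of id can push a document outside that pair.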

Query the shard placement of the data:

GET /user/_search
{
  "explain": true,
  "query": {
    "match_all": {}
  }
}

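As the result shows, the data is spread across two shards:

This algorithm does not guarantee a perfectly even distribution either: in this example hash(1), hash(2), and hash(4) yield the same offset, so those documents all land on the same shard. In addition, because documents with the same routing key sit on different shards, a query may in some cases have to fetch data from multiple shards (possibly on different hosts) even though a routing key was specified.

Also pay attention to the relationship between the number of slots per shard and the number of slots per partition. If they are equal, a given routing key may always be assigned to the same shard. For example:

numPrimaryShards = 4 (4 primary shards)

routingNumShards = 8 (total number of slots)

routingFactor = routingNumShards / numPrimaryShards = 2 (slots assigned to each shard)

routingPartitionSize = 2

With this configuration, the slots are laid out as follows:

Suppose a routing key has hash(_routing) % 8 = 0. Because the partition offset also spans only 2 slots, the document lands on shard 0 no matter how _id varies; only the virtual slot differs.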
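This case is easy to verify by brute force. The sketch below works directly on integer hash values (so no real hash function is involved) and lists the shards reachable for a given routing hash; the class PartitionSlotOverlapExample and the chosen hash values 16 and 17 are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

final class PartitionSlotOverlapExample {
    static final int ROUTING_NUM_SHARDS = 8;
    static final int ROUTING_FACTOR = 2;
    static final int ROUTING_PARTITION_SIZE = 2;

    // Partitioned arithmetic applied to raw hash values.
    public static int shardId(int routingHash, int idHash) {
        int offset = Math.floorMod(idHash, ROUTING_PARTITION_SIZE);
        return Math.floorMod(routingHash + offset, ROUTING_NUM_SHARDS) / ROUTING_FACTOR;
    }

    // All shards reachable for one routing hash while the id hash varies.
    public static Set<Integer> reachableShards(int routingHash) {
        Set<Integer> shards = new TreeSet<>();
        for (int idHash = 0; idHash < 100; idHash++) {
            shards.add(shardId(routingHash, idHash));
        }
        return shards;
    }

    public static void main(String[] args) {
        // 16 % 8 == 0: slots 0 and 1 both belong to shard 0.
        System.out.println(reachableShards(16)); // [0]
        // 17 % 8 == 1: slots 1 and 2 straddle shards 0 and 1.
        System.out.println(reachableShards(17)); // [0, 1]
    }
}
```

So whether the partition actually spreads documents depends on where the routing hash falls relative to the shard boundary, which is why the text says the problem may occur rather than always occurs.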

TSDS Index - ExtractFromSource

The shard routing algorithm of IndexRouting.ExtractFromSource, used by TSDS indices, is:

int hash = hashSource(sourceType,source).buildHash(IndexRouting.ExtractFromSource::defaultOnEmpty);
return Math.floorMod(hash, routingNumShards) / routingFactor;

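The algorithm is similar to Unpartitioned, except that the routing key is extracted from fields in _source.

Elasticsearch 8.5.3 Source Code Analysis (2) - TransportAction Mapping and the Shard Routing Model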