Elasticsearch 8.5.3 Source Code Analysis (3) - How a Get Request Reads Data

Previous article: Elasticsearch 8.5.3 Source Code Analysis (2) - TransportAction Mapping and the Shard Routing Model - 掘金 (juejin.cn)

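Overview

This article again uses the GET /{index}/_doc/{id} API to trace how Elasticsearch handles a request. The previous article ended at the point where the request is handed to TransportService; this one covers TransportService.sendRequest and the processing that follows it.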

image.png

As the figure above shows, TransportService.sendRequest first calls getConnection to obtain a Transport.Connection. The code is:

public Transport.Connection getConnection(DiscoveryNode node) {
    if (isLocalNode(node)) {
        return localNodeConnection;
    } else {
        return connectionManager.getConnection(node);
    }
}

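getConnection checks whether the target node is the local node. If it is, it returns localNodeConnection, an instance of an anonymous class that implements Transport.Connection.

If the target is a remote node, the returned object is a TcpTransport.NodeChannels instance.

The implementation of localNodeConnection: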

private final Transport.Connection localNodeConnection = new Transport.Connection() {
    ......
    @Override
    public void sendRequest(long requestId, String action, TransportRequest request, TransportRequestOptions options) {
        sendLocalRequest(requestId, action, request, options);
    }
    ......
}

As shown, its sendRequest implementation simply calls sendLocalRequest.
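The local/remote split described above can be sketched in plain Java. This is only an illustration: Connection and ConnectionSketch are hypothetical stand-ins, not Elasticsearch classes, and the "remote" branch merely records what it would send instead of opening a network channel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Transport.Connection (not the Elasticsearch interface).
interface Connection {
    void sendRequest(long requestId, String action);
}

// Hypothetical stand-in for the part of TransportService that picks a connection.
class ConnectionSketch {
    final String localNodeId;
    final List<String> log = new ArrayList<>();

    // Local connection: an anonymous class that never touches the network.
    final Connection localConnection = new Connection() {
        @Override
        public void sendRequest(long requestId, String action) {
            log.add("local:" + action + "#" + requestId);
        }
    };

    ConnectionSketch(String localNodeId) {
        this.localNodeId = localNodeId;
    }

    Connection getConnection(String nodeId) {
        if (localNodeId.equals(nodeId)) {
            return localConnection;
        }
        // Stand-in for a connection backed by network channels to the remote node.
        return (requestId, action) -> log.add("remote[" + nodeId + "]:" + action + "#" + requestId);
    }
}
```

The caller uses both connections the same way; only the object returned by getConnection decides whether the request leaves the process.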

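The read path of a Get request:

  • The client sends a Get request to one of the ES nodes.
  • The node that receives the request acts as the coordinating node and uses the shard routing rules to determine which shard holds the document.
  • If that shard is on the current node, the data is read through the shard-level read module.
  • If that shard is on another node, the request is forwarded to that node through the Netty-based transport module; the data is read there through the shard-level read module and returned to the coordinating node.
  • Once the coordinating node has the data, it returns the response to the client.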
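The steps above can be sketched as a small routing function. This is a simplified illustration under stated assumptions: GetRoutingSketch and its methods are hypothetical, and String.hashCode() is used only as a stand-in for the Murmur3-based routing hash that Elasticsearch applies to the routing value.

```java
import java.util.Map;

// Hypothetical sketch of a coordinating node deciding where a Get is executed.
final class GetRoutingSketch {

    // Simplified stand-in for shard routing. Elasticsearch hashes the routing value
    // (the document id by default) with Murmur3; hashCode() is for illustration only.
    static int shardFor(String id, int numberOfShards) {
        return Math.floorMod(id.hashCode(), numberOfShards);
    }

    // Describes whether the read happens locally or is forwarded to another node.
    static String route(String id, int numberOfShards, Map<Integer, String> shardToNode, String localNode) {
        int shard = shardFor(id, numberOfShards);
        String target = shardToNode.get(shard);
        if (target == null) {
            throw new IllegalStateException("no node holds shard " + shard);
        }
        return target.equals(localNode)
            ? "read shard " + shard + " locally"
            : "forward to " + target + " for shard " + shard;
    }
}
```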

image.png

Request handling on the local node

The following walks through the implementation of TransportService.sendLocalRequest.

image.png

Source of TransportService.sendLocalRequest:

private void sendLocalRequest(long requestId, final String action, final TransportRequest request, TransportRequestOptions options) {
    //Build the response channel that is used later to send the response back to the requester
    final DirectResponseChannel channel = new DirectResponseChannel(localNode, action, requestId, this, threadPool);
    try {
        onRequestSent(localNode, requestId, action, request, options);
        onRequestReceived(requestId, action);
        @SuppressWarnings("unchecked")
        //Look up the RequestHandlerRegistry registered for this action
        final RequestHandlerRegistry<TransportRequest> reg = (RequestHandlerRegistry<TransportRequest>) getRequestHandler(action);
        if (reg == null) {
            assert false : action;
            throw new ActionNotFoundTransportException("Action [" + action + "] not found");
        }
        final String executor = reg.getExecutor();
        if (ThreadPool.Names.SAME.equals(executor)) {
            //Handle the request on the current thread
            try (var ignored = threadPool.getThreadContext().newTraceContext()) {
                try {
                    //Process the request
                    reg.processMessageReceived(request, channel);
                } catch (Exception e) {
                    handleSendToLocalException(channel, e, action);
                }
            }
        } else {
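            //Otherwise hand the request to the executor registered for this action (a thread pool, not a newly created thread)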
            ...
            //Process the request
            reg.processMessageReceived(request, channel);
            ...
        }
        ...
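The executor check in sendLocalRequest can be illustrated with the JDK's own executors. This is a minimal sketch: ExecutorDispatchSketch is hypothetical, the constant only mirrors the idea of ThreadPool.Names.SAME, and the sketch blocks for the result purely so that the executing thread can be observed, which the real method does not do.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical sketch: run on the caller's thread for "same", otherwise on a pool.
final class ExecutorDispatchSketch {
    static final String SAME = "same"; // mirrors the idea of ThreadPool.Names.SAME

    // Returns the name of the thread that ended up running the handler.
    static String dispatch(String executorName, Executor pool) {
        if (SAME.equals(executorName)) {
            // No hand-off: the handler runs on whichever thread called dispatch.
            return Thread.currentThread().getName();
        }
        // Hand-off to the pool. join() is used here only to observe the result.
        return CompletableFuture.supplyAsync(() -> Thread.currentThread().getName(), pool).join();
    }
}
```

Running on the calling thread avoids a context switch for cheap handlers, while heavier work such as a shard read is better kept off the thread that received the request.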

TransportGetAction.asyncShardOperation

protected void asyncShardOperation(GetRequest request, ShardId shardId, ActionListener<GetResponse> listener) throws IOException {
    IndexService indexService = indicesService.indexServiceSafe(shardId.getIndex());
    //获取数据所在目标分片
    IndexShard indexShard = indexService.getShard(shardId.id());
    if (request.realtime()) { 
        //Realtime: execute immediately; data that has not been refreshed yet can still be read from the translog
        super.asyncShardOperation(request, shardId, listener);
    } else {
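        //Non-realtime: wait until the shard is search-active, i.e. any pending refresh has run, before reading (the default refresh interval is 1s)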
        indexShard.awaitShardSearchActive(b -> {
            try {
                super.asyncShardOperation(request, shardId, listener);
            } catch (Exception ex) {
                listener.onFailure(ex);
            }
        });
    }
}

TransportSingleShardAction.asyncShardOperation runs the work asynchronously on a thread pool: it first calls TransportGetAction.shardOperation to obtain the GetResponse and then calls listener.onResponse to send the response back.

protected void asyncShardOperation(Request request, ShardId shardId, ActionListener<Response> listener) throws IOException {
    threadPool.executor(getExecutor(request, shardId)).execute(ActionRunnable.supply(listener, () -> shardOperation(request, shardId)));
}
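The "compute on an executor, then notify a listener" pattern used above can be sketched with JDK types only. The names here (SupplySketch, Listener, supply, asyncOperation) are hypothetical and only loosely modeled on ActionListener and ActionRunnable.supply; they are not the Elasticsearch implementations.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executor;

// Hypothetical sketch of wrapping a computation so a listener always gets an outcome.
final class SupplySketch {

    // Hypothetical listener with the onResponse/onFailure pair seen in the article.
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    // Wraps a computation: its result goes to onResponse, its exception to onFailure.
    static <T> Runnable supply(Listener<T> listener, Callable<T> computation) {
        return () -> {
            T result;
            try {
                result = computation.call();
            } catch (Exception e) {
                listener.onFailure(e);
                return;
            }
            listener.onResponse(result);
        };
    }

    // Submits the wrapped computation to the given executor.
    static <T> void asyncOperation(Executor executor, Listener<T> listener, Callable<T> computation) {
        executor.execute(supply(listener, computation));
    }
}
```

Because the exception is caught inside the runnable, a failure on the worker thread still reaches the caller through the listener instead of being lost in the pool.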

TransportGetAction.shardOperation calls ShardGetService.get to obtain the document as a GetResult and wraps it in a GetResponse.

protected GetResponse shardOperation(GetRequest request, ShardId shardId) throws IOException {
    IndexService indexService = indicesService.indexServiceSafe(shardId.getIndex());
    IndexShard indexShard = indexService.getShard(shardId.id());
    //request.refresh() indicates whether a refresh should be executed before reading
    if (request.refresh() && request.realtime() == false) {
        indexShard.refresh("refresh_flag_get");
    }
    //Read the data from the shard
    GetResult result = indexShard.getService()
        .get(
            request.id(),
            request.storedFields(),
            request.realtime(),
            request.version(),
            request.versionType(),
            request.fetchSourceContext(),
            request.isForceSyntheticSource()
        );
    return new GetResponse(result);
}

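Request handling on a remote node

If the requested data is not on the current node, the request is forwarded to the remote node that holds it. Communication with remote nodes also goes through Netty. The Netty setup for node-to-node transport is in Netty4Transport.java, and the handler it binds for incoming messages is Netty4MessageInboundHandler.java.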

image.png

The core code for talking to a remote node is in OutboundHandler: sendMessage serializes the message, and internalSend passes it to Netty4TcpChannel, which does the actual sending.

image.png

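When the remote node receives the message, the core handling is in the InboundPipeline and InboundHandler classes; from there the flow is the same as on the local node.

Fetching the document data

As analyzed above, both the local and the remote node end up calling ShardGetService.get to fetch the data. The following looks at how ShardGetService.get works.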

image.png

InternalEngine.get. As IndexShard.getEngine shows, each index shard has one Engine instance, so every method on Engine is a shard-level operation.

public GetResult get(
    Get get,
    MappingLookup mappingLookup,
    DocumentParser documentParser,
    Function<Engine.Searcher, Engine.Searcher> searcherWrapper
) {
    assert Objects.equals(get.uid().field(), IdFieldMapper.NAME) : get.uid().field();
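    //Acquire the engine's read lock. It is a shared lock, so it blocks only while another operation holds the write lock (for example while the engine is being closed).
    //A Get may need to read the latest data from the translog, so the engine must stay open for the whole call. In 8.5.3 flush, which commits segments to disk and trims the translog, also runs under this read lock and serializes through a separate flushLock.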
    try (ReleasableLock ignored = readLock.acquire()) {
        ensureOpen();
        //Reads are realtime by default
        if (get.realtime()) {
            final VersionValue versionValue;
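            //Acquire the per-document lock in the version map.
            try (Releasable ignore = versionMap.acquireLock(get.uid().bytes())) {
                // Hold the lock while reading so that the most recently updated version is seen in real time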
                versionValue = getVersionFromMap(get.uid().bytes());
            }
            if (versionValue != null) {
                if (versionValue.isDelete()) {
                    //The latest version of the document is a delete
                    return GetResult.NOT_EXISTS;
                }
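                //Under concurrency the version may have changed
                //Check, according to the version type, whether the current version conflicts with the expected one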
                if (get.versionType().isVersionConflictForReads(versionValue.version, get.version())) {
                    throw new VersionConflictEngineException(
                        shardId,
                        "[" + get.id() + "]",
                        get.versionType().explainConflictForReads(versionValue.version, get.version())
                    );
                }
                if (get.getIfSeqNo() != SequenceNumbers.UNASSIGNED_SEQ_NO
                    && (get.getIfSeqNo() != versionValue.seqNo || get.getIfPrimaryTerm() != versionValue.term)) {
                    throw new VersionConflictEngineException(
                        shardId,
                        get.id(),
                        get.getIfSeqNo(),
                        get.getIfPrimaryTerm(),
                        versionValue.seqNo,
                        versionValue.term
                    );
                }
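                //By default the latest data is read from the translog. isReadFromTranslog has the same value as the realtime property.
                if (get.isReadFromTranslog()) {
                    //An update is written to the translog first; the refresh that makes it visible to the searcher only runs afterwards (every 1s by default). Until then the searcher still sees the old document, so the latest data has to be read from the translog.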
                    if (versionValue.getLocation() != null) {
                        try {
                            final Translog.Operation operation = translog.readOperation(versionValue.getLocation());
                            if (operation != null) {
                                //Return the latest data from the translog
                                return getFromTranslog(get, (Translog.Index) operation, mappingLookup, documentParser, searcherWrapper);
                            }
                        } catch (IOException e) {
                            maybeFailEngine("realtime_get", e); // lets check if the translog has failed with a tragic event
                            throw new EngineException(shardId, "failed to read operation from translog", e);
                        }
                    } else {
                        trackTranslogLocation.set(true);
                    }
                }
                assert versionValue.seqNo >= 0 : versionValue;
                //If the data was not returned from the translog, refresh if needed so that the operation becomes visible to reads.
                refreshIfNeeded("realtime_get", versionValue.seqNo);
            }
            //Read the shard data through the searcher's reader
            return getFromSearcher(get, acquireSearcher("realtime_get", SearcherScope.INTERNAL, searcherWrapper), false);
        } else {
            // we expose what has been externally expose in a point in time snapshot via an explicit refresh
            return getFromSearcher(get, acquireSearcher("get", SearcherScope.EXTERNAL, searcherWrapper), false);
        }
    }
}
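The difference between a realtime and a non-realtime read can be shown with a toy model. This is a deliberately simplified sketch under stated assumptions: RealtimeGetSketch is hypothetical, plain maps stand in for the version map, the translog and the searcher's view, and locking, version checks and sequence numbers are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical toy engine: realtime reads consult unrefreshed operations first.
final class RealtimeGetSketch {
    private final Map<String, String> translog = new HashMap<>();    // id -> latest written source
    private final Map<String, Boolean> versionMap = new HashMap<>(); // id -> isDelete, for unrefreshed operations
    private final Map<String, String> searchable = new HashMap<>();  // what the last refresh exposed

    void index(String id, String source) {
        translog.put(id, source);
        versionMap.put(id, false);
    }

    void delete(String id) {
        versionMap.put(id, true);
    }

    // Makes all pending operations visible to non-realtime reads.
    void refresh() {
        for (Map.Entry<String, Boolean> e : versionMap.entrySet()) {
            if (e.getValue()) {
                searchable.remove(e.getKey());
            } else {
                searchable.put(e.getKey(), translog.get(e.getKey()));
            }
        }
        versionMap.clear();
    }

    Optional<String> get(String id, boolean realtime) {
        if (realtime) {
            Boolean isDelete = versionMap.get(id);
            if (isDelete != null) {
                // An unrefreshed operation exists: answer from it, not from the searcher.
                return isDelete ? Optional.empty() : Optional.of(translog.get(id));
            }
        }
        return Optional.ofNullable(searchable.get(id));
    }
}
```

In this model a document indexed a moment ago is returned by get(id, true) straight away, while get(id, false) only sees it after refresh() has run.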

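When checking for concurrent version conflicts, the source performs two checks:

  1. version
  2. ifSeqNo and ifPrimaryTerm

  • version is per document: every modification of the document (including deletion) increments it, version = version + 1.
  • ifSeqNo is compared with the document's seqNo, which is per shard: every modification (including deletion) of any document in the shard gets the next sequence number, seqNo = seqNo + 1.
  • ifPrimaryTerm is compared with the primary term, which is incremented each time a new primary shard is assigned, for example after a restart or when a new primary is elected after a failure.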
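The two checks can be written as two small predicates. This is a simplified sketch: ReadConflictSketch is hypothetical, the version check only covers the "match any or match exactly" case, and the two sentinel values are assumptions meant to mirror Versions.MATCH_ANY and SequenceNumbers.UNASSIGNED_SEQ_NO.

```java
// Hypothetical sketch of the two read-side conflict checks.
final class ReadConflictSketch {
    static final long MATCH_ANY = -3L;         // assumption: mirrors Versions.MATCH_ANY
    static final long UNASSIGNED_SEQ_NO = -2L; // assumption: mirrors SequenceNumbers.UNASSIGNED_SEQ_NO

    // Conflict when the caller asked for a specific version and the document has a different one.
    static boolean versionConflict(long currentVersion, long expectedVersion) {
        return expectedVersion != MATCH_ANY && currentVersion != expectedVersion;
    }

    // Same condition as in the listing: only checked when ifSeqNo was supplied,
    // and both the sequence number and the primary term have to match.
    static boolean seqNoConflict(long currentSeqNo, long currentTerm, long ifSeqNo, long ifPrimaryTerm) {
        return ifSeqNo != UNASSIGNED_SEQ_NO && (ifSeqNo != currentSeqNo || ifPrimaryTerm != currentTerm);
    }
}
```

The primary term is compared together with the sequence number because the same sequence number could, after a failover, refer to a different operation written under a newer primary.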

InternalEngine.getFromSearcher

protected final GetResult getFromSearcher(Get get, Engine.Searcher searcher, boolean uncachedLookup) throws EngineException {
    final DocIdAndVersion docIdAndVersion;
    try {
        if (uncachedLookup) {
            //Do not cache the PerThreadIDVersionAndSeqNoLookup
            docIdAndVersion = VersionsAndSeqNoResolver.loadDocIdAndVersionUncached(searcher.getIndexReader(), get.uid(), true);
        } else {
            docIdAndVersion = VersionsAndSeqNoResolver.loadDocIdAndVersion(searcher.getIndexReader(), get.uid(), true);
        }
    } catch (Exception e) {
        Releasables.closeWhileHandlingException(searcher);
        // TODO: A better exception goes here
        throw new EngineException(shardId, "Couldn't resolve version", e);
    }
    //Check for concurrent version conflicts
    if (docIdAndVersion != null) {
        if (get.versionType().isVersionConflictForReads(docIdAndVersion.version, get.version())) {
            Releasables.close(searcher);
            throw new VersionConflictEngineException(
                shardId,
                "[" + get.id() + "]",
                get.versionType().explainConflictForReads(docIdAndVersion.version, get.version())
            );
        }
        if (get.getIfSeqNo() != SequenceNumbers.UNASSIGNED_SEQ_NO
            && (get.getIfSeqNo() != docIdAndVersion.seqNo || get.getIfPrimaryTerm() != docIdAndVersion.primaryTerm)) {
            Releasables.close(searcher);
            throw new VersionConflictEngineException(
                shardId,
                get.id(),
                get.getIfSeqNo(),
                get.getIfPrimaryTerm(),
                docIdAndVersion.seqNo,
                docIdAndVersion.primaryTerm
            );
        }
    }

    if (docIdAndVersion != null) {
        // don't release the searcher on this path, it is the
        // responsibility of the caller to call GetResult.release
        return new GetResult(searcher, docIdAndVersion);
    } else {
        Releasables.close(searcher);
        return GetResult.NOT_EXISTS;
    }
}

VersionsAndSeqNoResolver.loadDocIdAndVersion.

It reads the document's version, seqNo and primaryTerm through a LeafReader and returns them in a new DocIdAndVersion instance.

public static DocIdAndVersion loadDocIdAndVersion(IndexReader reader, Term term, boolean loadSeqNo) throws IOException {
    //Each LeafReaderContext has its own PerThreadIDVersionAndSeqNoLookup. The array of lookups is cached in a ThreadLocal so that it is not rebuilt on every call (for example during an mget).
    PerThreadIDVersionAndSeqNoLookup[] lookups = getLookupState(reader, term.field());
    List<LeafReaderContext> leaves = reader.leaves();
    // Iterate from the last segment to the first: the most recent update to a document is usually in the last segment
    for (int i = leaves.size() - 1; i >= 0; i--) {
        final LeafReaderContext leaf = leaves.get(i);
        PerThreadIDVersionAndSeqNoLookup lookup = lookups[leaf.ord];
        //Read the document's version, seqNo and primaryTerm through the LeafReader, then build a DocIdAndVersion
        DocIdAndVersion result = lookup.lookupVersion(term.bytes(), loadSeqNo, leaf);
        if (result != null) {
            return result;
        }
    }
    return null;
}
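The newest-segment-first loop can be shown with plain collections. This is only a sketch: NewestSegmentFirstSketch is hypothetical, and each map stands in for one segment. Unlike Lucene, the toy keeps the stale copy in the older segment instead of marking it deleted, which makes the effect of the iteration order visible.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of looking up a document id from the newest segment backwards.
final class NewestSegmentFirstSketch {

    // Each map is one segment (id -> version); later list entries are newer segments.
    // Returns the version from the newest segment containing the id, or null if none does.
    static Long lookupVersion(List<Map<String, Long>> segments, String id) {
        for (int i = segments.size() - 1; i >= 0; i--) {
            Long version = segments.get(i).get(id);
            if (version != null) {
                return version; // stop at the first hit, older segments are not visited
            }
        }
        return null;
    }
}
```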

IndexReader and IndexReaderContext

image.png

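IndexReader is an abstract class that provides the interface for accessing index data. Changes made to the index are not visible until a new IndexReader is opened.

An IndexReader instance is usually created by calling DirectoryReader.open(Directory).

There are two kinds of IndexReader:

  • LeafReader: an abstract class that provides the interface for accessing an index. Searching an index is done entirely through this abstract interface, so any subclass that implements it is searchable. It supports retrieving stored fields, doc values, terms and postings.
  • CompositeReader: also an abstract class. It is composed of multiple sub-readers and cannot be searched directly; searching always goes through its LeafReader sub-readers. The leaves of a CompositeReader are obtained through IndexReader.leaves(), which returns one LeafReaderContext per leaf.

Note: IndexReader instances are completely thread safe, meaning multiple threads can call any of their methods concurrently.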
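The relationship between a composite reader and its leaves can be sketched without Lucene on the classpath. Everything here is hypothetical (CompositeReaderSketch, Leaf, LeafContext); only the ord and docBase fields are named after the corresponding LeafReaderContext fields, where docBase is the composite-level id of a leaf's first document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a composite view that delegates every lookup to one of its leaves.
final class CompositeReaderSketch {

    // Hypothetical leaf: the stored values of one segment, addressed by segment-local doc id.
    static final class Leaf {
        final List<String> docs;
        Leaf(List<String> docs) { this.docs = docs; }
    }

    // Hypothetical leaf context: the leaf, its position, and the global id of its first document.
    static final class LeafContext {
        final Leaf leaf;
        final int ord;
        final int docBase;
        LeafContext(Leaf leaf, int ord, int docBase) {
            this.leaf = leaf;
            this.ord = ord;
            this.docBase = docBase;
        }
    }

    private final List<LeafContext> leaves = new ArrayList<>();

    CompositeReaderSketch(List<Leaf> subReaders) {
        int docBase = 0;
        for (Leaf leaf : subReaders) {
            leaves.add(new LeafContext(leaf, leaves.size(), docBase));
            docBase += leaf.docs.size();
        }
    }

    List<LeafContext> leaves() {
        return List.copyOf(leaves);
    }

    // Resolves a global doc id to the owning leaf and reads the document from that leaf.
    String document(int globalDocId) {
        for (int i = leaves.size() - 1; i >= 0; i--) {
            LeafContext ctx = leaves.get(i);
            if (globalDocId >= ctx.docBase) {
                return ctx.leaf.docs.get(globalDocId - ctx.docBase);
            }
        }
        throw new IndexOutOfBoundsException("doc " + globalDocId);
    }
}
```

The composite holds no documents of its own: it only translates a global id into a leaf and a leaf-local id, which is why the lookup code earlier in the article iterates over reader.leaves().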