Presto - Coordinator Query Flow, Part 4

1. Overview

Continuing from Presto - Coordinator Query Flow, Part 3: the previous chapter covered AST rewriting and validation, which produce an Analysis object. This chapter covers how the logical plan is built on top of that Analysis object.

2. Entry Point

In DispatchManager#createQueryInternal (covered in Coordinator Query Flow, Part 1), the dispatchQuery is submitted:

resourceGroupManager.submit(dispatchQuery, selectionContext, dispatchExecutor);

Next, look at ResourceGroupManager#submit:

public void submit(ManagedQueryExecution queryExecution, SelectionContext<C> selectionContext, Executor executor)
{
    checkState(configurationManager.get() != null, "configurationManager not set");
    createGroupIfNecessary(selectionContext, executor);
    //fetch the InternalResourceGroup from the map and call run
    groups.get(selectionContext.getResourceGroupId()).run(queryExecution);
}
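
The pattern above, createGroupIfNecessary followed by groups.get, is a get-or-create registry. A minimal sketch with a plain map; GroupRegistry and Group here are illustrative stand-ins, not Presto's actual classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// toy get-or-create registry mirroring createGroupIfNecessary + groups.get
class GroupRegistry {
    static class Group {
        final String id;
        Group(String id) { this.id = id; }
    }

    private final Map<String, Group> groups = new ConcurrentHashMap<>();

    // create the group on first use, then always return the same instance
    Group getOrCreate(String id) {
        return groups.computeIfAbsent(id, Group::new);
    }
}
```

Repeated calls with the same resource-group id return the same Group instance, so the subsequent run(queryExecution) always targets one shared group.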

Next, InternalResourceGroup#run: when the query is allowed to run, it calls startInBackground.

public void run(ManagedQueryExecution query)
{
    synchronized (root) {
        ...
        // can neither run nor queue: fail immediately
        if (!canQueue && !canRun) {
            query.fail(new QueryQueueFullException(id));
            return;
        }
        // can run
        if (canRun) {
            startInBackground(query);
        }
        // otherwise enqueue
        else {
            enqueueQuery(query);
        }
        ...
    }
}
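
The three-way admission decision above (run, queue, or fail) can be written as a pure function; this is a sketch of the control flow only, not Presto's API:

```java
class Admission {
    // mirrors InternalResourceGroup#run: fail when neither running nor
    // queueing is possible, prefer running, otherwise enqueue
    static String admit(boolean canRun, boolean canQueue) {
        if (!canRun && !canQueue) {
            return "fail";   // query.fail(new QueryQueueFullException(id))
        }
        return canRun ? "run" : "queue";
    }
}
```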

Next, InternalResourceGroup#startInBackground:

private void startInBackground(ManagedQueryExecution query)
{
    checkState(Thread.holdsLock(root), "Must hold lock to start a query");
    synchronized (root) {
        ...
        executor.execute(query::startWaitingForResources);
    }
}

This submits a task to the executor, which runs LocalDispatchQuery#startWaitingForResources to wait for resources:

public void startWaitingForResources()
{
    if (stateMachine.transitionToWaitingForResources()) {
        waitForMinimumWorkers();
    }
}

Execution continues in LocalDispatchQuery#waitForMinimumWorkers, which waits until enough workers are available and then triggers startExecution:

private void waitForMinimumWorkers()
{
    // wait for query execution to finish construction
    addSuccessCallback(queryExecutionFuture, queryExecution -> {
        ListenableFuture<?> minimumWorkerFuture = clusterSizeMonitor.waitForMinimumWorkers(executionMinCount, getRequiredWorkersMaxWait(session));
        /**
         * when worker requirement is met, start the execution
         * once enough workers are available, startExecution(queryExecution) starts the rest of the query
         */
        addSuccessCallback(minimumWorkerFuture, () -> startExecution(queryExecution));
        ...
    });
}
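
The two nested addSuccessCallback steps above form a chain: wait for the QueryExecution to finish construction, then wait for the worker count. The same shape can be sketched with the JDK's CompletableFuture (Presto itself uses Guava's ListenableFuture; DispatchChain and chain are illustrative names):

```java
import java.util.concurrent.CompletableFuture;

class DispatchChain {
    // only when both the execution object and the minimum-worker requirement
    // are ready does the "start" step fire, mirroring waitForMinimumWorkers
    static CompletableFuture<String> chain(CompletableFuture<String> executionFuture,
                                           CompletableFuture<Void> minWorkersFuture) {
        return executionFuture.thenCompose(execution ->
                minWorkersFuture.thenApply(ignored -> "startExecution(" + execution + ")"));
    }

    public static void main(String[] args) {
        CompletableFuture<String> executionFuture = new CompletableFuture<>();
        CompletableFuture<Void> minWorkersFuture = new CompletableFuture<>();
        CompletableFuture<String> started = chain(executionFuture, minWorkersFuture);

        executionFuture.complete("queryExecution");  // construction finished
        minWorkersFuture.complete(null);             // enough workers joined
        System.out.println(started.join());
    }
}
```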

Execution continues in LocalDispatchQuery#startExecution, which submits the query via querySubmitter; this querySubmitter is SqlQueryManager#createQuery.

private void startExecution(QueryExecution queryExecution)
{
    queryExecutor.execute(() -> {
        if (stateMachine.transitionToDispatching()) {
            ... other code
            querySubmitter.accept(queryExecution);
        }
    });
}

Next, SqlQueryManager#createQuery:

public void createQuery(QueryExecution queryExecution)
{
    ... other unrelated code
    queryExecution.start();
}

This finally leads to the query execution entry point: QueryExecution#start.

3. Query Execution

3.1 The QueryExecution Flow

As shown in the code below, QueryExecution proceeds in four steps:

  1. Build the logical execution plan (construction plus optimization)
  2. Build the distributed execution plan
  3. Build the SqlQueryScheduler object that schedules the plan
  4. scheduler.start() begins scheduling
public void start()
{
    ... unrelated code
    // build the logical execution plan
    PlanRoot plan = planQuery();
    // DynamicFilterService needs plan for query to be registered.
    // Query should be registered before dynamic filter suppliers are requested in distribution planning.
    registerDynamicFilteringQuery(plan);
    // build the distributed execution plan
    planDistribution(plan);
    planDistribution(plan);

    if (!stateMachine.transitionToStarting()) {
        // query already started or finished
        return;
    }
    // if query is not finished, start the scheduler, otherwise cancel it
    SqlQueryScheduler scheduler = queryScheduler.get();
    if (!stateMachine.isDone()) {
        scheduler.start();
    }
}

3.2 Building the Logical Execution Plan

private PlanRoot planQuery()
{
    return doPlanQuery();
}
private PlanRoot doPlanQuery()
{
    // plan query
    LogicalPlanner logicalPlanner = new LogicalPlanner(stateMachine.getSession(),
            planOptimizers,
            idAllocator,
            metadata,
            typeOperators,
            new TypeAnalyzer(sqlParser, metadata),
            statsCalculator,
            costCalculator,
            stateMachine.getWarningCollector());
    Plan plan = logicalPlanner.plan(analysis);

    // fragment the plan
    SubPlan fragmentedPlan = planFragmenter.createSubPlans(stateMachine.getSession(), plan, false, stateMachine.getWarningCollector());

    // extract the query-level inputs and outputs
    List<Input> inputs = new InputExtractor(metadata, stateMachine.getSession()).extractInputs(fragmentedPlan);
    stateMachine.setInputs(inputs);
    stateMachine.setOutput(analysis.getTarget());

    boolean explainAnalyze = analysis.getStatement() instanceof Explain && ((Explain) analysis.getStatement()).isAnalyze();
    return new PlanRoot(fragmentedPlan, !explainAnalyze);
}

3.3 LogicalPlanner#plan

The main steps are:

  1. Build the logical plan (PlanNode)
  2. Optimize the logical plan with the optimizers
  3. Create Plan(root, types, statsAndCosts)
public Plan plan(Analysis analysis, Stage stage, boolean collectPlanStatistics)
{   // build the initial logical plan
    PlanNode root = planStatement(analysis, analysis.getStatement());

    // sanity-check the intermediate plan; many checkers run here
    planSanityChecker.validateIntermediatePlan(root, session, metadata, typeOperators, typeAnalyzer, symbolAllocator.getTypes(), warningCollector);

    if (stage.ordinal() >= OPTIMIZED.ordinal()) {
        for (PlanOptimizer optimizer : planOptimizers) { // optimize
            root = optimizer.optimize(root, session, symbolAllocator.getTypes(), symbolAllocator, idAllocator, warningCollector);
            requireNonNull(root, format("%s returned a null plan", optimizer.getClass().getName()));
        }
    }

    if (stage.ordinal() >= OPTIMIZED_AND_VALIDATED.ordinal()) {
        // make sure we produce a valid plan after optimizations run. This is mainly to catch programming errors
        planSanityChecker.validateFinalPlan(root, session, metadata, typeOperators, typeAnalyzer, symbolAllocator.getTypes(), warningCollector);
    }

    TypeProvider types = symbolAllocator.getTypes();

    StatsAndCosts statsAndCosts = StatsAndCosts.empty();
    if (collectPlanStatistics) {
        StatsProvider statsProvider = new CachingStatsProvider(statsCalculator, session, types);
        CostProvider costProvider = new CachingCostProvider(costCalculator, statsProvider, Optional.empty(), session, types);
        statsAndCosts = StatsAndCosts.create(root, statsProvider, costProvider);
    }
    return new Plan(root, types, statsAndCosts); // carries the collected types and costs (used for EXPLAIN)
}
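
The ordinal() comparisons above gate how far planning proceeds. A minimal sketch of that staging; OPTIMIZED and OPTIMIZED_AND_VALIDATED follow the names in the code, while the CREATED stage and the plan method here are illustrative:

```java
class StageGate {
    enum Stage { CREATED, OPTIMIZED, OPTIMIZED_AND_VALIDATED }

    // each later stage implies all earlier steps, via enum ordinal ordering
    static String plan(Stage stage) {
        StringBuilder steps = new StringBuilder("build");
        if (stage.ordinal() >= Stage.OPTIMIZED.ordinal()) {
            steps.append(",optimize");   // run the optimizer loop
        }
        if (stage.ordinal() >= Stage.OPTIMIZED_AND_VALIDATED.ordinal()) {
            steps.append(",validate");   // final sanity check
        }
        return steps.toString();
    }
}
```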

3.4 LogicalPlanner#planStatement

public PlanNode planStatement(Analysis analysis, Statement statement)
{
    // CREATE TABLE AS SELECT / materialized-view no-op cases, not relevant here
    if ((statement instanceof CreateTableAsSelect && analysis.getCreate().get().isCreateTableAsSelectNoOp()) ||
            statement instanceof RefreshMaterializedView && analysis.isSkipMaterializedViewRefresh()) {
        .... 
    }
    return createOutputPlan(planStatementWithoutOutput(analysis, statement), analysis);
}

Looking at createOutputPlan and planStatementWithoutOutput, the flow is:

  • 1 Generate the logical plan root (a PlanNode) via planStatementWithoutOutput
    • 1.1 RelationPlanner.process does the actual construction of the RelationPlan (RelationPlan is essentially an extension of PlanNode that carries extra context while RelationPlanner runs)
  • 2 Wrap the root in an OutputNode (outputFields, outputSymbols)
// wrap a fixed OutputNode on top of the execution plan
private PlanNode createOutputPlan(RelationPlan plan, Analysis analysis)
{
    ImmutableList.Builder<Symbol> outputs = ImmutableList.builder();
    ImmutableList.Builder<String> names = ImmutableList.builder();
    int columnNumber = 0;
    RelationType outputDescriptor = analysis.getOutputDescriptor();
    for (Field field : outputDescriptor.getVisibleFields()) {
        String name = field.getName().orElse("_col" + columnNumber);
        names.add(name);
        int fieldIndex = outputDescriptor.indexOf(field);
        Symbol symbol = plan.getSymbol(fieldIndex);
        outputs.add(symbol);
        columnNumber++;
    }
    // build the OutputNode on top of the root PlanNode
    return new OutputNode(idAllocator.getNextId(), plan.getRoot(), names.build(), outputs.build());
}
private RelationPlan planStatementWithoutOutput(Analysis analysis, Statement statement)
{
    ... other cases
    if (statement instanceof Query) {
        return createRelationPlan(analysis, (Query) statement);
    }
}
private RelationPlan createRelationPlan(Analysis analysis, Query query)
{
    return new RelationPlanner(analysis, symbolAllocator, idAllocator, buildLambdaDeclarationToSymbolMap(analysis, symbolAllocator), metadata, Optional.empty(), session, ImmutableMap.of())
            .process(query, null);
}

3.5 RelationPlanner, QueryPlanner, SubQueryPlanner

Presto uses RelationPlanner, QueryPlanner, and SubQueryPlanner together to build the logical plan. The core mechanism is a recursive traversal of the Query Statement: when a node of a given type is encountered, the corresponding visit* method is called.

RelationPlanner does not handle the conversion of every AST node; it only converts Relation-related nodes: Table, AliasedRelation, SampleRelation, Join, TableSubQuery, Query, QuerySpecification, Values, Unnest, Union, Intersect, and Except. For nodes such as Query and QuerySpecification, which wrap an entire relational sub-query, RelationPlanner delegates to QueryPlanner.

QueryPlanner builds the Query-related parts, e.g. QuerySpecification and Query.

SubQueryPlanner is dedicated to handling subqueries; it does so by constructing a new RelationPlanner to process them.
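
The recursive visit* dispatch described above is the classic visitor pattern. A self-contained toy sketch; Node, TableNode, JoinNode, and Planner are illustrative stand-ins, not Presto's AstVisitor hierarchy:

```java
abstract class Node {
    abstract String accept(Planner planner);   // double-dispatch entry point
}

class TableNode extends Node {
    final String name;
    TableNode(String name) { this.name = name; }
    String accept(Planner p) { return p.visitTable(this); }
}

class JoinNode extends Node {
    final Node left;
    final Node right;
    JoinNode(Node left, Node right) { this.left = left; this.right = right; }
    String accept(Planner p) { return p.visitJoin(this); }
}

class Planner {
    // each visit* method handles one relation type; composite nodes recurse
    String visitTable(TableNode node) { return "TableScan(" + node.name + ")"; }
    String visitJoin(JoinNode node) {
        return "Join(" + node.left.accept(this) + ", " + node.right.accept(this) + ")";
    }
}
```

Calling accept on the root walks the tree bottom-up, just as RelationPlanner does when it recurses via process.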

3.5.1 RelationPlanner#visitTable

From the analysis, look up the namedQuery for the Table. If namedQuery is non-null, first recursively process namedQuery (a Query) to build a subPlan (a RelationPlan), then handle type coercions and return the result as a RelationPlan. If namedQuery is null, build a TableScanNode directly and wrap it in a RelationPlan.

protected RelationPlan visitTable(Table node, Void context)
{
    Query namedQuery = analysis.getNamedQuery(node);
    Scope scope = analysis.getScope(node);

    RelationPlan plan;
    if (namedQuery != null) {
        RelationPlan subPlan;
        if (analysis.isExpandableQuery(namedQuery)) {
            subPlan = new QueryPlanner(analysis, symbolAllocator, idAllocator, lambdaDeclarationToSymbolMap, metadata, outerContext, session, recursiveSubqueries)
                    .planExpand(namedQuery);
        }
        else {
            subPlan = process(namedQuery, null);
        }

        // Add implicit coercions if view query produces types that don't match the declared output types
        // of the view (e.g., if the underlying tables referenced by the view changed)

        List<Type> types = analysis.getOutputDescriptor(node)
                .getAllFields().stream()
                .map(Field::getType)
                .collect(toImmutableList());

        NodeAndMappings coerced = coerce(subPlan, types, symbolAllocator, idAllocator);

        plan = new RelationPlan(coerced.getNode(), scope, coerced.getFields(), outerContext);
    }
    else {
        TableHandle handle = analysis.getTableHandle(node);

        ImmutableList.Builder<Symbol> outputSymbolsBuilder = ImmutableList.builder();
        ImmutableMap.Builder<Symbol, ColumnHandle> columns = ImmutableMap.builder();
        for (Field field : scope.getRelationType().getAllFields()) {
            Symbol symbol = symbolAllocator.newSymbol(field);

            outputSymbolsBuilder.add(symbol);
            columns.put(symbol, analysis.getColumn(field));
        }

        List<Symbol> outputSymbols = outputSymbolsBuilder.build();
        boolean isDeleteTarget = analysis.isDeleteTarget(node);
        PlanNode root = TableScanNode.newInstance(idAllocator.getNextId(), handle, outputSymbols, columns.build(), isDeleteTarget);

        plan = new RelationPlan(root, scope, outputSymbols, outerContext);
    }

    plan = addRowFilters(node, plan);
    plan = addColumnMasks(node, plan);

    return plan;
}

3.5.2 RelationPlanner#visitJoin

First, recursively plan the join's left side to get a RelationPlan. Then inspect the right side: if it is an Unnest node it is handled specially, and likewise for a Lateral node (not covered in depth here). Otherwise, recursively plan the right side to get a RelationPlan; after resolving the join type and the output column metadata, build a JoinNode and wrap it in a RelationPlan.

protected RelationPlan visitJoin(Join node, Void context)
{
... code omitted here; it is long, read the source if interested
}
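
Since the real code is elided above, here is a runnable toy that models only the branching order just described; the classes are illustrative stand-ins, not Presto's AST types, and the Lateral branch is left out:

```java
class JoinPlanSketch {
    interface Relation {}
    static class Table implements Relation {
        final String name;
        Table(String name) { this.name = name; }
    }
    static class Unnest implements Relation {}
    static class Join implements Relation {
        final Relation left;
        final Relation right;
        final String type;
        Join(Relation left, Relation right, String type) {
            this.left = left; this.right = right; this.type = type;
        }
    }

    static String plan(Relation node) {
        if (node instanceof Table) {
            return "TableScan(" + ((Table) node).name + ")";
        }
        Join join = (Join) node;
        String left = plan(join.left);              // the left side is planned first
        if (join.right instanceof Unnest) {
            return "UnnestJoin(" + left + ")";      // special-cased before planning right
        }
        String right = plan(join.right);            // otherwise recurse into the right side
        return "Join[" + join.type + "](" + left + ", " + right + ")";
    }
}
```

The key point is the order: the left side is always planned first, and an Unnest on the right short-circuits before the normal right-side recursion.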

3.5.3 RelationPlanner#visitQuerySpecification

This builds the main body of the query, delegating to QueryPlanner:

protected RelationPlan visitQuerySpecification(QuerySpecification node, Void context)
{
    return new QueryPlanner(analysis, symbolAllocator, idAllocator, lambdaDeclarationToSymbolMap, metadata, outerContext, session, recursiveSubqueries)
            .plan(node);
}

Continuing into QueryPlanner#plan, you can see it builds each component of the query: filter, aggregate, window, project, order by, distinct, sort, offset, limit.

public RelationPlan plan(QuerySpecification node)
{
    PlanBuilder builder = planFrom(node);

    builder = filter(builder, analysis.getWhere(node), node);
    builder = aggregate(builder, node);
    builder = filter(builder, analysis.getHaving(node), node);
    builder = window(node, builder, ImmutableList.copyOf(analysis.getWindowFunctions(node)));

    List<SelectExpression> selectExpressions = analysis.getSelectExpressions(node);
    List<Expression> expressions = selectExpressions.stream()
            .map(SelectExpression::getExpression)
            .collect(toImmutableList());
    builder = subqueryPlanner.handleSubqueries(builder, expressions, node);

    if (hasExpressionsToUnfold(selectExpressions)) {
        // pre-project the folded expressions to preserve any non-deterministic semantics of functions that might be referenced
        builder = builder.appendProjections(expressions, symbolAllocator, idAllocator);
    }

    List<Expression> outputs = outputExpressions(selectExpressions);
    if (node.getOrderBy().isPresent()) {
        // ORDER BY requires outputs of SELECT to be visible.
        // For queries with aggregation, it also requires grouping keys and translated aggregations.
        if (analysis.isAggregation(node)) {
            // Add projections for aggregations required by ORDER BY. After this step, grouping keys and translated
            // aggregations are visible.
            List<Expression> orderByAggregates = analysis.getOrderByAggregates(node.getOrderBy().get());
            builder = builder.appendProjections(orderByAggregates, symbolAllocator, idAllocator);
        }

        // Add projections for the outputs of SELECT, but stack them on top of the ones from the FROM clause so both are visible
        // when resolving the ORDER BY clause.
        builder = builder.appendProjections(outputs, symbolAllocator, idAllocator);

        // The new scope is the composite of the fields from the FROM and SELECT clause (local nested scopes). Fields from the bottom of
        // the scope stack need to be placed first to match the expected layout for nested scopes.
        List<Symbol> newFields = new ArrayList<>();
        newFields.addAll(builder.getTranslations().getFieldSymbols());

        outputs.stream()
                .map(builder::translate)
                .forEach(newFields::add);

        builder = builder.withScope(analysis.getScope(node.getOrderBy().get()), newFields);

        builder = window(node, builder, ImmutableList.copyOf(analysis.getOrderByWindowFunctions(node.getOrderBy().get())));
    }

    List<Expression> orderBy = analysis.getOrderByExpressions(node);
    builder = subqueryPlanner.handleSubqueries(builder, orderBy, node);
    builder = builder.appendProjections(Iterables.concat(orderBy, outputs), symbolAllocator, idAllocator);

    builder = distinct(builder, node, outputs);
    Optional<OrderingScheme> orderingScheme = orderingScheme(builder, node.getOrderBy(), analysis.getOrderByExpressions(node));
    builder = sort(builder, orderingScheme);
    builder = offset(builder, node.getOffset());
    builder = limit(builder, node.getLimit(), orderingScheme);
    builder = builder.appendProjections(outputs, symbolAllocator, idAllocator);

    return new RelationPlan(
            builder.getRoot(),
            analysis.getScope(node),
            computeOutputs(builder, outputs),
            outerContext);
}

At this point the logical plan is fully built; next comes logical plan optimization.