Presto - Coordinator Query Flow - 3


1. Overview

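Continued from Presto - Coordinator Query Flow - 2, which covered how the AST is built and part of the query's semantic checks. This part focuses on how the Analyzer performs semantic analysis on the AST.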

2. Source Code Walkthrough

2.1 DispatchQueryFactory#createDispatchQuery

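Presto - Coordinator Query Flow - 1 showed that DispatchQueryFactory#createDispatchQuery is called; this is the entry point of query analysis. Its key step is delegating the creation of the QueryExecution to a QueryExecutionFactory.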

public DispatchQuery createDispatchQuery(
        Session session,
        String query,
        PreparedQuery preparedQuery,
        Slug slug,
        ResourceGroupId resourceGroup)
{
    // 1. Create the query state machine; every state change triggers the registered listeners
    QueryStateMachine stateMachine = QueryStateMachine.begin(
            query,
            preparedQuery.getPrepareSql(),
            session,
            locationFactory.createQueryLocation(session.getQueryId()),
            resourceGroup,
            isTransactionControlStatement(preparedQuery.getStatement()),
            transactionManager,
            accessControl,
            executor,
            metadata,
            warningCollector,
            StatementUtils.getQueryType(preparedQuery.getStatement().getClass()));
    // 2. Submit a task that creates the QueryExecution.
    // QueryExecution is the core object: it builds, analyzes and optimizes the logical plan,
    // turns it into a distributed plan and submits the work to the workers.
    // Note that the QueryExecution is constructed and executed on different threads.
    ListenableFuture<QueryExecution> queryExecutionFuture = executor.submit(() -> {
        QueryExecutionFactory<?> queryExecutionFactory = executionFactories.get(preparedQuery.getStatement().getClass());
        // Delegate creation of the QueryExecution to the factory registered for this statement type
        return queryExecutionFactory.createQueryExecution(preparedQuery, stateMachine, slug, warningCollector);
    });
    // 3. Wrap it in a LocalDispatchQuery and return it
    return new LocalDispatchQuery(
            stateMachine,
            queryExecutionFuture,
            queryMonitor,
            clusterSizeMonitor,
            executor,
            queryManager::createQuery);
}
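The dispatch pattern above (look up a factory by the statement's class, then build the QueryExecution on an executor so the caller only holds a future) can be sketched in plain Java. This is a minimal illustration, not Trino code: Stmt, QueryStmt, ExplainStmt and DispatchSketch are hypothetical, and the "execution" is just a String label.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical stand-ins for AST statement classes (not Trino types).
class Stmt {}
class QueryStmt extends Stmt {}
class ExplainStmt extends Stmt {}

class DispatchSketch {
    // Factories keyed by the statement's runtime class, like
    // executionFactories.get(statement.getClass()) in createDispatchQuery.
    static final Map<Class<? extends Stmt>, Function<Stmt, String>> FACTORIES = new HashMap<>();

    static {
        FACTORIES.put(QueryStmt.class, s -> "SqlQueryExecution");
        FACTORIES.put(ExplainStmt.class, s -> "SqlQueryExecution(EXPLAIN)");
    }

    // Lookup and construction run on the executor; the caller gets a Future
    // right away, much as LocalDispatchQuery receives queryExecutionFuture.
    static Future<String> createAsync(ExecutorService executor, Stmt statement) {
        return executor.submit(() -> {
            Function<Stmt, String> factory = FACTORIES.get(statement.getClass());
            if (factory == null) {
                throw new IllegalArgumentException("No factory for " + statement.getClass().getSimpleName());
            }
            return factory.apply(statement);
        });
    }

    // Demo helper: submit, then block for the result, unwrapping failures.
    static String createAndWait(Stmt statement) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            return createAsync(executor, statement).get();
        }
        catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        finally {
            executor.shutdown();
        }
    }
}
```

Keeping construction off the calling thread matters because, as the next section shows, building a SqlQueryExecution already runs semantic analysis, which can be slow.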

2.2 SqlQueryExecutionFactory#createQueryExecution

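Following how the QueryExecution is built, we find that the factory creates a SqlQueryExecution object. The SqlQueryExecution constructor is private, so it cannot be called from outside; SqlQueryExecutionFactory is a static nested class of SqlQueryExecution, which is why it can call it. SqlQueryExecution encapsulates most of the important logic of executing a query, so it is the focus of what follows.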

public QueryExecution createQueryExecution(
        PreparedQuery preparedQuery,
        QueryStateMachine stateMachine,
        Slug slug,
        WarningCollector warningCollector)
{
    ...
    return new SqlQueryExecution(
            preparedQuery,
            stateMachine,
            slug,
            metadata,
            ... /* other parameters */);
}

// The SqlQueryExecution constructor
private SqlQueryExecution(
        PreparedQuery preparedQuery,
        QueryStateMachine stateMachine,
        Slug slug,
        Metadata metadata,
        ...)
{
    ...
    // Start semantic analysis of the query
    this.analysis = analyze(preparedQuery, stateMachine, metadata, groupProvider, accessControl, sqlParser, queryExplainer, warningCollector);
    // when the query finishes cache the final query info, and clear the reference to the output stage
    AtomicReference<SqlQueryScheduler> queryScheduler = this.queryScheduler;
    stateMachine.addStateChangeListener(state -> {
        if (!state.isDone()) {
            return;
        }
        // query is now done, so abort any work that is still running
        SqlQueryScheduler scheduler = queryScheduler.get();
        if (scheduler != null) {
            scheduler.abort();
        }
    });
    // RemoteTaskFactory sends tasks to the workers: after the plan is split into distributed stages, the resulting tasks are created through it
    this.remoteTaskFactory = new MemoryTrackingRemoteTaskFactory(requireNonNull(remoteTaskFactory, "remoteTaskFactory is null"), stateMachine);
}
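The listener registered in the constructor is a small pattern worth isolating: the scheduler does not exist yet while the query is being analyzed, so it is kept in an AtomicReference and only read when a terminal state fires. A minimal sketch follows; SketchState, SketchStateMachine, SketchScheduler and ExecutionSketch are hypothetical stand-ins, not Trino classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Simplified query states; isDone() plays the role of the real state's isDone().
enum SketchState {
    QUEUED, PLANNING, RUNNING, FINISHED, FAILED;

    boolean isDone() {
        return this == FINISHED || this == FAILED;
    }
}

// Hypothetical minimal state machine: every transition notifies all listeners.
class SketchStateMachine {
    private final List<Consumer<SketchState>> listeners = new ArrayList<>();

    void addStateChangeListener(Consumer<SketchState> listener) {
        listeners.add(listener);
    }

    void transitionTo(SketchState newState) {
        for (Consumer<SketchState> listener : listeners) {
            listener.accept(newState);
        }
    }
}

// Stand-in for SqlQueryScheduler that only records whether abort() ran.
class SketchScheduler {
    boolean aborted;

    void abort() {
        aborted = true;
    }
}

class ExecutionSketch {
    // Filled in later, once the query is scheduled; null until then.
    final AtomicReference<SketchScheduler> queryScheduler = new AtomicReference<>();

    ExecutionSketch(SketchStateMachine stateMachine) {
        // Same shape as the constructor above: ignore non-final states,
        // otherwise abort whatever scheduler has been installed so far.
        stateMachine.addStateChangeListener(state -> {
            if (!state.isDone()) {
                return;
            }
            SketchScheduler scheduler = queryScheduler.get();
            if (scheduler != null) {
                scheduler.abort();
            }
        });
    }
}
```

If the query fails before scheduling, the reference is still null and the listener simply does nothing.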

2.3 SqlQueryExecution#analyze

Analysis is delegated to Analyzer.analyze, which performs semantic validation of the statement AST.

private Analysis analyze(
        PreparedQuery preparedQuery,
        QueryStateMachine stateMachine,
        Metadata metadata,
        GroupProvider groupProvider,
        AccessControl accessControl,
        SqlParser sqlParser,
        QueryExplainer queryExplainer,
        WarningCollector warningCollector)
{
    Analyzer analyzer = new Analyzer(
            stateMachine.getSession(),
            metadata,
            sqlParser,
            groupProvider,
            accessControl,
            Optional.of(queryExplainer),
            preparedQuery.getParameters(),
            parameterExtractor(preparedQuery.getStatement(), preparedQuery.getParameters()),
            warningCollector,
            statsCalculator);

    Analysis analysis = analyzer.analyze(preparedQuery.getStatement());
    return analysis;
}

2.4 Analyzer#analyze

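The Analyzer uses the visitor pattern to traverse the statement tree, collecting information into an Analysis object as it goes. To understand the rules, you first need to understand how Presto uses the visitor pattern, so we introduce that first and then walk through a selection of rules.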

public Analysis analyze(Statement statement)
{
    return analyze(statement, false);
}

public Analysis analyze(Statement statement, boolean isDescribe)
{
    // Query rewrite: five fixed rewrite rules rewrite specific kinds of statements: EXPLAIN, DESCRIBE INPUT/OUTPUT, SHOW statements and SHOW STATS
    Statement rewrittenStatement = StatementRewrite.rewrite(session, metadata, sqlParser, queryExplainer, statement, parameters, parameterLookup, groupProvider, accessControl, warningCollector, statsCalculator);
    Analysis analysis = new Analysis(rewrittenStatement, parameterLookup, isDescribe);
    StatementAnalyzer analyzer = new StatementAnalyzer(analysis, metadata, sqlParser, groupProvider, accessControl, session, warningCollector, CorrelationSupport.ALLOWED);
    // Semantic analysis
    analyzer.analyze(rewrittenStatement, Optional.empty());
    // check column access permissions for each table: every referenced column must be readable
    analysis.getTableColumnReferences().forEach((accessControlInfo, tableColumnReferences) ->
            tableColumnReferences.forEach((tableName, columns) ->
                    accessControlInfo.getAccessControl().checkCanSelectFromColumns(
                            accessControlInfo.getSecurityContext(session.getRequiredTransactionId(), session.getQueryId()),
                            tableName,
                            columns)));
    return analysis;
}
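The permission check at the end of analyze can only run after semantic analysis, because the set of columns referenced per table is known only then. The nested iteration can be sketched as follows, assuming a hypothetical allow-list access control (SketchAccessControl) and plain string table names; in the real code the outer map is keyed by AccessControlInfo and the check goes through the AccessControl interface.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical exception type for a denied check.
class SketchAccessDenied extends RuntimeException {
    SketchAccessDenied(String message) {
        super(message);
    }
}

// Hypothetical allow-list: table name -> columns the current user may read.
class SketchAccessControl {
    private final Map<String, Set<String>> allowed;

    SketchAccessControl(Map<String, Set<String>> allowed) {
        this.allowed = allowed;
    }

    void checkCanSelectFromColumns(String table, Set<String> columns) {
        Set<String> permitted = allowed.getOrDefault(table, Set.of());
        for (String column : columns) {
            if (!permitted.contains(column)) {
                throw new SketchAccessDenied("Cannot select from columns " + columns + " in table " + table);
            }
        }
    }
}

class ColumnCheckSketch {
    // Mirrors the inner forEach in Analyzer#analyze: one check per table,
    // passing the columns the analysis recorded as referenced for that table.
    static void checkAll(SketchAccessControl accessControl, Map<String, Set<String>> tableColumnReferences) {
        tableColumnReferences.forEach(accessControl::checkCanSelectFromColumns);
    }
}
```

Checking per column rather than per table means a query that touches only permitted columns of a partially restricted table still passes.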

2.4.1 AstVisitor

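AstVisitor provides a default visit method for every kind of Node. During traversal, the method matching the current node's type is called: visitRelation for a Relation, visitExpression for an Expression, and so on. Each default implementation simply delegates to the visit method of the parent type, ending at visitNode, which returns null.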

public abstract class AstVisitor<R, C>
{
    public R process(Node node, @Nullable C context)
    {
        return node.accept(this, context);
    }
    protected R visitNode(Node node, C context)
    {
        return null;
    }
    protected R visitExpression(Expression node, C context)
    {
        return visitNode(node, context);
    }
    protected R visitRelation(Relation node, C context)
    {
        return visitNode(node, context);
    }
    // ... other visit methods omitted
}

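Where is visitRelation called? In Relation's accept method. Every Node subclass implements accept so that it calls the visit method for its own type, and at runtime polymorphism picks the right accept, and therefore the right visit method (double dispatch).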

public abstract class Relation
        extends Node
{
    protected Relation(Optional<NodeLocation> location)
    {
        super(location);
    }

    @Override
    public <R, C> R accept(AstVisitor<R, C> visitor, C context)
    {
        return visitor.visitRelation(this, context);
    }
}
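The accept/visit round trip and the "delegate to the parent type" defaults can be reproduced with a tiny hierarchy. This is a sketch with hypothetical classes (MiniNode, MiniRelation, MiniTable, MiniVisitor), not Trino's; the real hierarchy has more intermediate types (for example, Table passes through QueryBody on its way to Relation).

```java
// Hypothetical mini AST showing accept/visit double dispatch.
abstract class MiniNode {
    abstract <R, C> R accept(MiniVisitor<R, C> visitor, C context);
}

class MiniRelation extends MiniNode {
    @Override
    <R, C> R accept(MiniVisitor<R, C> visitor, C context) {
        return visitor.visitRelation(this, context);
    }
}

class MiniTable extends MiniRelation {
    final String name;

    MiniTable(String name) {
        this.name = name;
    }

    @Override
    <R, C> R accept(MiniVisitor<R, C> visitor, C context) {
        return visitor.visitTable(this, context);
    }
}

abstract class MiniVisitor<R, C> {
    R process(MiniNode node, C context) {
        return node.accept(this, context);
    }

    // Defaults walk up the type hierarchy: Table -> Relation -> Node.
    R visitNode(MiniNode node, C context) {
        return null;
    }

    R visitRelation(MiniRelation node, C context) {
        return visitNode(node, context);
    }

    R visitTable(MiniTable node, C context) {
        return visitRelation(node, context);
    }
}

// Overrides only visitRelation; a MiniTable still reaches it through the default visitTable.
class RelationNamer extends MiniVisitor<String, Void> {
    @Override
    String visitRelation(MiniRelation node, Void context) {
        return "relation:" + node.getClass().getSimpleName();
    }
}
```

Calling `new RelationNamer().process(new MiniTable("orders"), null)` goes MiniTable.accept, then visitTable (default), then the overridden visitRelation, and returns "relation:MiniTable".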

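To traverse the AST, subclass AstVisitor and override the visitXxx methods for the node types you care about. For example, the rule below builds a new Explain node when it visits an Explain; process(node.getStatement(), null) recursively processes the child node. A visitor that does not need to process a node's children can simply skip that call.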

class Visitor extends AstVisitor<Node, Void> {
    @Override
    protected Node visitExplain(Explain node, Void context)
    {
        Statement statement = (Statement) process(node.getStatement(), null);
        return new Explain(
                node.getLocation().get(),
                node.isAnalyze(),
                node.isVerbose(),
                statement,
                node.getOptions());
    }
}
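Because AstVisitor does not descend into children on its own, recursion happens only where an override calls process on a child, and the context parameter C can carry state through the walk. A minimal sketch with hypothetical node types (TableRef, JoinRef) and a visitor that collects table names into a list passed as the context:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-node AST: tables and joins of relations.
abstract class RelNode {
    abstract <R, C> R accept(RelVisitor<R, C> visitor, C context);
}

class TableRef extends RelNode {
    final String name;

    TableRef(String name) {
        this.name = name;
    }

    @Override
    <R, C> R accept(RelVisitor<R, C> visitor, C context) {
        return visitor.visitTableRef(this, context);
    }
}

class JoinRef extends RelNode {
    final RelNode left;
    final RelNode right;

    JoinRef(RelNode left, RelNode right) {
        this.left = left;
        this.right = right;
    }

    @Override
    <R, C> R accept(RelVisitor<R, C> visitor, C context) {
        return visitor.visitJoinRef(this, context);
    }
}

abstract class RelVisitor<R, C> {
    R process(RelNode node, C context) {
        return node.accept(this, context);
    }

    R visitTableRef(TableRef node, C context) {
        return null;
    }

    // Default: the children of a join are NOT visited.
    R visitJoinRef(JoinRef node, C context) {
        return null;
    }
}

// Collects table names into the context list; the walk reaches nested
// tables only because visitJoinRef explicitly processes both children.
class TableCollector extends RelVisitor<Void, List<String>> {
    @Override
    Void visitTableRef(TableRef node, List<String> tables) {
        tables.add(node.name);
        return null;
    }

    @Override
    Void visitJoinRef(JoinRef node, List<String> tables) {
        process(node.left, tables);
        process(node.right, tables);
        return null;
    }
}
```

Trino also provides DefaultTraversalVisitor, which visits children by default, for visitors that want the opposite behavior.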

2.4.2 Rewrite Rules

There are five rewrite rules:

  • DescribeInputRewrite
  • DescribeOutputRewrite
  • ShowQueriesRewrite
  • ShowStatsRewrite
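  • ExplainRewrite

ShowQueriesRewrite mainly rewrites SHOW statements into queries against the corresponding system metadata tables, so SHOW syntax is supported on top of ordinary queries. ShowStatsRewrite rewrites the query plan for "SHOW STATS FOR table". The other rules work in a similar way.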
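StatementRewrite.rewrite applies these rules one after another, each receiving the statement produced by the previous rule; a rule that does not recognize a statement returns it unchanged. A minimal sketch of that chain, assuming hypothetical statement classes and a string-built query standing in for the AST that ShowQueriesRewrite actually produces (the real generated query differs in its details):

```java
import java.util.List;
import java.util.Objects;
import java.util.function.UnaryOperator;

// Hypothetical statement types; the real rules operate on Trino AST nodes.
interface SketchStatement {}

final class ShowTablesStmt implements SketchStatement {
    final String catalog;
    final String schema;

    ShowTablesStmt(String catalog, String schema) {
        this.catalog = catalog;
        this.schema = schema;
    }
}

final class SqlQueryStmt implements SketchStatement {
    final String sql;

    SqlQueryStmt(String sql) {
        this.sql = sql;
    }
}

class RewriteChainSketch {
    // A SHOW rewrite in the spirit of ShowQueriesRewrite: SHOW TABLES becomes
    // an ordinary query over information_schema (simplified, unescaped SQL).
    static final UnaryOperator<SketchStatement> SHOW_REWRITE = stmt -> {
        if (stmt instanceof ShowTablesStmt) {
            ShowTablesStmt show = (ShowTablesStmt) stmt;
            return new SqlQueryStmt("SELECT table_name FROM " + show.catalog
                    + ".information_schema.tables WHERE table_schema = '" + show.schema + "'");
        }
        return stmt; // statements this rule does not handle pass through unchanged
    };

    // Every rule sees the output of the previous one, in a fixed order.
    static SketchStatement rewrite(List<UnaryOperator<SketchStatement>> rules, SketchStatement statement) {
        SketchStatement current = statement;
        for (UnaryOperator<SketchStatement> rule : rules) {
            current = Objects.requireNonNull(rule.apply(current), "Statement rewrite returned null");
        }
        return current;
    }
}
```

Rewriting SHOW into a plain query lets the rest of the pipeline (analysis, planning, execution) handle it with no special cases.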

2.4.3 Validation Rules (StatementAnalyzer)

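StatementAnalyzer.Visitor has many methods; each visitXxx method defines what happens when the corresponding node is visited. Interested readers can browse the full class; here we introduce a few of the most frequently used ones.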

2.4.3.1 visitQuerySpecification

protected Scope visitQuerySpecification(QuerySpecification node, Optional<Scope> scope)
{
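    /**
     * analyzeFrom (which reaches visitTable): checks that the queried catalog, schema and table exist,
     * fetches the table's columns from the metadata, writes them into the intermediate result `analysis`,
     * and registers the table there. (Analysis collects everything learned about the query, such as
     * resolved tables, columns, types and scopes; the later logical planning stage is built from it.)
     */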
    Scope sourceScope = analyzeFrom(node, scope);

    analyzeWindowDefinitions(node, sourceScope);
    resolveFunctionCallWindows(node);
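    /**
     * analyzeWhere: validates the WHERE clause (it must not contain window functions, aggregate functions
     * or GROUPING operations), analyzes the predicate with analyzeExpression, and checks that the
     * predicate evaluates to boolean.
     */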
    node.getWhere().ifPresent(where -> analyzeWhere(node, sourceScope, where));
    /**
     * analyzeSelect iterates over the items of the SELECT clause and returns the final list of output
     * expressions (List<Expression>). AllColumns and SingleColumn items are handled separately:
     * AllColumns corresponds to SELECT *, and SingleColumn is an explicit output column.
     */
    List<Expression> outputExpressions = analyzeSelect(node, sourceScope);
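    /**
     * analyzeGroupBy: basic GROUP BY validation. An ordinal such as GROUP BY 1 must refer to a position
     * in the SELECT list (the output expressions), and grouping expressions must not contain aggregate,
     * window or GROUPING functions. It also handles the special forms CUBE, ROLLUP and GROUPING SETS.
     */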
    GroupingSetAnalysis groupByAnalysis = analyzeGroupBy(node, sourceScope, outputExpressions);
    /**
     * analyzeHaving: the HAVING clause must not contain window functions, and it must evaluate to boolean.
     */
    analyzeHaving(node, sourceScope);
    Scope outputScope = computeAndAssignOutputScope(node, scope, sourceScope);

    List<Expression> orderByExpressions = emptyList();
    Optional<Scope> orderByScope = Optional.empty();
    if (node.getOrderBy().isPresent()) {
        OrderBy orderBy = node.getOrderBy().get();
        orderByScope = Optional.of(computeAndAssignOrderByScope(orderBy, sourceScope, outputScope));
        /**
         * analyzeOrderBy: basic validation; if an ORDER BY item is an ordinal, checks that the ordinal is
         * valid, otherwise analyzes the sort expression with analyzeExpression.
         */
        orderByExpressions = analyzeOrderBy(node, orderBy.getSortItems(), orderByScope.get());

        if (sourceScope.getOuterQueryParent().isPresent() && node.getLimit().isEmpty() && node.getOffset().isEmpty()) {
            // not the root scope and ORDER BY is ineffective
            analysis.markRedundantOrderBy(orderBy);
            warningCollector.add(new TrinoWarning(REDUNDANT_ORDER_BY, "ORDER BY in subquery may have no effect"));
        }
    }
    analysis.setOrderByExpressions(node, orderByExpressions);

    if (node.getOffset().isPresent()) {
        /**
         * analyzeOffset: validates the OFFSET value
         */
        analyzeOffset(node.getOffset().get(), outputScope);
    }

    if (node.getLimit().isPresent()) {
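        /**
         * analyzeLimit: validates the LIMIT clause, covering both FETCH FIRST and plain LIMIT;
         * returns true when the clause requires ORDER BY (FETCH FIRST ... WITH TIES)
         */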
        boolean requiresOrderBy = analyzeLimit(node.getLimit().get(), outputScope);
        if (requiresOrderBy && node.getOrderBy().isEmpty()) {
            throw semanticException(MISSING_ORDER_BY, node.getLimit().get(), "FETCH FIRST WITH TIES clause requires ORDER BY");
        }
    }
    List<Expression> sourceExpressions = new ArrayList<>();
    analysis.getSelectExpressions(node).stream()
            .map(SelectExpression::getExpression)
            .forEach(sourceExpressions::add);
    node.getHaving().ifPresent(sourceExpressions::add);
    for (WindowDefinition windowDefinition : node.getWindows()) {
        WindowSpecification window = windowDefinition.getWindow();
        sourceExpressions.addAll(window.getPartitionBy());
        getSortItemsFromOrderBy(window.getOrderBy()).stream()
                .map(SortItem::getSortKey)
                .forEach(sourceExpressions::add);
        window.getFrame()
                .map(WindowFrame::getStart)
                .flatMap(FrameBound::getValue)
                .ifPresent(sourceExpressions::add);
        window.getFrame()
                .flatMap(WindowFrame::getEnd)
                .flatMap(FrameBound::getValue)
                .ifPresent(sourceExpressions::add);
    }

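    /**
     * analyzeGroupingOperations: extracts the GROUPING() operations from the source expressions
     * (SELECT items, HAVING, window definitions) and the ORDER BY expressions, rejects them if the
     * query has no GROUP BY, and records them in analysis
     */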
    analyzeGroupingOperations(node, sourceExpressions, orderByExpressions);
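    /**
     * analyzeAggregations: extracts the aggregate function calls from the same expressions and records
     * them in analysis; if the query is an aggregation, verifies that SELECT, HAVING and ORDER BY only
     * reference grouping expressions outside aggregate calls (SELECT a, sum(b) ... GROUP BY a is valid,
     * SELECT a, b ... GROUP BY a is not)
     */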
    analyzeAggregations(node, sourceScope, orderByScope, groupByAnalysis, sourceExpressions, orderByExpressions);

    /**
     * analyzeWindowFunctions: validates the window function calls
     */
    analyzeWindowFunctions(node, outputExpressions, orderByExpressions);

    if (analysis.isAggregation(node) && node.getOrderBy().isPresent()) {
        ImmutableList.Builder<Expression> aggregates = ImmutableList.<Expression>builder()
                .addAll(groupByAnalysis.getOriginalExpressions())
                .addAll(extractAggregateFunctions(orderByExpressions, metadata))
                .addAll(extractExpressions(orderByExpressions, GroupingOperation.class));

        analysis.setOrderByAggregates(node.getOrderBy().get(), aggregates.build());
    }

    if (node.getOrderBy().isPresent() && node.getSelect().isDistinct()) {
        verifySelectDistinct(node, orderByExpressions, outputExpressions, sourceScope, orderByScope.get());
    }

    return outputScope;
}
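The ORDER BY / OFFSET / LIMIT branch of visitQuerySpecification boils down to two rules: an ORDER BY in a subquery without LIMIT or OFFSET only produces a warning, while FETCH FIRST ... WITH TIES without ORDER BY is an error. A minimal sketch over a hypothetical SpecFlags holder (not a Trino class) that keeps only the facts those checks read:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical flattened view of a QuerySpecification.
final class SpecFlags {
    final boolean subquery;   // the source scope has an outer query parent
    final boolean hasOrderBy;
    final boolean hasLimit;   // LIMIT or FETCH FIRST
    final boolean hasOffset;
    final boolean withTies;   // FETCH FIRST ... WITH TIES

    SpecFlags(boolean subquery, boolean hasOrderBy, boolean hasLimit, boolean hasOffset, boolean withTies) {
        this.subquery = subquery;
        this.hasOrderBy = hasOrderBy;
        this.hasLimit = hasLimit;
        this.hasOffset = hasOffset;
        this.withTies = withTies;
    }
}

class OrderLimitSketch {
    // Returns warnings for the soft problem and throws for the hard one,
    // loosely following markRedundantOrderBy and MISSING_ORDER_BY above.
    static List<String> check(SpecFlags spec) {
        List<String> warnings = new ArrayList<>();
        if (spec.hasOrderBy && spec.subquery && !spec.hasLimit && !spec.hasOffset) {
            // Without LIMIT/OFFSET the subquery's order cannot change which rows the outer query sees.
            warnings.add("ORDER BY in subquery may have no effect");
        }
        if (spec.hasLimit && spec.withTies && !spec.hasOrderBy) {
            throw new IllegalStateException("FETCH FIRST WITH TIES clause requires ORDER BY");
        }
        return warnings;
    }
}
```

WITH TIES needs an ordering to decide which rows "tie" with the last one, which is why it is a hard error rather than a warning.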

2.4.3.2 visitTable

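/**
 * When the FROM clause is analyzed and it is a plain table reference, the Table node is visited (visitTable).
 * visitTable talks to the metadata module to fetch all metadata of the queried table, and finally builds
 * a Scope for the current QuerySpecification:
 *  1. Resolve and validate the qualified table name: catalog, schema and table name
 *  2. Resolve views: if the name refers to a view, build the Scope from the view instead (a view has no
 *     table handle of its own; its stored SQL definition is analyzed and its output columns form the Scope)
 *  3. Look up the TableHandle to confirm that the table exists
 *  4. Fetch the table metadata, build the column list, and register the columns and the table in analysis
 *  5. Create the Scope
 */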
protected Scope visitTable(Table table, Optional<Scope> scope)
{
    // ... too much code to show here; if interested, see https://github.com/trinodb/trino/tree/352
    return tableScope;
}
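Steps 1, 3 and 4 of visitTable can be illustrated with a toy resolver, assuming a hypothetical in-memory map in place of the metadata module; views, WITH queries and access checks are left out. The default-filling rule (one part takes the session catalog and schema, two parts take the session catalog) follows how Trino qualifies table names; the error messages here are approximations.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, heavily simplified resolver (not Trino code).
class NameResolutionSketch {
    private final Map<String, List<String>> tables; // "catalog.schema.table" -> column names
    private final Optional<String> sessionCatalog;
    private final Optional<String> sessionSchema;

    NameResolutionSketch(Map<String, List<String>> tables, Optional<String> sessionCatalog, Optional<String> sessionSchema) {
        this.tables = tables;
        this.sessionCatalog = sessionCatalog;
        this.sessionSchema = sessionSchema;
    }

    // Step 1: table, schema.table or catalog.schema.table; missing parts come from the session.
    String qualify(List<String> parts) {
        if (parts.isEmpty() || parts.size() > 3) {
            throw new IllegalArgumentException("Invalid table name: " + String.join(".", parts));
        }
        String table = parts.get(parts.size() - 1);
        String schema = parts.size() >= 2
                ? parts.get(parts.size() - 2)
                : sessionSchema.orElseThrow(() -> new IllegalArgumentException("Schema must be specified when session schema is not set"));
        String catalog = parts.size() == 3
                ? parts.get(0)
                : sessionCatalog.orElseThrow(() -> new IllegalArgumentException("Catalog must be specified when session catalog is not set"));
        return catalog + "." + schema + "." + table;
    }

    // Steps 3-4: confirm the table exists and return its columns, which become the fields of its Scope.
    List<String> resolveColumns(List<String> parts) {
        String name = qualify(parts);
        List<String> columns = tables.get(name);
        if (columns == null) {
            throw new IllegalArgumentException("Table '" + name + "' does not exist");
        }
        return columns;
    }
}
```

With session catalog `hive` and schema `web`, `qualify(List.of("orders"))` yields "hive.web.orders", while a fully qualified name is used as is.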

2.4.3.3 analyzeExpression (Analyzing Expressions)

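The nodes visited while analyzing the AST fall roughly into two kinds. The first are clause-level nodes such as Select, With, Limit, OrderBy and GroupBy, each representing one part of the query. The second are expressions, which describe the arguments of those clauses: column references, literals, function calls and so on. There are many node types; for example, the * in "SELECT *" is parsed into an AllColumns node.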

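/**
 * To understand how analyzeExpression validates an expression, note first that Expression nodes also come
 * from the SQL grammar: they are created while AstBuilder walks the original parse tree during AST
 * construction. For example, the SQL grammar defines an identifier rule, and AstBuilder builds Identifier
 * expressions from the matching parse-tree nodes as it traverses them.
 */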
private ExpressionAnalysis analyzeExpression(Expression expression, Scope scope)
{
    return ExpressionAnalyzer.analyzeExpression(
            session,
            metadata,
            groupProvider,
            accessControl,
            sqlParser,
            scope,
            analysis,
            expression,
            warningCollector,
            correlationSupport);
}

2.4.4 Resolving Expressions

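As described in 2.4.3, expression analysis is delegated to ExpressionAnalyzer.analyzeExpression. Like StatementAnalyzer, ExpressionAnalyzer uses the visitor pattern to traverse the expression tree and validate each expression. For example, ExpressionAnalyzer.Visitor#visitIdentifier checks that the current Identifier resolves to a field in the Scope. (A simple example: for SELECT a, b FROM table, the analyzer must verify that table really has columns a and b; if it does not, the Scope contains no field named a or b, and resolution fails.)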

protected Type visitIdentifier(Identifier node, StackableAstVisitorContext<Context> context)
{
    // Use the Scope to check that the current Identifier refers to a field visible in this scope
    ResolvedField resolvedField = context.getContext().getScope().resolveField(node, QualifiedName.of(node.getValue()));
    return handleResolvedField(node, resolvedField, context);
}
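To make the visitIdentifier check concrete, here is a minimal sketch of a field lookup over nested scopes, assuming simple string column names. SketchScope and SketchResolvedField are hypothetical; Trino's Scope additionally handles qualified names, case-insensitive matching, ambiguous references and whether correlation is allowed, among other things.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical result: which name matched and whether it came from this query level.
final class SketchResolvedField {
    final String name;
    final boolean local; // false means a correlated reference to an enclosing query

    SketchResolvedField(String name, boolean local) {
        this.name = name;
        this.local = local;
    }
}

// Hypothetical scope: the fields visible at one query level plus an optional enclosing scope.
final class SketchScope {
    private final List<String> fields;
    private final Optional<SketchScope> outer;

    SketchScope(List<String> fields, Optional<SketchScope> outer) {
        this.fields = fields;
        this.outer = outer;
    }

    // Search this level first, then walk outward; fail if no level defines the name.
    SketchResolvedField resolveField(String name) {
        boolean local = true;
        for (SketchScope scope = this; scope != null; scope = scope.outer.orElse(null)) {
            if (scope.fields.contains(name)) {
                return new SketchResolvedField(name, local);
            }
            local = false;
        }
        throw new IllegalArgumentException("Column '" + name + "' cannot be resolved");
    }
}
```

In SELECT a, b FROM t, the scope built by visitTable holds t's columns, so `a` resolves locally; a name found only in an enclosing query's scope is a correlated reference, and a name found nowhere is an error.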