1. Overview
Continuing from Presto - Coordinator Query Flow - 2: the previous chapter covered how the AST is built and the first part of query validation; this chapter focuses on how the Analyzer performs semantic validation of the AST.
2. Source Code Walkthrough
2.1 DispatchQueryFactory#createDispatchQuery
As described in Presto - Coordinator Query Flow - 1, DispatchQueryFactory#createDispatchQuery is invoked here; this is the entry point of query analysis. The key step is delegating to a QueryExecutionFactory to create the QueryExecution.
public DispatchQuery createDispatchQuery(
Session session,
String query,
PreparedQuery preparedQuery,
Slug slug,
ResourceGroupId resourceGroup)
{
// 1. First create the query state machine; its state transitions trigger the registered listeners
QueryStateMachine stateMachine = QueryStateMachine.begin(
query,
preparedQuery.getPrepareSql(),
session,
locationFactory.createQueryLocation(session.getQueryId()),
resourceGroup,
isTransactionControlStatement(preparedQuery.getStatement()),
transactionManager,
accessControl,
executor,
metadata,
warningCollector,
StatementUtils.getQueryType(preparedQuery.getStatement().getClass()));
// Submit a task that creates the QueryExecution
// QueryExecution is the core object: it drives analysis, logical plan building, optimization, distributed stage splitting, and submission to the workers
// Note that the QueryExecution is constructed and executed on different threads
ListenableFuture<QueryExecution> queryExecutionFuture = executor.submit(() -> {
QueryExecutionFactory<?> queryExecutionFactory = executionFactories.get(preparedQuery.getStatement().getClass());
// Delegate to the QueryExecutionFactory to create the QueryExecution
return queryExecutionFactory.createQueryExecution(preparedQuery, stateMachine, slug, warningCollector);
});
// Wrap everything in a LocalDispatchQuery and return it
return new LocalDispatchQuery(
stateMachine,
queryExecutionFuture,
queryMonitor,
clusterSizeMonitor,
executor,
queryManager::createQuery);
}
2.2 SqlQueryExecutionFactory#createQueryExecution
Following the construction of the QueryExecution, we find that a SqlQueryExecution object is created. Its constructor is private, so it cannot be instantiated from outside the factory. SqlQueryExecution encapsulates most of the important logic of query execution and is the focus of this chapter.
public QueryExecution createQueryExecution(
PreparedQuery preparedQuery,
QueryStateMachine stateMachine,
Slug slug,
WarningCollector warningCollector)
{ ...
return new SqlQueryExecution(
preparedQuery,
stateMachine,
slug,
metadata,
... other parameters);
}
// SqlQueryExecution's constructor
private SqlQueryExecution(
PreparedQuery preparedQuery,
QueryStateMachine stateMachine,
Slug slug,
Metadata metadata,
...)
{
...
// Analyze the query (semantic analysis starts here)
this.analysis = analyze(preparedQuery, stateMachine, metadata, groupProvider, accessControl, sqlParser, queryExplainer, warningCollector);
// when the query finishes cache the final query info, and clear the reference to the output stage
AtomicReference<SqlQueryScheduler> queryScheduler = this.queryScheduler;
stateMachine.addStateChangeListener(state -> {
if (!state.isDone()) {
return;
}
// query is now done, so abort any work that is still running
SqlQueryScheduler scheduler = queryScheduler.get();
if (scheduler != null) {
scheduler.abort();
}
});
// RemoteTaskFactory: used to send tasks to the workers; after the plan is split into distributed stages, it creates the tasks
this.remoteTaskFactory = new MemoryTrackingRemoteTaskFactory(requireNonNull(remoteTaskFactory, "remoteTaskFactory is null"), stateMachine);
}
2.3 SqlQueryExecution#analyze
The analysis is delegated to Analyzer.analyze, which validates the statement AST:
private Analysis analyze(
PreparedQuery preparedQuery,
QueryStateMachine stateMachine,
Metadata metadata,
GroupProvider groupProvider,
AccessControl accessControl,
SqlParser sqlParser,
QueryExplainer queryExplainer,
WarningCollector warningCollector)
{
Analyzer analyzer = new Analyzer(
stateMachine.getSession(),
metadata,
sqlParser,
groupProvider,
accessControl,
Optional.of(queryExplainer),
preparedQuery.getParameters(),
parameterExtractor(preparedQuery.getStatement(), preparedQuery.getParameters()),
warningCollector,
statsCalculator);
Analysis analysis = analyzer.analyze(preparedQuery.getStatement());
return analysis;
}
2.4 Analyzer#analyze
Analyzer uses the visitor pattern to walk the statement tree, collecting information into an Analysis object as it goes. To understand the validation rules, you first need to understand how the visitor pattern is used in Presto; below we introduce Presto's use of the visitor pattern and then pick a few rules to analyze.
public Analysis analyze(Statement statement)
{
return analyze(statement, false);
}
public Analysis analyze(Statement statement, boolean isDescribe)
{
// Query rewriting: five fixed rewrite rules transform specific statement types: EXPLAIN, DESCRIBE INPUT/OUTPUT, the SHOW family, and SHOW STATS
Statement rewrittenStatement = StatementRewrite.rewrite(session, metadata, sqlParser, queryExplainer, statement, parameters, parameterLookup, groupProvider, accessControl, warningCollector, statsCalculator);
Analysis analysis = new Analysis(rewrittenStatement, parameterLookup, isDescribe);
StatementAnalyzer analyzer = new StatementAnalyzer(analysis, metadata, sqlParser, groupProvider, accessControl, session, warningCollector, CorrelationSupport.ALLOWED);
// Semantic validation of the statement (the actual analysis work)
analyzer.analyze(rewrittenStatement, Optional.empty());
// check column access permissions for each table
// Access control: check that the user may select from every referenced column
analysis.getTableColumnReferences().forEach((accessControlInfo, tableColumnReferences) ->
tableColumnReferences.forEach((tableName, columns) ->
accessControlInfo.getAccessControl().checkCanSelectFromColumns(
accessControlInfo.getSecurityContext(session.getRequiredTransactionId(), session.getQueryId()),
tableName,
columns)));
return analysis;
}
2.4.1 AstVisitor
AstVisitor provides a default visit method for every kind of node. While traversing the AST, if the current node is a plain Node, visitNode is called; if it is a Relation, visitRelation is called, and so on.
public abstract class AstVisitor<R, C>
{
public R process(Node node, @Nullable C context)
{
return node.accept(this, context);
}
protected R visitNode(Node node, C context)
{
return null;
}
protected R visitExpression(Expression node, C context)
{
return visitNode(node, context);
}
protected R visitRelation(Relation node, C context)
{
return visitNode(node, context);
}
... other visit methods
}
So where is visitRelation actually called? The answer is in Relation's accept method: every kind of node implements accept to call the visit method for its own type, so at runtime polymorphism decides which visit method runs (double dispatch).
public abstract class Relation
extends Node
{
protected Relation(Optional<NodeLocation> location)
{
super(location);
}
@Override
public <R, C> R accept(AstVisitor<R, C> visitor, C context)
{
return visitor.visitRelation(this, context);
}
}
To visit the AST you only need to extend AstVisitor and override the relevant visitXxx methods to implement the logic for a specific kind of node. For example, the rule below builds a new Explain object when an Explain node is visited; process(node.getStatement(), null) recursively processes the child node, and visit methods that do not need to descend into children can simply skip that call.
class Visitor extends AstVisitor<Node, Void> {
@Override
protected Node visitExplain(Explain node, Void context)
{
Statement statement = (Statement) process(node.getStatement(), null);
return new Explain(
node.getLocation().get(),
node.isAnalyze(),
node.isVerbose(),
statement,
node.getOptions());
}
}
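As a further illustration, here is a minimal sketch (not code from the Trino code base; the class name and the sample query are made up) of a custom visitor that collects every table name referenced by a statement. Only visitTable and the visitNode fallback are overridden; because every default visitXxx implementation eventually delegates to visitNode, recursing into the children there is enough to cover the whole tree:
import io.trino.sql.parser.ParsingOptions;
import io.trino.sql.parser.SqlParser;
import io.trino.sql.tree.AstVisitor;
import io.trino.sql.tree.Node;
import io.trino.sql.tree.QualifiedName;
import io.trino.sql.tree.Statement;
import io.trino.sql.tree.Table;
import java.util.LinkedHashSet;
import java.util.Set;

class TableCollector
        extends AstVisitor<Void, Set<QualifiedName>>
{
    @Override
    protected Void visitTable(Table node, Set<QualifiedName> tables)
    {
        // a Table node: record the referenced name
        tables.add(node.getName());
        return null;
    }

    @Override
    protected Void visitNode(Node node, Set<QualifiedName> tables)
    {
        // fallback for every other node type: recurse into the children
        node.getChildren().forEach(child -> process(child, tables));
        return null;
    }

    public static void main(String[] args)
    {
        Statement statement = new SqlParser()
                .createStatement("SELECT a FROM t1 JOIN t2 USING (id)", new ParsingOptions());
        Set<QualifiedName> tables = new LinkedHashSet<>();
        new TableCollector().process(statement, tables);
        System.out.println(tables); // prints [t1, t2]
    }
}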
2.4.2 Rewrite Rules
There are five rewrite rules:
- DescribeInputRewrite
- DescribeOutputRewrite
- ShowQueriesRewrite
- ShowStatsRewrite
- ExplainRewrite

ShowQueriesRewrite rewrites SHOW statements into queries against the corresponding system metadata tables, which is how the SHOW syntax is supported; ShowStatsRewrite rewrites the plan for "SHOW STATS FOR table" statements; the other rules work similarly. A rough before/after illustration follows.
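The sketch below shows the approximate effect of ShowQueriesRewrite on a SHOW SCHEMAS statement. It is an approximation only: the real rule builds the rewritten AST programmatically rather than going through SQL text, and the exact columns and ordering may differ.
// What the user typed:
String original = "SHOW SCHEMAS FROM hive";
// Roughly what the rewritten statement is equivalent to:
String rewritten = "SELECT schema_name AS Schema "
        + "FROM hive.information_schema.schemata "
        + "ORDER BY schema_name";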
2.4.3 Validation Rules (StatementAnalyzer)
The StatementAnalyzer.Visitor class contains a large number of methods; each visitXxx method implements the handling for the corresponding node type. They are worth reading through if you are interested; below we introduce a few of the most frequently used ones.
2.4.3.1 visitQuerySpecification
protected Scope visitQuerySpecification(QuerySpecification node, Optional<Scope> scope)
{
/**
 * analyzeFrom (-> visitTable): checks that the queried catalog/schema/table exists, fetches the
 * table's columns from the metadata, records them in the intermediate Analysis object (the
 * container that accumulates semantic information for the planner), and registers the table in analysis
 */
Scope sourceScope = analyzeFrom(node, scope);
analyzeWindowDefinitions(node, sourceScope);
resolveFunctionCallWindows(node);
/**
 * analyzeWhere: validates the WHERE clause (window functions, aggregations, and GROUPING are not
 * allowed there), runs analyzeExpression on the predicate, and checks that the predicate evaluates
 * to boolean
 */
node.getWhere().ifPresent(where -> analyzeWhere(node, sourceScope, where));
/**
 * analyzeSelect: iterates over the items of the SELECT clause and returns the final output
 * expressions as a List<Expression>. Each item is handled as either AllColumns (the SELECT * form)
 * or a SingleColumn (an explicitly selected column or expression).
 */
List<Expression> outputExpressions = analyzeSelect(node, sourceScope);
/**
 * analyzeGroupBy: validates the GROUP BY clause (e.g. ordinal references must point into the
 * SELECT output list) and handles the special forms CUBE, ROLLUP, and GROUPING SETS
 */
GroupingSetAnalysis groupByAnalysis = analyzeGroupBy(node, sourceScope, outputExpressions);
/**
 * analyzeHaving: the HAVING clause may not contain window functions and must evaluate to boolean
 */
analyzeHaving(node, sourceScope);
Scope outputScope = computeAndAssignOutputScope(node, scope, sourceScope);
List<Expression> orderByExpressions = emptyList();
Optional<Scope> orderByScope = Optional.empty();
if (node.getOrderBy().isPresent()) {
OrderBy orderBy = node.getOrderBy().get();
orderByScope = Optional.of(computeAndAssignOrderByScope(orderBy, sourceScope, outputScope));
/**
 * analyzeOrderBy: if a sort item is an ordinal, validates that the ordinal is in range;
 * otherwise analyzes the sort expression with analyzeExpression
 */
orderByExpressions = analyzeOrderBy(node, orderBy.getSortItems(), orderByScope.get());
if (sourceScope.getOuterQueryParent().isPresent() && node.getLimit().isEmpty() && node.getOffset().isEmpty()) {
// not the root scope and ORDER BY is ineffective
analysis.markRedundantOrderBy(orderBy);
warningCollector.add(new TrinoWarning(REDUNDANT_ORDER_BY, "ORDER BY in subquery may have no effect"));
}
}
analysis.setOrderByExpressions(node, orderByExpressions);
if (node.getOffset().isPresent()) {
/**
 * analyzeOffset: validates the OFFSET row count
 */
analyzeOffset(node.getOffset().get(), outputScope);
}
if (node.getLimit().isPresent()) {
/**
 * analyzeLimit: validates the LIMIT clause, covering both FETCH FIRST and plain LIMIT
 */
boolean requiresOrderBy = analyzeLimit(node.getLimit().get(), outputScope);
if (requiresOrderBy && node.getOrderBy().isEmpty()) {
throw semanticException(MISSING_ORDER_BY, node.getLimit().get(), "FETCH FIRST WITH TIES clause requires ORDER BY");
}
}
List<Expression> sourceExpressions = new ArrayList<>();
analysis.getSelectExpressions(node).stream()
.map(SelectExpression::getExpression)
.forEach(sourceExpressions::add);
node.getHaving().ifPresent(sourceExpressions::add);
for (WindowDefinition windowDefinition : node.getWindows()) {
WindowSpecification window = windowDefinition.getWindow();
sourceExpressions.addAll(window.getPartitionBy());
getSortItemsFromOrderBy(window.getOrderBy()).stream()
.map(SortItem::getSortKey)
.forEach(sourceExpressions::add);
window.getFrame()
.map(WindowFrame::getStart)
.flatMap(FrameBound::getValue)
.ifPresent(sourceExpressions::add);
window.getFrame()
.flatMap(WindowFrame::getEnd)
.flatMap(FrameBound::getValue)
.ifPresent(sourceExpressions::add);
}
/**
 * analyzeGroupingOperations: extracts GROUPING() operations from the source and ORDER BY
 * expressions and records them in analysis
 */
analyzeGroupingOperations(node, sourceExpressions, orderByExpressions);
/**
 * analyzeAggregations: validates the aggregations against the grouping sets, over both the
 * source and ORDER BY expressions
 */
analyzeAggregations(node, sourceScope, orderByScope, groupByAnalysis, sourceExpressions, orderByExpressions);
/**
 * analyzeWindowFunctions: validates window functions in the output and ORDER BY expressions
 */
analyzeWindowFunctions(node, outputExpressions, orderByExpressions);
if (analysis.isAggregation(node) && node.getOrderBy().isPresent()) {
ImmutableList.Builder<Expression> aggregates = ImmutableList.<Expression>builder()
.addAll(groupByAnalysis.getOriginalExpressions())
.addAll(extractAggregateFunctions(orderByExpressions, metadata))
.addAll(extractExpressions(orderByExpressions, GroupingOperation.class));
analysis.setOrderByAggregates(node.getOrderBy().get(), aggregates.build());
}
if (node.getOrderBy().isPresent() && node.getSelect().isDistinct()) {
verifySelectDistinct(node, orderByExpressions, outputExpressions, sourceScope, orderByScope.get());
}
return outputScope;
}
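To keep the many analyze* calls above straight, the sketch below annotates a sample query with the step that handles each clause (a reading aid only, not code from Trino; the query and comments are illustrative):
// Sample query annotated with the analyze* step from visitQuerySpecification that handles each clause
String sample = ""
        + "SELECT dept, count(*) AS c "   // analyzeSelect: AllColumns vs. SingleColumn items
        + "FROM employees "               // analyzeFrom -> visitTable: resolve the table and its columns
        + "WHERE salary > 0 "             // analyzeWhere: predicate must be boolean
        + "GROUP BY dept "                // analyzeGroupBy: ordinals, CUBE/ROLLUP/GROUPING SETS
        + "HAVING count(*) > 10 "         // analyzeHaving: boolean, no window functions
        + "ORDER BY c DESC "              // analyzeOrderBy: ordinals or analyzed expressions
        + "LIMIT 10";                     // analyzeLimit (and analyzeOffset for OFFSET)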
2.4.3.2 visitTable
/**
 * When the FROM clause refers to a plain table, the Table node is visited via visitTable, which
 * interacts with the metadata module to fetch all metadata of the queried table and finally builds
 * the Scope for the current QuerySpecification:
 * 1. Resolve the table name (catalog, schema, table name)
 * 2. Handle views: if the name refers to a view, analyze the view's defining query and build the
 *    view's Scope (a view has no physical TableHandle, so its SQL must be analyzed recursively to
 *    derive its output columns)
 * 3. Look up the TableHandle to confirm the table exists
 * 4. Fetch the table metadata, build the column list, and register the columns and the table in analysis
 * 5. Create the Scope
 */
protected Scope visitTable(Table table, Optional<Scope> scope)
{
... // omitted for brevity; see https://github.com/trinodb/trino/tree/352 if interested
return tableScope;
}
2.4.3.3 analyzeExpression (expression analysis)
The AST nodes visited during analysis fall roughly into two kinds: top-level nodes such as Select, With, Limit, OrderBy, and GroupBy, which each represent one part of the query statement, and expression nodes that describe the detailed contents (arguments) of the former. There are many kinds of expressions; for example, the "*" in "SELECT *" is parsed into an AllColumns node.
/**
 * Before looking at analyzeExpression, note that expressions also correspond to the SQL grammar:
 * they are built while the AstBuilder walks the original parse tree. For example, the grammar
 * defines an identifier rule, and AstBuilder builds an Identifier expression when it encounters one.
 */
private ExpressionAnalysis analyzeExpression(Expression expression, Scope scope)
{
return ExpressionAnalyzer.analyzeExpression(
session,
metadata,
groupProvider,
accessControl,
sqlParser,
scope,
analysis,
expression,
warningCollector,
correlationSupport);
}
2.4.4 Expression Analysis (ExpressionAnalyzer)
As shown in section 2.4.3.3, expression analysis is delegated to ExpressionAnalyzer.analyzeExpression. Like StatementAnalyzer, ExpressionAnalyzer uses the visitor pattern to walk the expression tree and validate it. For example, ExpressionAnalyzer.Visitor#visitIdentifier checks whether the current Identifier can be resolved in the current Scope. (A simple example: for SELECT a, b FROM table, the analyzer must verify that columns a and b actually exist in the table; if they do not, the current Scope will not contain them and resolution fails.)
protected Type visitIdentifier(Identifier node, StackableAstVisitorContext<Context> context)
{
// Resolve the Identifier against the current Scope to check that the referenced field exists
ResolvedField resolvedField = context.getContext().getScope().resolveField(node, QualifiedName.of(node.getValue()));
return handleResolvedField(node, resolvedField, context);
}
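To make the role of Scope more concrete, here is a deliberately simplified toy model (not Trino's actual Scope API; the class, field names, and error message are illustrative) of what "resolving a field against the current scope" means for a query like SELECT a, b FROM t:
import java.util.Map;

// Toy stand-in for the analyzer's Scope: the FROM clause contributes a set of visible
// fields (column name -> type), and every Identifier in the query must resolve to one of them.
class ToyScope
{
    private final Map<String, String> visibleFields;

    ToyScope(Map<String, String> visibleFields)
    {
        this.visibleFields = visibleFields;
    }

    String resolveField(String name)
    {
        String type = visibleFields.get(name);
        if (type == null) {
            // mirrors the kind of semantic error the real analyzer reports for an unknown column
            throw new IllegalArgumentException("Column '" + name + "' cannot be resolved");
        }
        return type;
    }

    public static void main(String[] args)
    {
        // the scope produced by "FROM t" where t has columns a (bigint) and b (varchar)
        ToyScope scope = new ToyScope(Map.of("a", "bigint", "b", "varchar"));
        System.out.println(scope.resolveField("a")); // bigint
        System.out.println(scope.resolveField("c")); // throws: Column 'c' cannot be resolved
    }
}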