1. Overview
Continuing from Presto - Coordinator Query Flow, Part 3: the previous chapter covered AST rewriting and validation. Once validation completes, an Analysis object is produced; this chapter covers building the logical plan on top of that Analysis object.
2. Entry Point
In DispatchManager#createQueryInternal (see Coordinator Query Flow, Part 1), the dispatchQuery is submitted:
resourceGroupManager.submit(dispatchQuery, selectionContext, dispatchExecutor);
Next, look at ResourceGroupManager#submit:
public void submit(ManagedQueryExecution queryExecution, SelectionContext<C> selectionContext, Executor executor)
{
checkState(configurationManager.get() != null, "configurationManager not set");
createGroupIfNecessary(selectionContext, executor);
// fetch the InternalResourceGroup from the map and call run
groups.get(selectionContext.getResourceGroupId()).run(queryExecution);
}
Next, InternalResourceGroup#run: when the query is allowed to run, it calls startInBackground.
public void run(ManagedQueryExecution query)
{
synchronized (root) {
...
// can neither run nor queue: fail immediately
if (!canQueue && !canRun) {
query.fail(new QueryQueueFullException(id));
return;
}
// can run
if (canRun) {
startInBackground(query);
}
// otherwise enqueue
else {
enqueueQuery(query);
}
...
}
}
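The run-queue-fail decision above can be sketched with a minimal, self-contained model; the class and method names here are hypothetical, not Presto's actual API:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of the admission decision in InternalResourceGroup#run:
// if the group has a free run slot, start the query; else if queue capacity
// remains, enqueue it; else reject it outright.
public class AdmissionSketch {
    private final int maxRunning;
    private final int maxQueued;
    private int running;
    private final Queue<String> queued = new ArrayDeque<>();

    public AdmissionSketch(int maxRunning, int maxQueued) {
        this.maxRunning = maxRunning;
        this.maxQueued = maxQueued;
    }

    /** Returns "RUN", "QUEUE", or "FAIL" for the submitted query id. */
    public synchronized String submit(String queryId) {
        boolean canRun = running < maxRunning;
        boolean canQueue = queued.size() < maxQueued;
        if (!canRun && !canQueue) {
            return "FAIL";       // corresponds to query.fail(new QueryQueueFullException(id))
        }
        if (canRun) {
            running++;           // corresponds to startInBackground(query)
            return "RUN";
        }
        queued.add(queryId);     // corresponds to enqueueQuery(query)
        return "QUEUE";
    }
}
```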
Next, InternalResourceGroup#startInBackground:
private void startInBackground(ManagedQueryExecution query)
{
checkState(Thread.holdsLock(root), "Must hold lock to start a query");
synchronized (root) {
...
executor.execute(query::startWaitingForResources);
}
}
This submits a task to the executor, which runs LocalDispatchQuery#startWaitingForResources to wait for resources:
public void startWaitingForResources()
{
if (stateMachine.transitionToWaitingForResources()) {
waitForMinimumWorkers();
}
}
Next, LocalDispatchQuery#waitForMinimumWorkers waits until enough workers are available, then triggers startExecution:
private void waitForMinimumWorkers()
{
// wait for query execution to finish construction
addSuccessCallback(queryExecutionFuture, queryExecution -> {
ListenableFuture<?> minimumWorkerFuture = clusterSizeMonitor.waitForMinimumWorkers(executionMinCount, getRequiredWorkersMaxWait(session));
/**
 * when the worker requirement is met, call startExecution(queryExecution)
 * to run the rest of the query
 */
addSuccessCallback(minimumWorkerFuture, () -> startExecution(queryExecution));
...
});
}
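Presto chains these callbacks with Guava's ListenableFuture and an addSuccessCallback helper. As a rough analogy (not Presto's code), the same two-stage chaining can be expressed with the JDK's CompletableFuture:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Analogy for waitForMinimumWorkers(): stage 1 waits for the QueryExecution to
// finish construction, stage 2 waits for the minimum worker count, and only
// when both complete does execution start. All names here are illustrative.
public class CallbackChainSketch {
    public static String dispatch(CompletableFuture<String> queryExecutionFuture,
                                  CompletableFuture<Void> minimumWorkersFuture) {
        AtomicReference<String> started = new AtomicReference<>("not started");
        queryExecutionFuture.thenAccept(queryExecution ->
                // analogous to addSuccessCallback(minimumWorkerFuture, () -> startExecution(...))
                minimumWorkersFuture.thenRun(() -> started.set("started: " + queryExecution)));
        // in this sketch both futures are already complete, so the callbacks ran synchronously
        return started.get();
    }
}
```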
Next, LocalDispatchQuery#startExecution submits the query via querySubmitter; this querySubmitter is SqlQueryManager#createQuery.
private void startExecution(QueryExecution queryExecution)
{
queryExecutor.execute(() -> {
if (stateMachine.transitionToDispatching()) {
... other code
querySubmitter.accept(queryExecution);
}
});
}
Next, SqlQueryManager#createQuery:
public void createQuery(QueryExecution queryExecution)
{
... other unrelated code
queryExecution.start();
}
So the actual entry point for query execution is QueryExecution#start.
3. Query Execution
3.1 QueryExecution Flow
As the code below shows, QueryExecution proceeds as follows:
- Build the logical plan (construction and optimization)
- Build the distributed execution plan
- Build the SqlQueryScheduler object that schedules the plan
- Call scheduler.start() to begin scheduling
public void start()
{
... unrelated code
// build the logical plan
PlanRoot plan = planQuery();
// DynamicFilterService needs plan for query to be registered.
// Query should be registered before dynamic filter suppliers are requested in distribution planning.
registerDynamicFilteringQuery(plan);
// build the distributed execution plan
planDistribution(plan);
if (!stateMachine.transitionToStarting()) {
// query already started or finished
return;
}
// if query is not finished, start the scheduler, otherwise cancel it
SqlQueryScheduler scheduler = queryScheduler.get();
if (!stateMachine.isDone()) {
scheduler.start();
}
}
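The transitionToStarting() guard is what makes start() safe against concurrent or repeated calls: the transition succeeds only from the expected predecessor state. A minimal compare-and-set sketch of that idea (names hypothetical, not Presto's QueryStateMachine):

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the state-machine guard used in start(): a transition
// succeeds only from the expected predecessor state, so scheduling can never
// be started twice for the same query.
public class QueryStateSketch {
    enum State { PLANNING, STARTING, RUNNING, FINISHED }

    private final AtomicReference<State> state = new AtomicReference<>(State.PLANNING);

    /** Mirrors stateMachine.transitionToStarting(): true only for the first caller. */
    public boolean transitionToStarting() {
        return state.compareAndSet(State.PLANNING, State.STARTING);
    }

    public boolean isDone() {
        return state.get() == State.FINISHED;
    }
}
```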
3.2 Building the Logical Plan (planQuery)
private PlanRoot planQuery()
{
return doPlanQuery();
}
private PlanRoot doPlanQuery()
{
// plan query
LogicalPlanner logicalPlanner = new LogicalPlanner(stateMachine.getSession(),
planOptimizers,
idAllocator,
metadata,
typeOperators,
new TypeAnalyzer(sqlParser, metadata),
statsCalculator,
costCalculator,
stateMachine.getWarningCollector());
Plan plan = logicalPlanner.plan(analysis);
// fragment the plan
SubPlan fragmentedPlan = planFragmenter.createSubPlans(stateMachine.getSession(), plan, false, stateMachine.getWarningCollector());
// extract query-level inputs and outputs
List<Input> inputs = new InputExtractor(metadata, stateMachine.getSession()).extractInputs(fragmentedPlan);
stateMachine.setInputs(inputs);
stateMachine.setOutput(analysis.getTarget());
boolean explainAnalyze = analysis.getStatement() instanceof Explain && ((Explain) analysis.getStatement()).isAnalyze();
return new PlanRoot(fragmentedPlan, !explainAnalyze);
}
3.3 LogicalPlanner#plan
The main steps are:
- Build the logical plan (PlanNode)
- Optimize the logical plan with the optimizers
- Create the Plan(root, types, statsAndCosts)
public Plan plan(Analysis analysis, Stage stage, boolean collectPlanStatistics)
{ // build the initial logical plan
PlanNode root = planStatement(analysis, analysis.getStatement());
// validate the intermediate plan; a number of checkers run here
planSanityChecker.validateIntermediatePlan(root, session, metadata, typeOperators, typeAnalyzer, symbolAllocator.getTypes(), warningCollector);
if (stage.ordinal() >= OPTIMIZED.ordinal()) {
for (PlanOptimizer optimizer : planOptimizers) { // optimize
root = optimizer.optimize(root, session, symbolAllocator.getTypes(), symbolAllocator, idAllocator, warningCollector);
requireNonNull(root, format("%s returned a null plan", optimizer.getClass().getName()));
}
}
if (stage.ordinal() >= OPTIMIZED_AND_VALIDATED.ordinal()) {
// make sure we produce a valid plan after optimizations run; this is mainly to catch programming errors introduced during optimization
planSanityChecker.validateFinalPlan(root, session, metadata, typeOperators, typeAnalyzer, symbolAllocator.getTypes(), warningCollector);
}
TypeProvider types = symbolAllocator.getTypes();
StatsAndCosts statsAndCosts = StatsAndCosts.empty();
if (collectPlanStatistics) {
StatsProvider statsProvider = new CachingStatsProvider(statsCalculator, session, types);
CostProvider costProvider = new CachingCostProvider(costCalculator, statsProvider, Optional.empty(), session, types);
statsAndCosts = StatsAndCosts.create(root, statsProvider, costProvider);
}
return new Plan(root, types, statsAndCosts); // collect the types and costs of the query (used by EXPLAIN)
}
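The optimizer loop above simply threads the root through every registered PlanOptimizer in order, insisting that no optimizer returns null. A toy sketch of that pattern, with plans modeled as strings (purely illustrative, not Presto types):

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the optimizer loop in LogicalPlanner#plan: each
// optimizer takes the current root and returns a (possibly rewritten) root;
// the loop threads the plan through every optimizer in sequence.
public class OptimizerLoopSketch {
    public static String optimize(String root, List<UnaryOperator<String>> optimizers) {
        for (UnaryOperator<String> optimizer : optimizers) {
            root = optimizer.apply(root);
            if (root == null) {
                throw new IllegalStateException("optimizer returned a null plan");
            }
        }
        return root;
    }
}
```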
3.4 LogicalPlanner#planStatement
public PlanNode planStatement(Analysis analysis, Statement statement)
{
// CREATE TABLE AS SELECT; not relevant here
if ((statement instanceof CreateTableAsSelect && analysis.getCreate().get().isCreateTableAsSelectNoOp()) ||
statement instanceof RefreshMaterializedView && analysis.isSkipMaterializedViewRefresh()) {
....
}
return createOutputPlan(planStatementWithoutOutput(analysis, statement), analysis);
}
Looking at createOutputPlan and planStatementWithoutOutput, the flow is:
- 1 planStatementWithoutOutput produces the logical plan root (a PlanNode)
- 1.1 RelationPlanner.process does the actual construction of a RelationPlan (RelationPlan is essentially an extension of PlanNode that carries extra parameters used during RelationPlanner processing)
- 2 On top of root, an OutputNode (outputFields, outputSymbols) is wrapped
// wrap a fixed OutputNode around the top of the plan
private PlanNode createOutputPlan(RelationPlan plan, Analysis analysis)
{
ImmutableList.Builder<Symbol> outputs = ImmutableList.builder();
ImmutableList.Builder<String> names = ImmutableList.builder();
int columnNumber = 0;
RelationType outputDescriptor = analysis.getOutputDescriptor();
for (Field field : outputDescriptor.getVisibleFields()) {
String name = field.getName().orElse("_col" + columnNumber);
names.add(name);
int fieldIndex = outputDescriptor.indexOf(field);
Symbol symbol = plan.getSymbol(fieldIndex);
outputs.add(symbol);
columnNumber++;
}
// build the OutputNode on top of root (a PlanNode)
return new OutputNode(idAllocator.getNextId(), plan.getRoot(), names.build(), outputs.build());
}
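The naming rule for output columns above (use the declared field name, or `_col` plus the ordinal for anonymous expressions such as `SELECT a + b`) can be isolated into a small sketch (hypothetical helper, not Presto code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the naming rule in createOutputPlan: visible fields
// keep their declared name; anonymous fields fall back to "_col" plus their
// ordinal position among the visible fields.
public class OutputNamesSketch {
    public static List<String> outputNames(List<Optional<String>> fieldNames) {
        List<String> names = new ArrayList<>();
        int columnNumber = 0;
        for (Optional<String> fieldName : fieldNames) {
            names.add(fieldName.orElse("_col" + columnNumber));
            columnNumber++;
        }
        return names;
    }
}
```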
private RelationPlan planStatementWithoutOutput(Analysis analysis, Statement statement)
{
... other cases
if (statement instanceof Query) {
return createRelationPlan(analysis, (Query) statement);
}
}
private RelationPlan createRelationPlan(Analysis analysis, Query query)
{
return new RelationPlanner(analysis, symbolAllocator, idAllocator, buildLambdaDeclarationToSymbolMap(analysis, symbolAllocator), metadata, Optional.empty(), session, ImmutableMap.of())
.process(query, null);
}
3.5 RelationPlanner, QueryPlanner, SubQueryPlanner
Presto builds the logical plan with RelationPlanner, QueryPlanner, and SubQueryPlanner working together. The core mechanism is a recursive traversal of the Query (a Statement): for each node encountered, the corresponding visit* method is invoked.
RelationPlanner does not handle every AST node; it only converts Relation-related nodes, including Table, AliasedRelation, SampleRelation, Join, TableSubQuery, Query, QuerySpecification, Values, Unnest, Union, Intersect, and Except. For nodes such as Query and QuerySpecification, which wrap a complete relational sub-query, RelationPlanner delegates the work to QueryPlanner.
QueryPlanner builds the Query-related parts, such as QuerySpecification and Query.
SubQueryPlanner is dedicated to handling subqueries; it does so by constructing a new RelationPlanner to process them.
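The traversal these three planners rely on is a classic visitor over the AST. A toy sketch of the dispatch-and-recurse shape (hypothetical node types, not Presto's AST classes):

```java
// Hypothetical sketch of the visitor dispatch RelationPlanner relies on: each
// AST node routes accept() to the matching visitXxx method, and the planner
// recurses into children; leaves become scans, joins recurse into both sides.
public class VisitorSketch {
    interface Node { String accept(Visitor v); }
    record Table(String name) implements Node {
        public String accept(Visitor v) { return v.visitTable(this); }
    }
    record Join(Node left, Node right) implements Node {
        public String accept(Visitor v) { return v.visitJoin(this); }
    }
    interface Visitor {
        String visitTable(Table t);
        String visitJoin(Join j);
    }
    static class Planner implements Visitor {
        public String visitTable(Table t) { return "scan(" + t.name() + ")"; }
        public String visitJoin(Join j) {
            return "join(" + j.left().accept(this) + ", " + j.right().accept(this) + ")";
        }
    }
    public static String plan(Node root) { return root.accept(new Planner()); }
}
```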
3.5.1 RelationPlanner#visitTable
From analysis, fetch the namedQuery for the Table. If namedQuery is non-null, first recursively process namedQuery (a Query) to build a subPlan (RelationPlan), then apply any needed type coercions and return the result as a RelationPlan. If namedQuery is null, build a TableScanNode directly and wrap it in a RelationPlan.
protected RelationPlan visitTable(Table node, Void context)
{
Query namedQuery = analysis.getNamedQuery(node);
Scope scope = analysis.getScope(node);
RelationPlan plan;
if (namedQuery != null) {
RelationPlan subPlan;
if (analysis.isExpandableQuery(namedQuery)) {
subPlan = new QueryPlanner(analysis, symbolAllocator, idAllocator, lambdaDeclarationToSymbolMap, metadata, outerContext, session, recursiveSubqueries)
.planExpand(namedQuery);
}
else {
subPlan = process(namedQuery, null);
}
// Add implicit coercions if view query produces types that don't match the declared output types
// of the view (e.g., if the underlying tables referenced by the view changed)
List<Type> types = analysis.getOutputDescriptor(node)
.getAllFields().stream()
.map(Field::getType)
.collect(toImmutableList());
NodeAndMappings coerced = coerce(subPlan, types, symbolAllocator, idAllocator);
plan = new RelationPlan(coerced.getNode(), scope, coerced.getFields(), outerContext);
}
else {
TableHandle handle = analysis.getTableHandle(node);
ImmutableList.Builder<Symbol> outputSymbolsBuilder = ImmutableList.builder();
ImmutableMap.Builder<Symbol, ColumnHandle> columns = ImmutableMap.builder();
for (Field field : scope.getRelationType().getAllFields()) {
Symbol symbol = symbolAllocator.newSymbol(field);
outputSymbolsBuilder.add(symbol);
columns.put(symbol, analysis.getColumn(field));
}
List<Symbol> outputSymbols = outputSymbolsBuilder.build();
boolean isDeleteTarget = analysis.isDeleteTarget(node);
PlanNode root = TableScanNode.newInstance(idAllocator.getNextId(), handle, outputSymbols, columns.build(), isDeleteTarget);
plan = new RelationPlan(root, scope, outputSymbols, outerContext);
}
plan = addRowFilters(node, plan);
plan = addColumnMasks(node, plan);
return plan;
}
3.5.2 RelationPlanner#visitJoin
First recursively plan the join's left node, obtaining a RelationPlan. Then inspect the right node: Unnest nodes and Lateral nodes are each handled by dedicated paths (not covered here). If the right node is neither Unnest nor Lateral, recursively plan it to obtain a RelationPlan; after determining the joinType and handling the output column metadata, build a JoinNode and wrap it as a RelationPlan.
protected RelationPlan visitJoin(Join node, Void context)
{
... code omitted here; it is lengthy, see the source if interested
}
3.5.3 RelationPlanner#visitQuerySpecification
This builds the main body of the query; the work is delegated to QueryPlanner:
protected RelationPlan visitQuerySpecification(QuerySpecification node, Void context)
{
return new QueryPlanner(analysis, symbolAllocator, idAllocator, lambdaDeclarationToSymbolMap, metadata, outerContext, session, recursiveSubqueries)
.plan(node);
}
Looking at QueryPlanner#plan, this is where the individual parts of the query are built: filter, aggregate, window, project, order by, distinct, sort, offset, limit.
public RelationPlan plan(QuerySpecification node)
{
PlanBuilder builder = planFrom(node);
builder = filter(builder, analysis.getWhere(node), node);
builder = aggregate(builder, node);
builder = filter(builder, analysis.getHaving(node), node);
builder = window(node, builder, ImmutableList.copyOf(analysis.getWindowFunctions(node)));
List<SelectExpression> selectExpressions = analysis.getSelectExpressions(node);
List<Expression> expressions = selectExpressions.stream()
.map(SelectExpression::getExpression)
.collect(toImmutableList());
builder = subqueryPlanner.handleSubqueries(builder, expressions, node);
if (hasExpressionsToUnfold(selectExpressions)) {
// pre-project the folded expressions to preserve any non-deterministic semantics of functions that might be referenced
builder = builder.appendProjections(expressions, symbolAllocator, idAllocator);
}
List<Expression> outputs = outputExpressions(selectExpressions);
if (node.getOrderBy().isPresent()) {
// ORDER BY requires outputs of SELECT to be visible.
// For queries with aggregation, it also requires grouping keys and translated aggregations.
if (analysis.isAggregation(node)) {
// Add projections for aggregations required by ORDER BY. After this step, grouping keys and translated
// aggregations are visible.
List<Expression> orderByAggregates = analysis.getOrderByAggregates(node.getOrderBy().get());
builder = builder.appendProjections(orderByAggregates, symbolAllocator, idAllocator);
}
// Add projections for the outputs of SELECT, but stack them on top of the ones from the FROM clause so both are visible
// when resolving the ORDER BY clause.
builder = builder.appendProjections(outputs, symbolAllocator, idAllocator);
// The new scope is the composite of the fields from the FROM and SELECT clause (local nested scopes). Fields from the bottom of
// the scope stack need to be placed first to match the expected layout for nested scopes.
List<Symbol> newFields = new ArrayList<>();
newFields.addAll(builder.getTranslations().getFieldSymbols());
outputs.stream()
.map(builder::translate)
.forEach(newFields::add);
builder = builder.withScope(analysis.getScope(node.getOrderBy().get()), newFields);
builder = window(node, builder, ImmutableList.copyOf(analysis.getOrderByWindowFunctions(node.getOrderBy().get())));
}
List<Expression> orderBy = analysis.getOrderByExpressions(node);
builder = subqueryPlanner.handleSubqueries(builder, orderBy, node);
builder = builder.appendProjections(Iterables.concat(orderBy, outputs), symbolAllocator, idAllocator);
builder = distinct(builder, node, outputs);
Optional<OrderingScheme> orderingScheme = orderingScheme(builder, node.getOrderBy(), analysis.getOrderByExpressions(node));
builder = sort(builder, orderingScheme);
builder = offset(builder, node.getOffset());
builder = limit(builder, node.getLimit(), orderingScheme);
builder = builder.appendProjections(outputs, symbolAllocator, idAllocator);
return new RelationPlan(
builder.getRoot(),
analysis.getScope(node),
computeOutputs(builder, outputs),
outerContext);
}
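The builder threading above means each clause wraps the plan produced by the previous one, so the node stack nests in SQL's logical evaluation order, FROM at the bottom and LIMIT at the top. A toy sketch with plans modeled as strings (illustrative only, not Presto's PlanBuilder):

```java
import java.util.List;

// Hypothetical sketch of the clause order in QueryPlanner#plan: each step
// takes the builder from the previous step and wraps it, so the final plan
// nests FROM innermost and LIMIT outermost.
public class ClauseOrderSketch {
    public static String plan(String table) {
        String builder = "from(" + table + ")";        // planFrom(node)
        for (String clause : List.of("filter", "aggregate", "having",
                "window", "project", "distinct", "sort", "offset", "limit")) {
            builder = clause + "(" + builder + ")";    // builder = clause(builder, ...)
        }
        return builder;
    }
}
```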
At this point the logical plan has been built; next comes logical plan optimization.