Designing a Simple Distributed Computing System | Youth Training Camp Notes

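This is day 21 of my participation in the note-writing activity of the 4th Youth Training Camp (第四届青训营).

These are my study notes on the SQL parsing and validation part of the project.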

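SQL Parsing and Validation

Parser

A parser converts input text into an AST (abstract syntax tree). It has two parts, the lexer and the parser proper: the lexer performs lexical analysis, and the parser performs syntax analysis.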

AST

AST is short for abstract syntax tree. Like every parser, Druid Parser produces an abstract syntax tree.

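Lex & Yacc

Lex: Lex generates a program called a lexical analyzer (lexer). This is a function that takes a character stream as its argument; whenever it sees a group of characters that matches a keyword, it takes the corresponding action.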

%{
#include <stdio.h>
%}

%%
stop    printf("Stop command received\n");
start   printf("Start command received\n");
%%

To compile it, run the following commands:

lex example1.l
cc lex.yy.c -o example -ll
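
For comparison, the same stop/start matching can be written by hand in Go, the language of the TiDB excerpts later in these notes. This is only a minimal sketch: the function name `scan` is made up for illustration, and unlike the Lex-generated program it works on whitespace-separated words and silently ignores everything else.

```go
package main

import (
	"fmt"
	"strings"
)

// scan is a hypothetical hand-written lexer. It splits the input into
// whitespace-separated words and returns the action message for every
// word that matches a known keyword. Unknown words are ignored.
func scan(input string) []string {
	var out []string
	for _, word := range strings.Fields(input) {
		switch word {
		case "stop":
			out = append(out, "Stop command received")
		case "start":
			out = append(out, "Start command received")
		}
	}
	return out
}

func main() {
	for _, msg := range scan("start foo stop") {
		fmt.Println(msg)
	}
}
```

A generated lexer does the same job from a declarative rule table, which is why larger grammars usually rely on Lex rather than a hand-written switch.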

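Yacc: used to parse a compiler's input, that is, program code. It parses the tokens found in the input stream.

==Together, these two components make up the Parser module. Calling the Parser turns text into structured data, namely the AST (abstract syntax tree).==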

session.go 699:     return s.parser.Parse(sql, charset, collation)

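During parsing, the lexer keeps converting the text into tokens and hands them to the parser. The parser is generated from a yacc grammar; it repeatedly decides which grammar rule the token sequence coming from the lexer matches, and finally outputs structured nodes. For example, the statement SELECT * FROM t WHERE c > 1; matches the SelectStmt rule and is converted into the following data structure: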

type SelectStmt struct {
    dmlNode
    resultSetNode

    // SelectStmtOpts wraps around select hints and switches.
    *SelectStmtOpts
    // Distinct represents whether the select has distinct option.
    Distinct bool
    // From is the from clause of the query.
    From *TableRefsClause
    // Where is the where clause in select statement.
    Where ExprNode
    // Fields is the select expression list.
    Fields *FieldList
    // GroupBy is the group by expression list.
    GroupBy *GroupByClause
    // Having is the having condition.
    Having *HavingClause
    // OrderBy is the ordering expression list.
    OrderBy *OrderByClause
    // Limit is the limit clause.
    Limit *Limit
    // LockTp is the lock type
    LockTp SelectLockType
    // TableHints represents the level Optimizer Hint
    TableHints []*TableOptimizerHint
}

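Most data structures in the ast package implement the ast.Node interface. This interface has an Accept method, and later processing of the AST relies mainly on it: Accept walks all nodes using the Visitor pattern and is also how structural transformations are applied to the AST.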
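
To make the Accept/Visitor idea concrete, here is a minimal sketch in Go. The `Node`, `Visitor`, `Const`, `Add`, and `folder` types are simplified and hypothetical; they are not the real TiDB ast interfaces, whose method signatures differ. The visitor both traverses the tree and rewrites it, folding an addition of two constants into a single constant.

```go
package main

import "fmt"

// Node and Visitor are simplified, hypothetical versions of the idea
// described above; they are not the real TiDB ast interfaces.
type Node interface {
	// Accept walks this node and its children with v and returns the
	// node that should replace it in the parent.
	Accept(v Visitor) Node
}

type Visitor interface {
	// Enter is called before the children are visited; returning true
	// skips the children.
	Enter(n Node) (skipChildren bool)
	// Leave is called after the children are visited; its result
	// replaces n in the parent.
	Leave(n Node) Node
}

// Const is an integer literal.
type Const struct{ Value int }

// Add is a binary addition.
type Add struct{ Left, Right Node }

func (c *Const) Accept(v Visitor) Node {
	v.Enter(c)
	return v.Leave(c)
}

func (a *Add) Accept(v Visitor) Node {
	if skip := v.Enter(a); skip {
		return v.Leave(a)
	}
	a.Left = a.Left.Accept(v)
	a.Right = a.Right.Accept(v)
	return v.Leave(a)
}

// folder counts visited nodes and rewrites Add(Const, Const) to a Const.
type folder struct{ visited int }

func (f *folder) Enter(n Node) bool {
	f.visited++
	return false
}

func (f *folder) Leave(n Node) Node {
	if add, ok := n.(*Add); ok {
		l, lok := add.Left.(*Const)
		r, rok := add.Right.(*Const)
		if lok && rok {
			return &Const{Value: l.Value + r.Value}
		}
	}
	return n
}

func main() {
	// The tree for (1 + 2) + 3.
	var tree Node = &Add{
		Left:  &Add{Left: &Const{Value: 1}, Right: &Const{Value: 2}},
		Right: &Const{Value: 3},
	}
	f := &folder{}
	result := tree.Accept(f)
	fmt.Println(result.(*Const).Value, f.visited) // prints "6 5"
}
```

Because each node type owns its Accept method, new analyses or rewrites can be added as new visitors without touching the node definitions.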

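Building and Optimizing the Query Plan

Once the AST is available, it can be validated, transformed, and optimized. This happens through the following call: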

session.go 805:             stmt, err := compiler.Compile(goCtx, stmtNode)

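Inside the Compile function there are three important steps:

plan.Preprocess: performs validity checks and name binding;
plan.Optimize: builds the query plan and optimizes it; this is one of the most central steps, and later articles will cover it in detail;
Constructing the executor.ExecStmt structure: ExecStmt holds the query plan and is the basis for the execution that follows; it is very important, especially its Exec method.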
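
The order of the three steps can be sketched as a small pipeline. Everything in this sketch is a hypothetical stand-in (`stmtNode`, `plan`, `execStmt`, `preprocess`, `optimize`, `compile`); it only illustrates how the stages feed into each other and does not reproduce TiDB's real signatures or its optimizer.

```go
package main

import (
	"errors"
	"fmt"
)

// All types and functions below are hypothetical stand-ins for the three
// steps listed above; they do not reproduce TiDB's real signatures.

// stmtNode is a toy AST for "SELECT * FROM <table>".
type stmtNode struct{ Table string }

// plan is a toy query plan.
type plan struct {
	TableID int
	Kind    string
}

// execStmt holds the plan, mirroring the role of executor.ExecStmt.
type execStmt struct{ Plan plan }

// preprocess performs a validity check and binds the table name to an ID.
func preprocess(n stmtNode, schema map[string]int) (int, error) {
	id, ok := schema[n.Table]
	if !ok {
		return 0, errors.New("unknown table: " + n.Table)
	}
	return id, nil
}

// optimize builds a plan. A real optimizer would compare alternatives;
// this one always picks a table scan.
func optimize(tableID int) plan {
	return plan{TableID: tableID, Kind: "TableScan"}
}

// compile chains the three steps: preprocess, optimize, build execStmt.
func compile(n stmtNode, schema map[string]int) (*execStmt, error) {
	id, err := preprocess(n, schema)
	if err != nil {
		return nil, err
	}
	return &execStmt{Plan: optimize(id)}, nil
}

func main() {
	schema := map[string]int{"t": 42}
	stmt, err := compile(stmtNode{Table: "t"}, schema)
	fmt.Println(stmt.Plan, err)
	_, err = compile(stmtNode{Table: "missing"}, schema)
	fmt.Println(err)
}
```

Failing early in the preprocessing stage means the optimizer only ever sees statements whose names have already been resolved.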

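Generating the Executor

First we extract the executor interface, which defines the execute method, access to the transaction, and the corresponding commit, rollback, and close operations. Because an executor follows a standard execution flow, an abstract class can implement the interface and wrap that flow using the template method pattern: the abstract class defines the abstract steps, and concrete subclasses implement them.
Next comes the handling of SQL. Execution is split into simple handling and prepared handling; prepared handling covers preparing the statement, passing parameters, running the query, and finally wrapping and returning the result.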
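
The interface plus template method design described above might look like the following in Go. This is a sketch under assumptions: Go has no abstract classes, so a struct with a function field stands in for the abstract base, and all names here (`Executor`, `baseExecutor`, `doQuery`, `newSimpleExecutor`) are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Executor is a hypothetical executor interface with the operations named
// above: query execution plus commit, rollback, and close.
type Executor interface {
	Query(sql string, args ...interface{}) ([]string, error)
	Commit() error
	Rollback() error
	Close() error
}

// baseExecutor plays the role of the abstract class. Query is the template
// method: it fixes the overall flow and delegates the variable step to
// doQuery, which each concrete executor supplies.
type baseExecutor struct {
	closed  bool
	log     []string
	doQuery func(sql string, args ...interface{}) ([]string, error)
}

func (b *baseExecutor) Query(sql string, args ...interface{}) ([]string, error) {
	if b.closed {
		return nil, errors.New("executor was closed")
	}
	return b.doQuery(sql, args...)
}

func (b *baseExecutor) Commit() error {
	b.log = append(b.log, "commit")
	return nil
}

func (b *baseExecutor) Rollback() error {
	b.log = append(b.log, "rollback")
	return nil
}

func (b *baseExecutor) Close() error {
	b.closed = true
	return nil
}

// newSimpleExecutor is a concrete "subclass": it only fills in doQuery.
func newSimpleExecutor() *baseExecutor {
	return &baseExecutor{
		doQuery: func(sql string, args ...interface{}) ([]string, error) {
			// A real implementation would prepare the statement, bind the
			// arguments, run the query, and wrap the result set.
			return []string{fmt.Sprintf("ran %q with %d arg(s)", sql, len(args))}, nil
		},
	}
}

func main() {
	var e Executor = newSimpleExecutor()
	rows, err := e.Query("SELECT * FROM t WHERE c > ?", 1)
	fmt.Println(rows, err)
	_ = e.Close()
	_, err = e.Query("SELECT 1")
	fmt.Println(err)
}
```

The closed check lives in one place, so every concrete executor inherits it without repeating the logic.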

The actual code:

executor/adapter.go 227:  e, err := a.buildExecutor(ctx)

Once the executor has been generated, it is wrapped in a recordSet structure:

return &recordSet{
    executor:    e,
    stmt:        a,
    processinfo: pi,
    txnStartTS:  ctx.Txn().StartTS(),
}, nil

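Running the Executor

TiDB's execution engine runs on the Volcano model: all physical executors form a tree, and each level obtains its results by calling the Next/NextChunk() method of the level below.

For statements that return rows, the result rs is a RecordSet interface; Next() is called on it repeatedly to fetch more results, which are returned to the MySQL client. The second kind of statement, such as INSERT, does not need to return data and only has to be executed to completion. Such statements are also driven by Next, and the driving point comes before the recordSet structure is constructed: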

// If the executor doesn't return any result to the client, we execute it without delay.
if e.Schema().Len() == 0 {
    return a.handleNoDelayExecutor(goCtx, e, ctx, pi)
} else if proj, ok := e.(*ProjectionExec); ok && proj.calculateNoDelay {
    // Currently this is only for the "DO" statement. Take "DO 1, @a=2;" as an example:
    // the Projection has two expressions and two columns in the schema, but we should
    // not return the result of the two expressions.
    return a.handleNoDelayExecutor(goCtx, e, ctx, pi)
}
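
The pull-based execution described in this section can be illustrated with a toy executor tree. This is a minimal sketch, not TiDB code: the `Executor` interface, the `scan`, `selection`, `projection`, and `insert` operators, and the `run` helper are all hypothetical, and rows are plain integer slices. It evaluates the equivalent of SELECT c1 FROM t WHERE c2 > 1 and also shows a statement with an empty schema being executed immediately.

```go
package main

import "fmt"

// Executor is a simplified, hypothetical pull-based operator. Next returns
// one row at a time and ok=false once the operator is exhausted.
type Executor interface {
	Next() (row []int, ok bool)
	SchemaLen() int // number of columns returned to the client
}

// scan emits the rows of an in-memory, two-column table (c1, c2).
type scan struct {
	rows [][]int
	pos  int
}

func (s *scan) SchemaLen() int { return 2 }
func (s *scan) Next() ([]int, bool) {
	if s.pos >= len(s.rows) {
		return nil, false
	}
	s.pos++
	return s.rows[s.pos-1], true
}

// selection pulls from its child and keeps rows for which pred is true.
type selection struct {
	child Executor
	pred  func(row []int) bool
}

func (s *selection) SchemaLen() int { return s.child.SchemaLen() }
func (s *selection) Next() ([]int, bool) {
	for {
		row, ok := s.child.Next()
		if !ok || s.pred(row) {
			return row, ok
		}
	}
}

// projection pulls from its child and keeps only column col.
type projection struct {
	child Executor
	col   int
}

func (p *projection) SchemaLen() int { return 1 }
func (p *projection) Next() ([]int, bool) {
	row, ok := p.child.Next()
	if !ok {
		return nil, false
	}
	return []int{row[p.col]}, true
}

// insert has an empty schema: all of its work happens inside Next.
type insert struct {
	table *[][]int
	row   []int
}

func (i *insert) SchemaLen() int { return 0 }
func (i *insert) Next() ([]int, bool) {
	*i.table = append(*i.table, i.row)
	return nil, false
}

// run drives the tree from the root. An executor with an empty schema is
// executed right away and returns no rows, similar in spirit to the
// no-delay branch above; any other executor is drained row by row.
func run(root Executor) [][]int {
	if root.SchemaLen() == 0 {
		root.Next()
		return nil
	}
	var out [][]int
	for row, ok := root.Next(); ok; row, ok = root.Next() {
		out = append(out, row)
	}
	return out
}

func main() {
	table := [][]int{{10, 0}, {20, 5}}
	run(&insert{table: &table, row: []int{30, 7}})
	// SELECT c1 FROM t WHERE c2 > 1
	root := &projection{col: 0, child: &selection{
		child: &scan{rows: table},
		pred:  func(r []int) bool { return r[1] > 1 },
	}}
	fmt.Println(run(root)) // prints [[20] [30]]
}
```

Each operator only knows about its child's Next method, so operators can be stacked in any order that the plan requires.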

Parsing SQL with Druid SQL Parser

Druid SQL Parser has three modules: Parser, AST, and Visitor.

In Druid Parser, an AST can be generated as follows:

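// In Druid 1.2.x these constants have type com.alibaba.druid.DbType rather than String.
final DbType dbType = JdbcConstants.MYSQL; // can also be ORACLE, POSTGRESQL, SQLSERVER, ODPS, etc.
String sql = "select * from t";
// The SQLStatement objects are the AST
List<SQLStatement> stmtList = SQLUtils.parseStatements(sql, dbType);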

To use it, add the following dependency:

<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>druid</artifactId>
    <version>1.2.6</version>
    <scope>test</scope>
</dependency>
