Writing a Compiler from Scratch in PHP (Part 3)


Previous parts:

Writing a Compiler from Scratch in PHP (Part 1)

Writing a Compiler from Scratch in PHP (Part 2)

In the previous part we finished the lexical-analysis part, ListLexer, as well as the Visitor. Now we can look at how the AST (Abstract Syntax Tree) is built.

First, let's look at its base class, AstNode.

<?php
declare(strict_types=1);
/*
 * This file is part of the ByteFerry/Rql-Parser package.
 *
 * (c) BardoQi <67158925@qq.com>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */

namespace ByteFerry\RqlParser\AstBuilder;


use ByteFerry\RqlParser\Lexer\ListLexer;
use ByteFerry\RqlParser\Lexer\Token;

/**
 * abstract class of node of abstract syntax tree
 *
 * Class AstNode
 *
 * @package ByteFerry\RqlParser\Ast
 */
abstract class AstNode
{
    /**
     * The operator (the function-name part of an RQL call).
     * @var string
     */
    protected $operator;

    /**
     * The symbol (the actual text).
     * @var string
     */
    protected $symbol;

    /**
     * The arguments, i.e. the child nodes in the syntax tree.
     * Filled in when load() is called.
     *
     * @var NodeInterface[]
     */
    protected $argument = [];

    /**
     * The children's intermediate build results. (Named "stage" because I could
     * not think of a better name for a child node's pre-compiled result.)
     * @var array
     */
    protected $stage = [];

    /**
     * The final build result, assembled from the children's results.
     *
     * @var array
     */
    protected $output = [];

    /**
     * @param $operator
     * @param $symbol
     */
    public function __construct($operator,$symbol){
        $this->operator = $operator;
        $this->symbol = $symbol;
    }

    /**
     * Static factory method: `new static` creates an instance of the class it is called on.
     *
     * @param $operator
     * @param $symbol
     * @return \ByteFerry\RqlParser\AstBuilder\AstNode|static
     */
    public static function of($operator,$symbol){

        return new static($operator,$symbol);
    }

    /**
     * Build each child node and collect the results in $this->stage. The code is quite simple.
     * @return void
     */
    protected function buildChildren(){
        /** @var NodeInterface $child */
        foreach($this->argument as $child){
            $this->stage[] = $child->build();
        }
    }

    /**
     * load() consumes tokens from the ListLexer and turns them into child nodes.
     * @param \ByteFerry\RqlParser\Lexer\ListLexer $ListLexer
     *
     * @return int|void
     */
    public function load(ListLexer $ListLexer){

        /** @var Token $token */
        $token = $ListLexer->consume();  // consume the first token
        // then keep consuming tokens in the for loop
        for(; (false !== $token); $token = $ListLexer->consume()){
            if($ListLexer->isClose()){  // a closing parenthesis ends the argument list: return
                return $ListLexer->getNextIndex();
            }
            /** @var NodeInterface $node */
            $node = NodeVisitor::visit($token->getSymbol());  // get the node object for this symbol
            $node->load($ListLexer);  // recursion: the child loads its own arguments first

            $this->argument[] = $node;   // store the child in the argument list
            if($ListLexer->isClose()){  // a closing parenthesis ends the argument list: return
                return $ListLexer->getNextIndex();
            }
        }
        return $ListLexer->getNextIndex();  // reached when consume() returns false (end of input)
    }

    /**
     * @return string
     */
    public function getSymbol(){
        return $this->symbol;
    }

    /**
     * An iterator that yields the stage results in pairs (used later).
     * @return \Generator
     */
    public function pair(){
        for($i=0,$j=count($this->stage);$i<$j;$i+=2) {
            // ?? null avoids an "undefined index" notice when stage has an odd count
            yield [$this->stage[$i], $this->stage[$i + 1] ?? null];
        }
    }

    /**
     * Every concrete node class must implement build().
     *
     * @return mixed
     */
    abstract public function build();
}
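To make the recursion in load() concrete, here is a simplified, self-contained sketch. ToyLexer and ToyNode are hypothetical stand-ins, not the library's ListLexer or AstNode: the tokens are pre-split strings, the node itself skips '(' and ',', and the NodeVisitor class lookup is left out. What it keeps is the shape of the loop: each child calls load() on the same lexer, consumes its own tokens, and returns when it reaches its closing parenthesis.

```php
<?php
// Hypothetical stand-in for ListLexer: tokens are pre-split strings and the
// node itself handles '(' and ','. This is NOT the real ListLexer API.
class ToyLexer
{
    /** @var string[] */
    private $tokens;
    /** @var int */
    private $pos = 0;

    public function __construct(array $tokens)
    {
        $this->tokens = $tokens;
    }

    /** Return the next token and advance, or false at end of input. */
    public function consume()
    {
        return $this->tokens[$this->pos++] ?? false;
    }

    /** Return the next token without advancing, or false at end of input. */
    public function peek()
    {
        return $this->tokens[$this->pos] ?? false;
    }
}

// Hypothetical simplified node: no NodeVisitor, every child is a ToyNode.
class ToyNode
{
    /** @var string */
    public $symbol;
    /** @var ToyNode[] */
    public $children = [];

    public function __construct(string $symbol)
    {
        $this->symbol = $symbol;
    }

    /** Recursive descent: read this node's argument list, if it has one. */
    public function load(ToyLexer $lexer): void
    {
        if ($lexer->peek() !== '(') {
            return; // a leaf such as a column name or a value
        }
        $lexer->consume(); // skip '('
        for ($token = $lexer->consume(); $token !== false; $token = $lexer->consume()) {
            if ($token === ')') {
                return; // the closing parenthesis ends this argument list
            }
            if ($token === ',') {
                continue; // separator between arguments
            }
            $child = new ToyNode($token);
            $child->load($lexer); // the child consumes its own tokens first
            $this->children[] = $child;
        }
    }

    /** Print the tree back in RQL-like form to check the structure. */
    public function __toString(): string
    {
        if ($this->children === []) {
            return $this->symbol;
        }
        $args = array_map(function (ToyNode $child): string {
            return (string) $child;
        }, $this->children);
        return $this->symbol . '(' . implode(',', $args) . ')';
    }
}

$lexer = new ToyLexer(['and', '(', 'eq', '(', 'a', ',', '1', ')', ',', 'not', '(', 'b', ')', ')']);
$root = new ToyNode($lexer->consume());
$root->load($lexer);
echo $root, PHP_EOL; // and(eq(a,1),not(b))
```

Because every node shares one lexer, the position a child advances to is exactly where its parent resumes, so PHP's own call stack is the only stack needed.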

Besides inheritance, we need an interface for type checks, so we define the following interface:

<?php
declare(strict_types=1);
/*
 * This file is part of the ByteFerry/Rql-Parser package.
 *
 * (c) BardoQi <67158925@qq.com>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */

namespace ByteFerry\RqlParser\AstBuilder;

use ByteFerry\RqlParser\Lexer\ListLexer;

/**
 * Interface NodeInterface
 *
 * @package ByteFerry\RqlParser\Ast
 */
interface NodeInterface
{

    /**
     * @param ListLexer $ListLexer
     *
     * @return int
     */
    public function load(ListLexer $ListLexer);

    /**
     * @return mixed
     */
    public function build();


}

The interface is simple: just two methods, load and build.

So which node classes do we need? In the Symbols class we define:

    /**
     * mapping node type to class
     * @var array
     */
    public static $class_mapping = [
        'N_AGGREGATE' =>    Ast\AggregateNode::class,
        'N_ARRAY'=>         Ast\ArrayNode::class,
        'N_COLUMN' =>       Ast\ColumnsNode::class,
        'N_CONSTANT' =>     Ast\ConstantNode::class,
        'N_DATA' =>         Ast\DataNode::class,
        'N_FILTER' =>       Ast\FilterNode::class,
        'N_LIMIT' =>        Ast\LimitNode::class,
        'N_LOGIC' =>        Ast\LogicNode::class,
        'N_PREDICATE' =>    Ast\PredicateNode::class,
        'N_QUERY' =>        Ast\QueryNode::class,
        'N_SEARCH' =>       Ast\SearchNode::class,
        'N_SORT' =>         Ast\SortNode::class,
    ];

In other words, there are 12 node classes in total.
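Turning a node type into an object is NodeVisitor's job, and its code is not shown in this part. As a hedged sketch of the general idea only, a mapping like $class_mapping can be combined with the of() factory from AstNode: because of() uses new static, calling it on a class name taken from the mapping creates an instance of that subclass. The Demo* classes and createNode() are hypothetical names for illustration.

```php
<?php
interface DemoNodeInterface
{
    public function build();
}

abstract class DemoAstNode implements DemoNodeInterface
{
    protected $operator;
    protected $symbol;

    public function __construct($operator, $symbol)
    {
        $this->operator = $operator;
        $this->symbol = $symbol;
    }

    // Late static binding: `new static` instantiates whichever subclass
    // of() was called on, e.g. DemoSortNode::of(...) returns a DemoSortNode.
    public static function of($operator, $symbol)
    {
        return new static($operator, $symbol);
    }
}

class DemoLogicNode extends DemoAstNode
{
    public function build()
    {
        return 'logic:' . $this->symbol;
    }
}

class DemoSortNode extends DemoAstNode
{
    public function build()
    {
        return 'sort:' . $this->symbol;
    }
}

// Hypothetical helper: look up the class for a node type and create it via of().
function createNode(string $type, $operator, $symbol): DemoNodeInterface
{
    // A trimmed-down, hypothetical copy of Symbols::$class_mapping.
    $mapping = [
        'N_LOGIC' => DemoLogicNode::class,
        'N_SORT'  => DemoSortNode::class,
    ];
    if (!isset($mapping[$type])) {
        throw new InvalidArgumentException("Unknown node type: {$type}");
    }
    $class = $mapping[$type];
    return $class::of($operator, $symbol);
}

$node = createNode('N_SORT', 'sort', '-id');
echo get_class($node), ' ', $node->build(), PHP_EOL; // DemoSortNode sort:-id
```

Adding a node type then means adding one subclass and one mapping entry; the creation code itself does not change.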

A typical query, as in SQL, starts with SELECT followed by the list of fields to fetch. In RQL this is columns, which can be abbreviated to cols. Let's look at this class:

<?php
declare(strict_types=1);
/*
 * This file is part of the ByteFerry/Rql-Parser package.
 *
 * (c) BardoQi <67158925@qq.com>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */

namespace ByteFerry\RqlParser\AstBuilder;


/**
 * Class ColumnsNode
 *
 * @package ByteFerry\RqlParser\Ast
 */
class ColumnsNode extends AstNode implements NodeInterface
{

    /**
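     * The comma-separated column names need no compilation. The only extra work is for
     * aggregate queries, where the group-by columns must be built, so buildChildren()
     * is overridden here.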
     * @return array
     */
    protected function buildChildren(){
        $groupBy=[];
        /** @var NodeInterface $child */
        foreach($this->argument as $child){
            $this->stage[] = $child->build();
            if(($this->operator === 'aggregate')&&($child instanceof ConstantNode)){
                $groupBy[] = $child->getSymbol();
            }
        }
        return $groupBy;
    }

    /**
     * build() has little to do: take what is in stage and write it to output.
     * @return mixed
     */
    public function build(){
        $groupBy = $this->buildChildren();
        $this->output['columns'] = $this->stage;
        $this->output['columns_operator'] = $this->getSymbol();
        $this->output['group_by'] = $groupBy;
        return $this->output;
    }

}
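Here is a minimal sketch of the same group-by rule outside the class hierarchy. ColumnConstant, AggregateCall and buildColumns() are hypothetical, and the 'count(id)' output of an aggregate child is only an assumption; in the real code the group-by name comes from ConstantNode::getSymbol().

```php
<?php
// Hypothetical child types: a plain column name, and an aggregate call whose
// build() output format ("count(id)") is an assumption for illustration.
class ColumnConstant
{
    public $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function build(): string
    {
        return $this->name;
    }
}

class AggregateCall
{
    private $func;
    private $column;

    public function __construct(string $func, string $column)
    {
        $this->func = $func;
        $this->column = $column;
    }

    public function build(): string
    {
        return sprintf('%s(%s)', $this->func, $this->column);
    }
}

/**
 * Same logic as ColumnsNode::build(): every child's result goes into
 * 'columns'; under the 'aggregate' operator the plain columns also go
 * into 'group_by'.
 */
function buildColumns(string $operator, string $symbol, array $children): array
{
    $columns = [];
    $groupBy = [];
    foreach ($children as $child) {
        $columns[] = $child->build();
        if ($operator === 'aggregate' && $child instanceof ColumnConstant) {
            $groupBy[] = $child->name;
        }
    }
    return [
        'columns' => $columns,
        'columns_operator' => $symbol,
        'group_by' => $groupBy,
    ];
}

$result = buildColumns('aggregate', 'aggregate', [
    new ColumnConstant('city'),
    new AggregateCall('count', 'id'),
]);
// $result['columns'] is ['city', 'count(id)'], $result['group_by'] is ['city']
```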

In SQL, FROM specifies which table to query. RQL syntax is operator(resource_name, parameters) instead, where operator is the query operator. In the Symbols class we define:

    /**
     * Query type mapping
     * @var array
     */
    public static $query_type_mapping = [
        'all'           =>  'Q_READ',
        'any'           =>  'Q_READ',
        'count'         =>  'Q_READ',
        'create'        =>  'Q_WRITE',
        'decrement'     =>  'Q_WRITE',
        'delete'        =>  'Q_WRITE',
        'exists'        =>  'Q_READ',
        'first'         =>  'Q_READ',
        'increment'     =>  'Q_WRITE',
        'one'           =>  'Q_READ',
        'update'        =>  'Q_WRITE',
    ];

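So our RQL supports both read and write queries. The query operators themselves should be self-explanatory; the interesting question is how the corresponding node class is implemented: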

<?php
declare(strict_types=1);
/*
 * This file is part of the ByteFerry/Rql-Parser package.
 *
 * (c) BardoQi <67158925@qq.com>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */

namespace ByteFerry\RqlParser\AstBuilder;

use ByteFerry\RqlParser\Lexer\Symbols;

/**
 * Query operator node class.
 * Class QueryNode
 *
 * @package ByteFerry\RqlParser\Ast
 */
class QueryNode extends AstNode implements NodeInterface
{

    /**
     * Save a child's pre-compiled results into stage, key by key.
     *
     * @param array $array
     * @return void
     */
    protected function addStage($array){
        foreach($array as $key => $item){
            $this->stage[$key] = $item;
        }
    }

    /**
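     * Build the children and save their results. The logic is simple: a result
     * that is not an array must come from the resource-name node; array results
     * are merged into stage.
     *
     * @return void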
     */
    protected function buildChildren(){
        /** @var NodeInterface $child */
        foreach($this->argument as $child){
            $result = $child->build();
            if(!is_array($result)){
                $this->stage['resource'] = $result;
            }else{
                $this->addStage($result);
            }
        }
    }

    /**
     * Assemble the pre-compiled results into the output, and add the operator
     * and the query type (read or write).
     *
     * @return mixed
     */
    public function build(){

        $this->buildChildren();
        $this->output = $this->stage;
        $this->output['operator'] = $this->operator;
        $this->output['query_type'] = Symbols::$query_type_mapping[$this->symbol]??null;
        return $this->output;
    }

}
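QueryNode's merge rule can be sketched as a plain function. The excerpt of the query type mapping, the buildQuery() name and the shape of the child results (in particular the 'limit' key) are assumptions for illustration only.

```php
<?php
// Hypothetical excerpt of Symbols::$query_type_mapping.
const QUERY_TYPE_MAPPING = [
    'all'    => 'Q_READ',
    'count'  => 'Q_READ',
    'delete' => 'Q_WRITE',
    'update' => 'Q_WRITE',
];

/**
 * Same merge rule as QueryNode::buildChildren()/build(): a scalar child result
 * is the resource name; an array result is copied in key by key, so a later
 * child overwrites an earlier one that used the same key.
 */
function buildQuery(string $operator, string $symbol, array $childResults): array
{
    $output = [];
    foreach ($childResults as $result) {
        if (!is_array($result)) {
            $output['resource'] = $result;
            continue;
        }
        foreach ($result as $key => $item) {
            $output[$key] = $item;
        }
    }
    $output['operator'] = $operator;
    $output['query_type'] = QUERY_TYPE_MAPPING[$symbol] ?? null; // unknown operator: null
    return $output;
}

// Hypothetical child results for a read query on the resource "users".
$query = buildQuery('all', 'all', [
    'users',
    ['columns' => ['id', 'name'], 'columns_operator' => 'cols', 'group_by' => []],
    ['limit' => 10],
]);
```

Since array results are copied key by key, a child that reuses a key silently overwrites an earlier one, which is worth keeping in mind when adding new node types.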

A query also needs filter conditions; in RQL that is filter. How is the filter node implemented?


<?php
declare(strict_types=1);
/*
 * This file is part of the ByteFerry/Rql-Parser package.
 *
 * (c) BardoQi <67158925@qq.com>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */
namespace ByteFerry\RqlParser\AstBuilder;


/**
 * Class FilterNode
 *
 * @package ByteFerry\RqlParser\Ast
 */
class FilterNode extends AstNode implements NodeInterface
{

    /**
     * @return mixed
     */
    public function build(){
        $this->buildChildren();  // build all child nodes first
        // the result has the shape [symbol => built conditions, 'paramaters' => parameter array]
        $this->output = [
            $this->getSymbol() => $this->stage,
            'paramaters' => ParamaterRegister::getInstance()->toArray(),
        ];
        return $this->output;
    }

}

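Why does FilterNode work this way? Filters contain complex logical relationships and parentheses, so parsing only checks that the structure is correct, and building turns the RQL syntax into an SQL condition string. On the other hand, user-supplied parameters cannot be trusted, so the parameter values are extracted as well, so that they can be validated before use. That is why we use a parameter registry class to hold this part of the result.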

In design-pattern terms, this is the Registry pattern. Here is the code of the parameter registry class:

<?php
declare(strict_types=1);
/*
 * This file is part of the ByteFerry/Rql-Parser package.
 *
 * (c) BardoQi <67158925@qq.com>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */

namespace ByteFerry\RqlParser\AstBuilder;

/**
 * Class ParamaterRegister
 *
 * It is used for keep the data of predicates for validation.
 *
 * @package ByteFerry\RqlParser\AstBuilder
 */
class ParamaterRegister
{
    /**
     * @var null | ParamaterRegister
     */
    public static $instance = null;

    /**
     * @var array
     */
    protected $container = [];

    /**
     * @return \ByteFerry\RqlParser\AstBuilder\ParamaterRegister|null
     */
    public static function getInstance(){
        return self::$instance;
    }

    /**
     * @return array
     */
    public function toArray(){
        return $this->container;
    }

    /**
     * @return \ByteFerry\RqlParser\AstBuilder\ParamaterRegister|null
     */
    public static function newInstance(){
        self::$instance = new static();
        return self::$instance;
    }

    /**
     * @param $key
     * @param $value
     *
     * @return void
     */
    public function add($key,$value){
        if('null' === $value){  // strict: with ==, true (and 0 before PHP 8) would also become null
            $value = null;
        }
        $this->container[$key]=$value;
    }
}

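The code is simple: newInstance() creates the instance up front, getInstance() retrieves it, add() stores a value, and toArray() returns everything at the end. That's all there is to it. Since getInstance() returns null until newInstance() has been called, newInstance() has to run before any node uses the registry.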
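Below is a hedged variation on ParamaterRegister, not the library's code. ParamRegistry (a hypothetical name) creates its instance lazily in getInstance(), so it never returns null, and it compares against 'null' with ===, so values such as true are not mistaken for the string 'null'. The keys and values in the usage lines are made up, and which nodes call add() is an assumption based on the class's docblock ("keep the data of predicates").

```php
<?php
/**
 * Registry sketch modelled on ParamaterRegister. Unlike the original,
 * getInstance() creates the instance lazily, so it never returns null.
 */
final class ParamRegistry
{
    /** @var ParamRegistry|null */
    private static $instance = null;

    /** @var array */
    private $container = [];

    private function __construct()
    {
    }

    /** Return the shared instance, creating it on first use. */
    public static function getInstance(): self
    {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }

    /** Replace the shared instance with an empty one, e.g. before each parse. */
    public static function newInstance(): self
    {
        self::$instance = new self();
        return self::$instance;
    }

    /** Store a predicate value; the literal string 'null' is stored as null. */
    public function add(string $key, $value): void
    {
        if ($value === 'null') { // strict, so true or 0 are kept as-is
            $value = null;
        }
        $this->container[$key] = $value;
    }

    public function toArray(): array
    {
        return $this->container;
    }
}

// Predicate nodes would presumably register values while the filter is built...
ParamRegistry::newInstance();
ParamRegistry::getInstance()->add('age', '18');
ParamRegistry::getInstance()->add('deleted_at', 'null');
// ...and the filter node reads them back so they can be validated.
$params = ParamRegistry::getInstance()->toArray();
```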

So far we can see that a simple node only needs to implement build(), while a more complex one may also override load() or buildChildren(). This makes the whole code base easy to extend.

A filter condition is really just a combination of logical expressions, so next let's look at the logic expression node.

<?php
declare(strict_types=1);
/*
 * This file is part of the ByteFerry/Rql-Parser package.
 *
 * (c) BardoQi <67158925@qq.com>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */

namespace ByteFerry\RqlParser\AstBuilder;

/**
 * Class LogicNode
 *
 * @package ByteFerry\RqlParser\Ast
 */
class LogicNode extends AstNode implements NodeInterface
{

    /**
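     * The logic node only needs build(): 'not' negates its single operand;
     * any other operator joins the operands, each wrapped in parentheses,
     * with ' and ' or ' or ', producing an SQL condition string.
     *
     * @return mixed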
     */
    public function build(){
        $this->buildChildren();
        if($this->operator === 'not'){
            // parenthesize the operand so that not(or(...)) negates the whole group
            $this->output[] = sprintf('%s (%s)', $this->operator, $this->stage[0]);
            return $this->output[0];
        }
        $this->output[] = '(' .implode(')'. $this->symbol .'(', $this->stage) .')';
        return $this->output[0];
    }

}

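The logic node is just a build() method: for not, it returns not followed by the operand; otherwise it wraps each operand in parentheses and joins them with ' and ' or ' or ', which yields an SQL condition string.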
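The same rule written as a plain function shows how nested logic nodes compose. buildLogic() is a hypothetical name, and the operands stand in for already-built predicate strings. This sketch always wraps the operand of not in parentheses: without them, negating '(b = 2) or (c = 3)' would read not (b = 2) or (c = 3), which negates only the first group.

```php
<?php
/**
 * Sketch of the LogicNode::build() rule as a plain function.
 * $operands are the already-built child conditions; $symbol is assumed to
 * carry its own spaces, e.g. ' and ' or ' or '.
 */
function buildLogic(string $operator, string $symbol, array $operands): string
{
    if ($operator === 'not') {
        // Parenthesize so that not(or(...)) negates the whole group.
        return sprintf('not (%s)', $operands[0]);
    }
    // Wrap every operand in parentheses and join them with the symbol.
    return '(' . implode(')' . $symbol . '(', $operands) . ')';
}

$either = buildLogic('or', ' or ', ['b = 2', 'c = 3']);
$where = buildLogic('and', ' and ', ['a = 1', buildLogic('not', 'not', [$either])]);
echo $where, PHP_EOL; // (a = 1) and (not ((b = 2) or (c = 3)))
```

Because every operand is parenthesized by its parent, the nesting of the RQL tree carries over directly to SQL precedence, with no operator-precedence table needed.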

At this point we have covered about half of the AST classes. Next time we will finish this part. (To be continued)

Continue reading:

Writing a Compiler from Scratch in PHP (Part 4)

Writing a Compiler from Scratch in PHP (Part 5)