Writing a Compiler from Scratch in PHP (Part 2)


Previous part:

Writing a Compiler from Scratch in PHP (Part 1)

In the previous part, we converted the input string into an array of Tokens and stored it in ListLexer. So what does a Token look like?


/*
 * This file is part of the ByteFerry/Rql-Parser package.
 *
 * (c) BardoQi <67158925@qq.com>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */

declare(strict_types=1);


namespace ByteFerry\RqlParser\Lexer;

use ByteFerry\RqlParser\Exceptions\ParseException;
/**
 * Class Token
 *
 * @package ByteFerry\RqlParser
 */
class Token
{
    /**
     * Symbol type: one of the constants defined in the Symbols class
     * @var int
     */
    protected $type = 0;

    /**
     * The symbol's own content string
     * @var string
     */
    protected $symbol = '';

    /**
     * The lexer type of the next token
     * @var int
     */
    protected $next_type = -1;

    /**
     * The lexer type of the previous token
     * @var int
     */
    protected $previous_type = -1;

    /**
     * Nesting level, used for syntax validation
     * @var int
     */
    protected $level = 0;

    /**
     * @param     $type
     * @param     $symbol
     * @param int $previous_type
     * Static factory for creating a Token, so callers do not need `new`.
     * @return  static
     */
    public static function from($type,$symbol,$previous_type = -1)
    {
        /**
         * Ensure the syntax of the RQL with a simple ABNF definition:
         * the rule set lists which token types may follow each previous type.
         */
        if(-1 !== $previous_type){
            if(!in_array($type,Symbols::$rules[$previous_type])){
                throw new ParseException('Syntax error in Node of ' .$symbol);
            }
        }


        $instance = new static();   // initialize the fields below
        $instance->type = $type;
        $instance->symbol = $symbol;
        $instance->previous_type = $previous_type;
        return $instance;
    }

    /**
     * @param $previousType
     * Arrays are the one kind of data in RQL that has no function form, so they are handled specially here.
     * @return \ByteFerry\RqlParser\Lexer\Token
     */
    public static function makeArrayToken($previousType){
        $instance = new static();
        $instance->type = Symbols::T_WORD;
        $instance->symbol = 'arr';
        $instance->previous_type = $previousType;
        return $instance;
    }

    /**
     * @param $level
     *
     * @return void
     */
    public function setLevel($level){
        $this->level = $level;
    }

    /**
     * @param $type
     *
     * @return void
     */
    public function setNextType($type)
    {
        $this->next_type = $type;
    }

    /**
     * @return int
     */
    public function getType()
    {
        return $this->type;
    }

    /**
     * @param $type
     *
     * @return void
     */
    public function setPrevType($type){
        $this->previous_type=$type;
    }

    /**
     * @return string
     */
    public function getSymbol()
    {
        return $this->symbol;
    }

    /**
     * @return bool
     */
    public function isClose(){
        return ($this->type === Symbols::T_CLOSE_PARENTHESIS);
    }

    /**
     * @return int
     */
    public function getPrevType(){
        return $this->previous_type;
    }

    /**
     * @return bool
     */
    public function isPunctuation(){
        return !( ($this->type === Symbols::T_WORD)
               || ($this->type === Symbols::T_STRING)
               );
    }

}
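The check in `Token::from()` can be illustrated in isolation. The following is a minimal sketch, not the package's code: the constants, the `RULES` table, and `isAllowed()` are hypothetical stand-ins for `Symbols::$rules`, whose real contents are not shown in this article. It only demonstrates the idea of a table that lists, for each previous token type, the types allowed to follow it.

```php
<?php
declare(strict_types=1);

// Hypothetical token types; the real constants live in the Symbols class.
const T_WORD  = 1;
const T_OPEN  = 2;
const T_CLOSE = 3;
const T_COMMA = 4;

// Hypothetical rule table: previous type => types allowed to follow it.
const RULES = [
    T_WORD  => [T_OPEN, T_CLOSE, T_COMMA],
    T_OPEN  => [T_WORD, T_OPEN, T_CLOSE],
    T_CLOSE => [T_CLOSE, T_COMMA],
    T_COMMA => [T_WORD, T_OPEN],
];

/**
 * Returns true when $type may follow $previousType.
 * -1 means "no previous token" and is always accepted, as in Token::from().
 */
function isAllowed(int $type, int $previousType = -1): bool
{
    if ($previousType === -1) {
        return true;
    }
    return in_array($type, RULES[$previousType] ?? [], true);
}

// "eq(" : a word followed by an opening parenthesis is accepted.
var_dump(isAllowed(T_OPEN, T_WORD));   // bool(true)
// ",)" : a closing parenthesis right after a comma is rejected.
var_dump(isAllowed(T_CLOSE, T_COMMA)); // bool(false)
```

In the real class a rejected pair raises a `ParseException` instead of returning `false`; the sketch returns a boolean only to keep the example short.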

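As we can see, this class contains many one-line methods. Object orientation is one reason for that. Many beginners do not make use of one-line methods, and as a result some of their functions grow far too long.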

Next, let's look at the ListLexer class.

/*
 * This file is part of the ByteFerry/Rql-Parser package.
 *
 * (c) BardoQi <67158925@qq.com>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */

declare(strict_types=1);

namespace ByteFerry\RqlParser\Lexer;

use ByteFerry\RqlParser\Abstracts\BaseObject;
use ByteFerry\RqlParser\Exceptions\ParseException;


/**
 * Class ListLexer
 *
 * @package ByteFerry\RqlParser\ListLexer
 */
class ListLexer extends BaseObject
{
    /**
     * The token array is stored here
     * @var array
     */
    protected $items = [];

    /**
     * @var int
     */
    protected $level = 0;

    /**
     * @var int
     */
    protected $position = 0;



    /**
     * @param $token
     *
     * @return void
     */
    public function addItem(Token $token){
        if($token->getType() === Symbols::T_OPEN_PARENTHESIS){
            /**
             * for < ,( >  that is the array operator,
             * we'd insert a node 'arr'
             * A node after a comma with no function name, i.e. a bare "(", must be an array node.
             */
            if($token->getPrevType()===Symbols::T_COMMA){
                $this->items[$this->position++] = Token::makeArrayToken(Symbols::T_COMMA);  // so insert one via Token::makeArrayToken
            }
            $token->setPrevType(Symbols::T_WORD);  // with the virtual arr function inserted, the previous type must also be T_WORD
            $this->level++; // and go one level deeper
        }
        if($token->getType() === Symbols::T_CLOSE_PARENTHESIS){
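            $this->level--;    // on a closing parenthesis, go one level up. If the parentheses are balanced, level ends at 0; that is the core of this check. (Why it works is left for you to think about.)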
        }
        $token->setLevel($this->level);
        $this->items[$this->position++] = $token;
    }

    /**
     * @param $type
     * Sets the NextType of the previous node.
     * @return void
     */
    public function setNextType($type){
        if(isset($this->items[$this->position-2])){ // addItem() has already advanced position past the new token, so position-1 is the new token and position-2 is the one before it
            $this->items[$this->position-2]->setNextType($type);
        }
    }

    /**
     * @return mixed
     */
    public function current(){
        return $this->items[$this->position];
    }

    /**
     * @return bool|mixed
     * Consumes tokens; this is the key function.
     */
    public function consume(){
        /**
         * if got the end we must return;  
         */
        if($this->isEnd()){
            return false;   // reached the end
        }
        /**
         * get the next token
         */
        $token = $this->items[++$this->position]; // fetch the next token
        /**
         * We only consume word or string tokens,
         * which is why we call the token's isPunctuation().
         */
        for(; $token->isPunctuation() && !$this->isEnd(); $token = $this->items[++$this->position]){
            /**
             * if we meet the close flag we must return.
             */
            if($token->isClose()){
                return $token;
            }
        }
        return $token;
    }


    /**
     * @return mixed
     */
    public function rewind()
    {
        $this->position = 0;
        return $this->items[$this->position];
    }

    /**
     *
     * @return int
     */
    public function getNextIndex()
    {
        return ++$this->position;
    }


    /**
     * @return mixed
     */
    public function isClose(){
        return $this->items[$this->position]->isClose();
    }

    /**
     * @return bool
     */
    public function isEnd(){
        return $this->position+1 >= count($this->items);
    }

    /**
     * @return int
     */
    public function getLevel(){
        return $this->level;
    }

}
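The two things `addItem()` does, tracking the parenthesis depth and inserting a virtual `arr` node, can be sketched on plain strings. `MiniList` below is a hypothetical, simplified stand-in for ListLexer, written only for illustration: it uses strings instead of Token objects and does not record a level on each item.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical, simplified stand-in for ListLexer::addItem().
 */
final class MiniList
{
    /** @var string[] */
    public array $items = [];
    public int $level = 0;

    public function add(string $symbol): void
    {
        if ($symbol === '(') {
            // "(" directly after "," has no function name: insert a virtual "arr".
            if (end($this->items) === ',') {
                $this->items[] = 'arr';
            }
            $this->level++;
        }
        if ($symbol === ')') {
            $this->level--;
        }
        $this->items[] = $symbol;
    }
}

// Symbols of the RQL expression in(id,(1,2)), already split by a lexer.
$list = new MiniList();
foreach (['in', '(', 'id', ',', '(', '1', ',', '2', ')', ')'] as $symbol) {
    $list->add($symbol);
}

echo implode(' ', $list->items), PHP_EOL; // in ( id , arr ( 1 , 2 ) )
echo $list->level, PHP_EOL;               // 0
```

A final level of 0 means every opening parenthesis was closed. Dropping the last `)` from the input would leave the level at 1, which is how an unbalanced expression shows up.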

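We can see that the one-line methods in the Token class simplify the code here. Likewise, this class has one-line methods of its own that simplify consume(), so the code is much shorter. That concludes the lexical part. Next comes the abstract syntax tree, so let's continue with NodeVisitor.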


<?php

/*
 * This file is part of the ByteFerry/Rql-Parser package.
 *
 * (c) BardoQi <67158925@qq.com>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */

declare(strict_types=1);

namespace ByteFerry\RqlParser\AstBuilder;

use ByteFerry\RqlParser\Exceptions\ParseException;
use ByteFerry\RqlParser\Lexer\Symbols;

/**
 * Class NodeVisitor
 *
 * @package ByteFerry\RqlParser\Ast
 */
class NodeVisitor
{

    /**
     * @param $name
     *
     * @return mixed
     */
    protected static function fromAlias($name)
    {
        return Symbols::$type_alias[$name]??$name;
    }

    /**
     * @param $operator
     *
     * @return mixed
     */
    protected static function getNodeType($operator)
    {
        return Symbols::$type_mappings[$operator]??null;
    }

    /**
     * @param $node_type
     *
     * @return mixed|null
     */
    protected static function getClass($node_type)
    {
        return Symbols::$class_mapping[$node_type]??Symbols::$class_mapping['N_CONSTANT'];
    }


    /**
     * @param $symbol
     *
     * @return \ByteFerry\RqlParser\AstBuilder\NodeInterface;
     */
    public static function visit($symbol){
        $operator = self::fromAlias($symbol);
        $node_type = self::getNodeType($operator);
        $node_class = self::getClass($node_type);
        if(null === $node_class){
            throw new ParseException('Node class of ' .$node_type.' not found!');
        }
        return $node_class::of($operator,$symbol);
    }
}
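The three lookups in `visit()` form a simple chain: alias, then node type, then class. The sketch below walks through that chain with made-up tables. Every name in `TYPE_ALIAS`, `TYPE_MAPPINGS`, and `CLASS_MAPPING` is an assumption for illustration only; the real tables are static properties of `Symbols` and are not shown in this article.

```php
<?php
declare(strict_types=1);

// Hypothetical tables standing in for Symbols::$type_alias,
// Symbols::$type_mappings and Symbols::$class_mapping.
const TYPE_ALIAS    = ['equals' => 'eq'];
const TYPE_MAPPINGS = ['eq' => 'N_PREDICATE', 'and' => 'N_LOGIC'];
const CLASS_MAPPING = [
    'N_PREDICATE' => 'PredicateNode',
    'N_LOGIC'     => 'LogicNode',
    'N_CONSTANT'  => 'ConstantNode',
];

/**
 * Resolves a symbol to a node class name: alias lookup, then type lookup,
 * then class lookup with a fallback to the constant node class.
 */
function resolveNodeClass(string $symbol): string
{
    $operator = TYPE_ALIAS[$symbol] ?? $symbol;
    $nodeType = TYPE_MAPPINGS[$operator] ?? null;
    if ($nodeType === null) {
        // Unknown operators are treated as constants.
        return CLASS_MAPPING['N_CONSTANT'];
    }
    return CLASS_MAPPING[$nodeType] ?? CLASS_MAPPING['N_CONSTANT'];
}

echo resolveNodeClass('equals'), PHP_EOL; // PredicateNode
echo resolveNodeClass('and'), PHP_EOL;    // LogicNode
echo resolveNodeClass('42'), PHP_EOL;     // ConstantNode
```

Because of the fallback, the `null === $node_class` check in `visit()` would only fire if the `N_CONSTANT` entry itself were missing from the class mapping.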

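The code is quite simple. At this point we can see that it is not really the Visitor pattern: it just takes a symbol and returns an instance, nothing more. Next, we need to understand what the abstract syntax tree contains. (To be continued)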

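Continue reading:

Writing a Compiler from Scratch in PHP (Part 3)

Writing a Compiler from Scratch in PHP (Part 4)

Writing a Compiler from Scratch in PHP (Part 5)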