Vue源码解析之模板解析器今天我们来讲讲，Vue的complier怎么把模板转换成ast对象的。其实Vue的模板解析逻辑

前言

今天我们来讲讲，Vue的complier怎么把模板转换成ast对象的。比如这个字符串：

<div>
    {{ 123 }}
    <p>456</p>
</div>

通过Vue的模板解析后变成这样一个ast结构

[
    {
        type: 'tag',
        tag: 'div',
        children: [
            {
                type: 'interpolation',
                content: ' 123 '
            },
            {
                type: 'tag',
                tag: 'p',
                children: [
                    {
                        type: 'text',
                        content: '456'
                    }
                ]
            }
        ]
    }
]

其实Vue的模板解析逻辑呢，和单纯的html字符串转树节点结构逻辑是非常接近的，只是会多了一些Vue的插值、属性绑定、监听绑定等额外的逻辑，Vue源码里面里面也是采用了htmlparser2这个库的解析逻辑。通过实现tokenizer和parser去得到这么一个树结构。

Tokenizer

tokenizer采用的 有限状态机 的逻辑去解析模板字符串；

比如 <div123</div> 从左往右，要么是 开始标签状态，要么是 文本状态，要么是 结束状态，不可能没有状态。
一开始为开始状态，然后切换到文本状态或者其他状态。即 我们一定是在某一前提条件下，由某一状态切换到另一个状态。。

比如上面这段代码，我们的解析过程可以分解为

解析 < ：由 初始状态 进入 标签开始状态
解析 div ：由 标签开始状态 进入 标签名称状态
解析 > ：由 标签名称状态 进入 初始状态
解析 123 ：由 初始状态 进入 文本状态
解析 < ：由 文本状态 进入 标签开始状态
解析 / ：由 标签开始状态 进入 结束标签状态
解析 div ：由 结束标签状态 进入 结束标签名称状态
解析 > ：由 结束标签名称状态 进入 初始状态

初步实现代码

/**
 * step1
 * 分析状态机应该包含的状态
 */
enum State {
  Text,
  BeforeTagName,
  InTagName,
  InClosingTagName,
}
export class Tokenizer {
  private index = 0;
  private input = '';
  private state = State.Text;

  /**
   * step3
   * 每个状态之间的切换逻辑
   */
  private stateText(s: string) {
    if (s === '<') {
      this.state = State.BeforeTagName;
    }
  }
  private stateBeforeTagName(s: string) {
    if (s === '/') {
      this.state = State.InClosingTagName;
    } else {
      this.state = State.InTagName;
    }
  }
  private stateInTagName(s: string) {
    if (s === '>') {
      this.state = State.Text;
    }
  }
  private stateInClosingTagName(s: string) {
    if (s === '>') {
      this.state = State.Text;
    }
  }
  public parse(input: string) {
    this.input = input;
    while (this.index < this.input.length) {
      const s = this.input[this.index];
      /**
       * step2
       * 每个状态下不同的case
       */
      switch (this.state) {
        case State.Text:
          console.log('进入文本')
          this.stateText(s);
          break;
        case State.BeforeTagName:
          console.log('进入标签')
          this.stateBeforeTagName(s);
          break;
        case State.InTagName:
          console.log('开始标签')
          this.stateInTagName(s);
          break;
        case State.InClosingTagName:
          console.log('结束标签')
          this.stateInClosingTagName(s);
          break;
      }
      this.index++;
    }
  }
}
new Tokenizer().parse('<div>123</div>');

Parser

其实tokenzier里面就可以完成解析逻辑了，但是为了更好的拓展，单独把标签、内容等逻辑放到了parser逻辑内，其实也很简单，在tokenzier内去加入一些钩子即可完成这个工作：

enum State {
  Text,
  BeforeTagName,
  InTagName,
  InClosingTagName,
}

export class Tokenizer {
  private index = 0;
  private input = '';
  private state = State.Text;
  private sectionStart = 0;
  constructor(
    /**
     * step1
     * 定义下收集点，这里源码里面采用的的htmlparser2的方案，采用了配置项回调方法的方式处理，更好拓展
     */
    private cbs: {
      ontext(start: number, end: number): void;
      onopentagname(start: number, end: number): void;
      onclosetag(start: number, end: number): void;
    },
  ) {}

  private stateText(s: string) {
    if (s === '<') {
      /**
       * step2
       * 要记录内容，要有内容起始记录点sectionStart，不管是文本还是标签名都有是多个要记录起始字符串
       */
      if (this.sectionStart < this.index) {
        this.cbs.ontext(this.sectionStart, this.index);
      }
      this.state = State.BeforeTagName;
    }
  }
  private stateBeforeTagName(s: string) {
    if (s === '/') {
      this.state = State.InClosingTagName;
      this.sectionStart = this.index + 1;
    } else {
      this.state = State.InTagName;
      /**
       * step3-1
       * 记录标签名起始点
       */
      this.sectionStart = this.index;
    }
  }
  private stateInTagName(s: string) {
    if (s === '>') {
      /**
       * step3
       * 记录开始标签，这里要取标签名所以要记录起始点，去到step3-1，记录完后重置下sectionStart，因为接下来就重置为文本状态了，会影响文本起始点
       */
      this.cbs.onopentagname(this.sectionStart, this.index);
      this.sectionStart = this.index + 1;
      this.state = State.Text;
    }
  }
  private stateInClosingTagName(s: string) {
    if (s === '>') {
      /**
       * step4
       * 记录关闭标签，和step3一样的逻辑
       */
      this.cbs.onclosetag(this.sectionStart, this.index);
      this.sectionStart = this.index + 1;
      this.state = State.Text;
    }
  }
  public parse(input: string) {
    this.input = input;

    while (this.index < this.input.length) {
      const s = this.input[this.index];
      switch (this.state) {
        case State.Text:
          this.stateText(s);
          break;
        case State.BeforeTagName:
          this.stateBeforeTagName(s);
          break;
        case State.InTagName:
          this.stateInTagName(s);
          break;
        case State.InClosingTagName:
          this.stateInClosingTagName(s);
          break;
      }
      this.index++;
    }
  }
}
/**
 * step5
 * 新建一个独立的parser方法
 */
(function parser(input: string) {
  /**
   * step6
   * 收集每个节点, 简单定义下节点类型，运行打印下
   */
  const nodes: Array<{ type: 'text'|'tagStart'|'tagEnd'; content?: string; tag?: string }> = [];
  new Tokenizer({
    ontext(start, end) {
      nodes.push({
        type: 'text',
        content: input.slice(start, end),
      });
    },
    onopentagname(start, end) {
      nodes.push({
        type: 'tagStart',
        tag: input.slice(start, end),
      });
    },
    onclosetag(start, end) {
      nodes.push({
        type: 'tagEnd',
        tag: input.slice(start, end),
      });
    },
  }).parse(input);
  console.log(nodes);
})('<div>123</div>');

这样简单的收集下标签和文本，就可以得到一个平级token

[
  { type: 'tagStart', tag: 'div' },
  { type: 'text', content: '123' },
  { type: 'tagEnd', tag: 'div' }
]

状态机到这里其实已经完成他的使命了，但是我们想得到的是ast结构啊，是一个树状的，有层级的结构。要完成这一转换，要引入递归下降解析器概念。看看这个图：

无标题-2024-07-24-1507.png

看下栈数据变化这是实际的栈的变化过程

// 初始状态
[
  {
    "type": "tag",
    "tag": "root",
    "children": []
  }
] 
// 遇到<div>开始标签
[
  {
    "type": "tag",
    "tag": "div",
    "children": []
  },
  {
    "type": "tag",
    "tag": "root",
    "children": [
      {
        "type": "tag",
        "tag": "div",
        "children": []
      }
    ]
  }
] 
// 遇到<p>开始标签
[
  {
    "type": "tag",
    "tag": "p",
    "children": []
  },
  {
    "type": "tag",
    "tag": "div",
    "children": [
      {
        "type": "tag",
        "tag": "p",
        "children": []
      }
    ]
  },
  {
    "type": "tag",
    "tag": "root",
    "children": [
      {
        "type": "tag",
        "tag": "div",
        "children": [
          {
            "type": "tag",
            "tag": "p",
            "children": []
          }
        ]
      }
    ]
  }
]
// 遇到</p>闭合标签
[
  {
    "type": "tag",
    "tag": "div",
    "children": [
      {
        "type": "tag",
        "tag": "p",
        "children": [
          {
            "type": "text",
            "content": "456"
          }
        ]
      }
    ]
  },
  {
    "type": "tag",
    "tag": "root",
    "children": [
      {
        "type": "tag",
        "tag": "div",
        "children": [
          {
            "type": "tag",
            "tag": "p",
            "children": [
              {
                "type": "text",
                "content": "456"
              }
            ]
          }
        ]
      }
    ]
  }
]
// 遇到</div>闭合标签
[
  {
    "type": "tag",
    "tag": "root",
    "children": [
      {
        "type": "tag",
        "tag": "div",
        "children": [
          {
            "type": "tag",
            "tag": "p",
            "children": [
              {
                "type": "text",
                "content": "456"
              }
            ]
          }
        ]
      }
    ]
  }
]

实现代码：

enum State {
  Text,
  BeforeTagName,
  InTagName,
  InClosingTagName,
}

export class Tokenizer {
  private index = 0;
  private input = '';
  private state = State.Text;
  private sectionStart = 0;
  constructor(
    private cbs: {
      ontext(start: number, end: number): void;
      onopentagname(start: number, end: number): void;
      onclosetag(start: number, end: number): void;
    },
  ) {}

  private stateText(s: string) {
    if (s === '<') {
      if (this.sectionStart < this.index) {
        this.cbs.ontext(this.sectionStart, this.index);
      }
      this.state = State.BeforeTagName;
    }
  }
  private stateBeforeTagName(s: string) {
    if (s === '/') {
      this.state = State.InClosingTagName;
      this.sectionStart = this.index + 1;
    } else {
      this.state = State.InTagName;
      this.sectionStart = this.index;
    }
  }
  private stateInTagName(s: string) {
    if (s === '>') {
      this.cbs.onopentagname(this.sectionStart, this.index);
      this.sectionStart = this.index + 1;
      this.state = State.Text;
    }
  }
  private stateInClosingTagName(s: string) {
    if (s === '>') {
      this.cbs.onclosetag(this.sectionStart, this.index);
      this.sectionStart = this.index + 1;
      this.state = State.Text;
    }
  }

  private finish() {
    if (this.state === State.Text && this.sectionStart < this.index) {
      this.cbs.ontext(this.sectionStart, this.index);
    }
  }
  public parse(input: string) {
    this.input = input;
    while (this.index < this.input.length) {
      const s = this.input[this.index];
      switch (this.state) {
        case State.Text:
          this.stateText(s);
          break;
        case State.BeforeTagName:
          this.stateBeforeTagName(s);
          break;
        case State.InTagName:
          this.stateInTagName(s);
          break;
        case State.InClosingTagName:
          this.stateInClosingTagName(s);
          break;
      }
      this.index++;
    }
    this.finish();
  }
}

type ElementNode = {
  type: 'tag' | 'text';
  tag?: string;
  content?: string;
  children?: ElementNode[];
};
(function parser(input: string) {
  const stack: ElementNode[] = [{ type: 'tag', tag: 'root', children: [] }];
  /**
   * step2
   * 初始打印
   */
  console.log('html：', input, '\n');
  console.log(JSON.stringify(stack, null, 2), '\n');
  new Tokenizer({
    ontext(start, end) {
      stack[0].children!.push({
        type: 'text',
        content: input.slice(start, end),
      });
    },
    onopentagname(start, end) {
      const currentTag: ElementNode = {
        type: 'tag',
        tag: input.slice(start, end),
        children: [],
      };
      stack[0].children!.push(currentTag);
      stack.unshift(currentTag);
      /**
       * step3
       * 进栈打印
       */
      console.log(JSON.stringify(stack, null, 2), '\n');
    },
    onclosetag() {
      stack.shift();
      /**
       * step4
       * 出栈打印
       */
      console.log(JSON.stringify(stack, null, 2), '\n');
    },
  }).parse(input);
  /**
   * step1
   * 简化下节点结构
   */
})('<div><p>456</p></div>');

到这里差不多结束了，突然发现没有Vue模板的逻辑里面啊，确实啊，但是呢既然原理我们都了解了，这还不手到擒来，以插值为例，只需要在状态机内增加对应的插值状态还有钩子，就可以把插值加到ast内了：

enum State {
  Text,
  BeforeTagName,
  InTagName,
  InClosingTagName,
  /**
   * step2
   * 增加插值状态
   */
  Interpolation,
}

export class Tokenizer {
  private index = 0;
  private input = '';
  private state = State.Text;
  private sectionStart = 0;
  constructor(
    private cbs: {
      ontext(start: number, end: number): void;
      onopentagname(start: number, end: number): void;
      onclosetag(start: number, end: number): void;
      /**
       * step5
       * 增加插值结束回调
       */
      oninterpolation(start: number, end: number): void;
    },
  ) {}

  private stateText(s: string) {
    if (s === '<') {
      if (this.sectionStart < this.index) {
        this.cbs.ontext(this.sectionStart, this.index);
      }
      this.state = State.BeforeTagName;
      /**
       * step3
       * 插值在这里增加一个识别节点，这里是两个字符的识别，所以光标整体往后移一位
       */
    } else if (s === '{' && this.input[this.index + 1] === '{') {
      /**
       * step7
       * 遇到插值的时候把前面的字符收集一下
      */
      if (this.index > this.sectionStart) {
        this.cbs.ontext(this.sectionStart, this.index);
      }

      this.state = State.Interpolation;
      this.index ++;
      this.sectionStart = this.index + 1;
    }
  }
  private stateBeforeTagName(s: string) {
    if (s === '/') {
      this.state = State.InClosingTagName;
      this.sectionStart = this.index + 1;
    } else {
      this.state = State.InTagName;
      this.sectionStart = this.index;
    }
  }
  private stateInTagName(s: string) {
    if (s === '>') {
      this.cbs.onopentagname(this.sectionStart, this.index);
      this.sectionStart = this.index + 1;
      this.state = State.Text;
    }
  }
  private stateInClosingTagName(s: string) {
    if (s === '>') {
      this.cbs.onclosetag(this.sectionStart, this.index);
      this.sectionStart = this.index + 1;
      this.state = State.Text;
    }
  }

  /**
   * step4
   * 插值状态下等待闭合插值符号
   */
  private stateInterpolation(s: string) {
    if (s === '}' && this.input[this.index + 1] === '}') {
      this.cbs.oninterpolation(this.sectionStart, this.index);
      this.state = State.Text;
      this.index++;
      this.sectionStart = this.index + 1;
    }
  }

  private finish() {
    if (this.state === State.Text && this.sectionStart < this.index) {
      this.cbs.ontext(this.sectionStart, this.index);
    }
  }
  public parse(input: string) {
    this.input = input;
    while (this.index < this.input.length) {
      const s = this.input[this.index];
      switch (this.state) {
        case State.Text:
          this.stateText(s);
          break;
        case State.BeforeTagName:
          this.stateBeforeTagName(s);
          break;
        case State.InTagName:
          this.stateInTagName(s);
          break;
        case State.InClosingTagName:
          this.stateInClosingTagName(s);
          break;
        case State.Interpolation:
          this.stateInterpolation(s);
          break;
      }
      this.index++;
    }
   
    this.finish();
  }
}

type ElementNode = {
  type: 'tag' | 'text' | 'interpolation';
  tag?: string;
  content?: string;
  children?: ElementNode[];
};
(function parser(input: string) {
  const stack: ElementNode[] = [{ type: 'tag', tag: 'root', children: [] }];
  new Tokenizer({
    ontext(start, end) {
      stack[0].children!.push({
        type: 'text',
        content: input.slice(start, end),
      });
    },
    onopentagname(start, end) {
      const currentTag: ElementNode = {
        type: 'tag',
        tag: input.slice(start, end),
        children: [],
      };
      stack[0].children!.push(currentTag);
      stack.unshift(currentTag);
    },
    onclosetag() {
      stack.shift();
    },
    /**
     * step6
     * 回调的时候收集一下插值内容
    */
   oninterpolation(start, end) {
      stack[0].children!.push({
        type: 'interpolation',
        content: input.slice(start, end),
      });
    },
  }).parse(input);
  
  console.log(JSON.stringify(stack, null, 2));
  /**
   * step1
   * 增加插值字符处理
   */
})('<div>0{{123}}<p>456</p></div>789');

结语

到这里基本就结束了，当然实际场景中还会有不少的额外场景还有边界情况，比如：自闭合标签的处理、注释处理......

能看完完整的代码的你现在强的可怕。。。