前言
今天我们来讲讲,Vue的complier怎么把模板转换成ast对象的。
比如这个字符串:
<div>
{{ 123 }}
<p>456</p>
</div>
通过Vue的模板解析后变成这样一个ast结构
[
{
type: 'tag',
tag: 'div',
children: [
{
type: 'interpolation',
content: ' 123 '
},
{
type: 'tag',
tag: 'p',
children: [
{
type: 'text',
content: '456'
}
]
}
]
}
]
其实Vue的模板解析逻辑呢,和单纯的html字符串转树节点结构逻辑是非常接近的,只是会多了一些Vue的插值、属性绑定、监听绑定等额外的逻辑,Vue源码里面里面也是采用了htmlparser2这个库的解析逻辑。通过实现tokenizer和parser去得到这么一个树结构。
Tokenizer
tokenizer采用的 有限状态机 的逻辑去解析模板字符串;
- 比如
<div123</div>从左往右,要么是开始标签状态,要么是文本状态,要么是结束状态,不可能没有状态。 - 一开始为开始状态,然后切换到文本状态或者其他状态。即
我们一定是在某一前提条件下,由某一状态切换到另一个状态。。
比如上面这段代码,我们的解析过程可以分解为
- 解析
<:由初始状态进入标签开始状态 - 解析
div:由标签开始状态进入标签名称状态 - 解析
>:由标签名称状态进入初始状态 - 解析
123:由初始状态进入文本状态 - 解析
<:由文本状态进入标签开始状态 - 解析
/:由标签开始状态进入结束标签状态 - 解析
div:由结束标签状态进入结束标签名称状态 - 解析
>:由结束标签名称状态进入初始状态
初步实现代码
/**
* step1
* 分析状态机应该包含的状态
*/
enum State {
Text,
BeforeTagName,
InTagName,
InClosingTagName,
}
export class Tokenizer {
private index = 0;
private input = '';
private state = State.Text;
/**
* step3
* 每个状态之间的切换逻辑
*/
private stateText(s: string) {
if (s === '<') {
this.state = State.BeforeTagName;
}
}
private stateBeforeTagName(s: string) {
if (s === '/') {
this.state = State.InClosingTagName;
} else {
this.state = State.InTagName;
}
}
private stateInTagName(s: string) {
if (s === '>') {
this.state = State.Text;
}
}
private stateInClosingTagName(s: string) {
if (s === '>') {
this.state = State.Text;
}
}
public parse(input: string) {
this.input = input;
while (this.index < this.input.length) {
const s = this.input[this.index];
/**
* step2
* 每个状态下不同的case
*/
switch (this.state) {
case State.Text:
console.log('进入文本')
this.stateText(s);
break;
case State.BeforeTagName:
console.log('进入标签')
this.stateBeforeTagName(s);
break;
case State.InTagName:
console.log('开始标签')
this.stateInTagName(s);
break;
case State.InClosingTagName:
console.log('结束标签')
this.stateInClosingTagName(s);
break;
}
this.index++;
}
}
}
new Tokenizer().parse('<div>123</div>');
Parser
其实tokenzier里面就可以完成解析逻辑了,但是为了更好的拓展,单独把标签、内容等逻辑放到了parser逻辑内,其实也很简单,在tokenzier内去加入一些钩子即可完成这个工作:
enum State {
Text,
BeforeTagName,
InTagName,
InClosingTagName,
}
export class Tokenizer {
private index = 0;
private input = '';
private state = State.Text;
private sectionStart = 0;
constructor(
/**
* step1
* 定义下收集点,这里源码里面采用的的htmlparser2的方案,采用了配置项回调方法的方式处理,更好拓展
*/
private cbs: {
ontext(start: number, end: number): void;
onopentagname(start: number, end: number): void;
onclosetag(start: number, end: number): void;
},
) {}
private stateText(s: string) {
if (s === '<') {
/**
* step2
* 要记录内容,要有内容起始记录点sectionStart,不管是文本还是标签名都有是多个要记录起始字符串
*/
if (this.sectionStart < this.index) {
this.cbs.ontext(this.sectionStart, this.index);
}
this.state = State.BeforeTagName;
}
}
private stateBeforeTagName(s: string) {
if (s === '/') {
this.state = State.InClosingTagName;
this.sectionStart = this.index + 1;
} else {
this.state = State.InTagName;
/**
* step3-1
* 记录标签名起始点
*/
this.sectionStart = this.index;
}
}
private stateInTagName(s: string) {
if (s === '>') {
/**
* step3
* 记录开始标签,这里要取标签名所以要记录起始点,去到step3-1,记录完后重置下sectionStart,因为接下来就重置为文本状态了,会影响文本起始点
*/
this.cbs.onopentagname(this.sectionStart, this.index);
this.sectionStart = this.index + 1;
this.state = State.Text;
}
}
private stateInClosingTagName(s: string) {
if (s === '>') {
/**
* step4
* 记录关闭标签,和step3一样的逻辑
*/
this.cbs.onclosetag(this.sectionStart, this.index);
this.sectionStart = this.index + 1;
this.state = State.Text;
}
}
public parse(input: string) {
this.input = input;
while (this.index < this.input.length) {
const s = this.input[this.index];
switch (this.state) {
case State.Text:
this.stateText(s);
break;
case State.BeforeTagName:
this.stateBeforeTagName(s);
break;
case State.InTagName:
this.stateInTagName(s);
break;
case State.InClosingTagName:
this.stateInClosingTagName(s);
break;
}
this.index++;
}
}
}
/**
* step5
* 新建一个独立的parser方法
*/
(function parser(input: string) {
/**
* step6
* 收集每个节点, 简单定义下节点类型,运行打印下
*/
const nodes: Array<{ type: 'text'|'tagStart'|'tagEnd'; content?: string; tag?: string }> = [];
new Tokenizer({
ontext(start, end) {
nodes.push({
type: 'text',
content: input.slice(start, end),
});
},
onopentagname(start, end) {
nodes.push({
type: 'tagStart',
tag: input.slice(start, end),
});
},
onclosetag(start, end) {
nodes.push({
type: 'tagEnd',
tag: input.slice(start, end),
});
},
}).parse(input);
console.log(nodes);
})('<div>123</div>');
这样简单的收集下标签和文本,就可以得到一个平级token
[
{ type: 'tagStart', tag: 'div' },
{ type: 'text', content: '123' },
{ type: 'tagEnd', tag: 'div' }
]
状态机到这里其实已经完成他的使命了,但是我们想得到的是ast结构啊,是一个树状的,有层级的结构。要完成这一转换,要引入递归下降解析器概念。
看看这个图:
看下栈数据变化 这是实际的栈的变化过程
// 初始状态
[
{
"type": "tag",
"tag": "root",
"children": []
}
]
// 遇到<div>开始标签
[
{
"type": "tag",
"tag": "div",
"children": []
},
{
"type": "tag",
"tag": "root",
"children": [
{
"type": "tag",
"tag": "div",
"children": []
}
]
}
]
// 遇到<p>开始标签
[
{
"type": "tag",
"tag": "p",
"children": []
},
{
"type": "tag",
"tag": "div",
"children": [
{
"type": "tag",
"tag": "p",
"children": []
}
]
},
{
"type": "tag",
"tag": "root",
"children": [
{
"type": "tag",
"tag": "div",
"children": [
{
"type": "tag",
"tag": "p",
"children": []
}
]
}
]
}
]
// 遇到</p>闭合标签
[
{
"type": "tag",
"tag": "div",
"children": [
{
"type": "tag",
"tag": "p",
"children": [
{
"type": "text",
"content": "456"
}
]
}
]
},
{
"type": "tag",
"tag": "root",
"children": [
{
"type": "tag",
"tag": "div",
"children": [
{
"type": "tag",
"tag": "p",
"children": [
{
"type": "text",
"content": "456"
}
]
}
]
}
]
}
]
// 遇到</div>闭合标签
[
{
"type": "tag",
"tag": "root",
"children": [
{
"type": "tag",
"tag": "div",
"children": [
{
"type": "tag",
"tag": "p",
"children": [
{
"type": "text",
"content": "456"
}
]
}
]
}
]
}
]
实现代码:
enum State {
Text,
BeforeTagName,
InTagName,
InClosingTagName,
}
export class Tokenizer {
private index = 0;
private input = '';
private state = State.Text;
private sectionStart = 0;
constructor(
private cbs: {
ontext(start: number, end: number): void;
onopentagname(start: number, end: number): void;
onclosetag(start: number, end: number): void;
},
) {}
private stateText(s: string) {
if (s === '<') {
if (this.sectionStart < this.index) {
this.cbs.ontext(this.sectionStart, this.index);
}
this.state = State.BeforeTagName;
}
}
private stateBeforeTagName(s: string) {
if (s === '/') {
this.state = State.InClosingTagName;
this.sectionStart = this.index + 1;
} else {
this.state = State.InTagName;
this.sectionStart = this.index;
}
}
private stateInTagName(s: string) {
if (s === '>') {
this.cbs.onopentagname(this.sectionStart, this.index);
this.sectionStart = this.index + 1;
this.state = State.Text;
}
}
private stateInClosingTagName(s: string) {
if (s === '>') {
this.cbs.onclosetag(this.sectionStart, this.index);
this.sectionStart = this.index + 1;
this.state = State.Text;
}
}
private finish() {
if (this.state === State.Text && this.sectionStart < this.index) {
this.cbs.ontext(this.sectionStart, this.index);
}
}
public parse(input: string) {
this.input = input;
while (this.index < this.input.length) {
const s = this.input[this.index];
switch (this.state) {
case State.Text:
this.stateText(s);
break;
case State.BeforeTagName:
this.stateBeforeTagName(s);
break;
case State.InTagName:
this.stateInTagName(s);
break;
case State.InClosingTagName:
this.stateInClosingTagName(s);
break;
}
this.index++;
}
this.finish();
}
}
type ElementNode = {
type: 'tag' | 'text';
tag?: string;
content?: string;
children?: ElementNode[];
};
(function parser(input: string) {
const stack: ElementNode[] = [{ type: 'tag', tag: 'root', children: [] }];
/**
* step2
* 初始打印
*/
console.log('html:', input, '\n');
console.log(JSON.stringify(stack, null, 2), '\n');
new Tokenizer({
ontext(start, end) {
stack[0].children!.push({
type: 'text',
content: input.slice(start, end),
});
},
onopentagname(start, end) {
const currentTag: ElementNode = {
type: 'tag',
tag: input.slice(start, end),
children: [],
};
stack[0].children!.push(currentTag);
stack.unshift(currentTag);
/**
* step3
* 进栈打印
*/
console.log(JSON.stringify(stack, null, 2), '\n');
},
onclosetag() {
stack.shift();
/**
* step4
* 出栈打印
*/
console.log(JSON.stringify(stack, null, 2), '\n');
},
}).parse(input);
/**
* step1
* 简化下节点结构
*/
})('<div><p>456</p></div>');
到这里差不多结束了,突然发现没有Vue模板的逻辑里面啊,确实啊,但是呢既然原理我们都了解了,这还不手到擒来,以插值为例,只需要在状态机内增加对应的插值状态还有钩子,就可以把插值加到ast内了:
enum State {
Text,
BeforeTagName,
InTagName,
InClosingTagName,
/**
* step2
* 增加插值状态
*/
Interpolation,
}
export class Tokenizer {
private index = 0;
private input = '';
private state = State.Text;
private sectionStart = 0;
constructor(
private cbs: {
ontext(start: number, end: number): void;
onopentagname(start: number, end: number): void;
onclosetag(start: number, end: number): void;
/**
* step5
* 增加插值结束回调
*/
oninterpolation(start: number, end: number): void;
},
) {}
private stateText(s: string) {
if (s === '<') {
if (this.sectionStart < this.index) {
this.cbs.ontext(this.sectionStart, this.index);
}
this.state = State.BeforeTagName;
/**
* step3
* 插值在这里增加一个识别节点,这里是两个字符的识别,所以光标整体往后移一位
*/
} else if (s === '{' && this.input[this.index + 1] === '{') {
/**
* step7
* 遇到插值的时候把前面的字符收集一下
*/
if (this.index > this.sectionStart) {
this.cbs.ontext(this.sectionStart, this.index);
}
this.state = State.Interpolation;
this.index ++;
this.sectionStart = this.index + 1;
}
}
private stateBeforeTagName(s: string) {
if (s === '/') {
this.state = State.InClosingTagName;
this.sectionStart = this.index + 1;
} else {
this.state = State.InTagName;
this.sectionStart = this.index;
}
}
private stateInTagName(s: string) {
if (s === '>') {
this.cbs.onopentagname(this.sectionStart, this.index);
this.sectionStart = this.index + 1;
this.state = State.Text;
}
}
private stateInClosingTagName(s: string) {
if (s === '>') {
this.cbs.onclosetag(this.sectionStart, this.index);
this.sectionStart = this.index + 1;
this.state = State.Text;
}
}
/**
* step4
* 插值状态下等待闭合插值符号
*/
private stateInterpolation(s: string) {
if (s === '}' && this.input[this.index + 1] === '}') {
this.cbs.oninterpolation(this.sectionStart, this.index);
this.state = State.Text;
this.index++;
this.sectionStart = this.index + 1;
}
}
private finish() {
if (this.state === State.Text && this.sectionStart < this.index) {
this.cbs.ontext(this.sectionStart, this.index);
}
}
public parse(input: string) {
this.input = input;
while (this.index < this.input.length) {
const s = this.input[this.index];
switch (this.state) {
case State.Text:
this.stateText(s);
break;
case State.BeforeTagName:
this.stateBeforeTagName(s);
break;
case State.InTagName:
this.stateInTagName(s);
break;
case State.InClosingTagName:
this.stateInClosingTagName(s);
break;
case State.Interpolation:
this.stateInterpolation(s);
break;
}
this.index++;
}
this.finish();
}
}
type ElementNode = {
type: 'tag' | 'text' | 'interpolation';
tag?: string;
content?: string;
children?: ElementNode[];
};
(function parser(input: string) {
const stack: ElementNode[] = [{ type: 'tag', tag: 'root', children: [] }];
new Tokenizer({
ontext(start, end) {
stack[0].children!.push({
type: 'text',
content: input.slice(start, end),
});
},
onopentagname(start, end) {
const currentTag: ElementNode = {
type: 'tag',
tag: input.slice(start, end),
children: [],
};
stack[0].children!.push(currentTag);
stack.unshift(currentTag);
},
onclosetag() {
stack.shift();
},
/**
* step6
* 回调的时候收集一下插值内容
*/
oninterpolation(start, end) {
stack[0].children!.push({
type: 'interpolation',
content: input.slice(start, end),
});
},
}).parse(input);
console.log(JSON.stringify(stack, null, 2));
/**
* step1
* 增加插值字符处理
*/
})('<div>0{{123}}<p>456</p></div>789');
结语
到这里基本就结束了,当然实际场景中还会有不少的额外场景还有边界情况,比如:自闭合标签的处理、注释处理......
能看完完整的代码的你现在强的可怕。。。