Vue3 Source Code Internals <3>: Parse in compiler-core

Previous articles (each comes with its source code)

The code follows the mini-vue implementation; if you are interested, you can also read the original author's implementation.

Prerequisites

compiler-core and compiler-dom are both compiler modules in Vue3, but they differ in what they do and where they are used.

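The compiler-core module is the core implementation of the Vue3 compiler. It compiles templates into render functions and contains the basic compiler features: AST generation, handling of directives and expressions, optimization, and code generation.
The compiler-core module can run in any JavaScript environment, not only in the browser, so it can be used to build Vue3-based cross-platform applications such as desktop and mobile apps.
The compiler-dom module is the browser implementation of the Vue3 compiler. It extends compiler-core to cover browser-specific needs, for example compiling DOM element attributes and events.
The code generated by compiler-dom runs directly in the browser, so it emits browser-specific code.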

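In short, the difference is this: compiler-core is the core implementation of the Vue3 compiler, while compiler-dom is the extension for the browser environment, used mainly to compile templates into code that can run directly in the browser.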

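In practice: compiler-core first turns the template into an AST with parse. The rest of the compile step then transforms the AST (handling directives and expressions and applying optimizations) and generates the render function code. Once render exists, the runtime, not compiler-core, calls it to produce the vdom and uses patch and diff to update the UI.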

To start, one diagram

A minimal mini-vue implementation of the parse module in compiler-core

Parse {{}} interpolation expressions
Parse plain text
Parse tags such as <div></div>
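
To make the goal of these three features concrete, here is a hand-written sketch of the AST the parser is expected to produce for the template `<div>hi,{{message}}</div>`. The `NodeTypes` enum below is a hypothetical stand-in for the one imported from "./ast": the member names are taken from the code in this article, while the numeric values are arbitrary.

```typescript
// Hypothetical stand-in for the NodeTypes enum from "./ast";
// member names come from the article, numeric values are arbitrary.
enum NodeTypes {
    ROOT,
    INTERPOLATION,
    SIMPLE_EXPRESSION,
    ELEMENT,
    TEXT
}

// Hand-written target AST for the template `<div>hi,{{message}}</div>`.
const expectedAst = {
    type: NodeTypes.ROOT,
    children: [
        {
            type: NodeTypes.ELEMENT,
            tag: "div",
            children: [
                // Plain text before the interpolation
                { type: NodeTypes.TEXT, content: "hi," },
                // {{message}} wraps a simple expression node
                {
                    type: NodeTypes.INTERPOLATION,
                    content: {
                        type: NodeTypes.SIMPLE_EXPRESSION,
                        content: "message"
                    }
                }
            ]
        }
    ]
};

console.log(JSON.stringify(expectedAst, null, 2));
```

Each of the three features contributes one node type: the tag becomes the ELEMENT node, the text becomes its first child, and the interpolation becomes its second child.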

Tip: download the source code for this section and step through it with the jest debugger; that makes each feature much easier to follow.

Shared helper functions

// The template is a string; once the parser has turned a piece of it into AST nodes, the parsed characters must be consumed
function advanceBy(context, length) {
    context.source = context.source.slice(length);
}

// Parser entry point: builds the context and returns the root AST node
export function baseParse(content) {
    const context = createParserContent(content)
    return createRoot(parseChildren(context, []))
}

function createRoot(children) {
    return { children, type: NodeTypes.ROOT }
}

function createParserContent(content) {
    return {
        source: content
    }
}

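// Decides when the template parsing loop should stop
function isEnd(context, ancestors) {
    // Parsing stops when:
    // 1. an end tag matching one of the open ancestors is reached, or
    // 2. the source is empty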
    const s = context.source;
    if (s.startsWith('</')) {
        for (let i = ancestors.length - 1; i >= 0; i--) {
            const tag = ancestors[i].tag;
            if (startsWithEndTagOpen(s, tag)) {
                return true;
            }
        }
    }
    return !s;
}
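
The later sections call a `parseTextData` helper that is not listed among the shared functions. A minimal sketch of what it presumably does is shown here: return the first `length` characters and consume them. This implementation is an assumption based on how it is called, not code from the article; `ParserContext`, `demoContext` and `consumedText` are illustrative names.

```typescript
// Minimal parser context: the part of the template not consumed yet.
interface ParserContext {
    source: string;
}

// Same behavior as advanceBy in the article: drop `length` consumed characters.
function advanceBy(context: ParserContext, length: number): void {
    context.source = context.source.slice(length);
}

// Assumed implementation (not shown in the article): return the first
// `length` characters and remove them from the source.
function parseTextData(context: ParserContext, length: number): string {
    const content = context.source.slice(0, length);
    advanceBy(context, length);
    return content;
}

const demoContext: ParserContext = { source: "hi,{{message}}" };
const consumedText = parseTextData(demoContext, 3);
console.log(consumedText);       // hi,
console.log(demoContext.source); // {{message}}
```

Because the helper both reads and advances, callers such as parseInterpolation only need one extra advanceBy call for the closing delimiter.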

Parsing {{}}

function parseChildren(context, ancestors) {
    const nodes: any = []
    while (!isEnd(context, ancestors)) {
        let node
        const s = context.source;
        if (s.startsWith('{{')) {
            node = parseInterpolation(context)
        }
        nodes.push(node)
    }
    return nodes
}

function parseInterpolation(context) {
    // {{message}}
    // Keeping the delimiters in variables means a later change touches very little code
    const openDelimiter = '{{'
    const closeDelimiter = '}}'

    // We need to know where the interpolation closes
    // indexOf searches for }} starting at index 2, just past the opening {{
    const closeIndex = context.source.indexOf(
        closeDelimiter,
        openDelimiter.length
    )

    // Drop the first two characters ({{)
    // context.source = context.source.slice(openDelimiter.length)
    advanceBy(context, openDelimiter.length)

    // The content length equals closeIndex minus the length of openDelimiter
    const rawContentLength = closeIndex - openDelimiter.length
    const rawContent = parseTextData(context, rawContentLength)
    const content = rawContent.trim()

    // Then consume the closing delimiter too; the template is one string, and parsing has to continue with what follows
    // context.source = context.source.slice(rawContentLength + closeDelimiter.length);
    advanceBy(context, closeDelimiter.length)

    return {
        type: NodeTypes.INTERPOLATION,
        content: {
            type: NodeTypes.SIMPLE_EXPRESSION,
            content
        }
    }
}
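
The index arithmetic in parseInterpolation can be checked in isolation. The sketch below is a standalone variant that works on a plain string instead of the parser context; `readInterpolation` and `interpolationDemo` are illustrative names, and the `null` return for a missing `}}` is an added guard that the article's version does not have.

```typescript
// Standalone sketch of the index arithmetic in parseInterpolation.
// Returns null when "}}" is missing (an added guard, not in the article).
function readInterpolation(
    source: string
): { content: string; rest: string } | null {
    const openDelimiter = "{{";
    const closeDelimiter = "}}";

    // Search for "}}" starting right after the opening "{{".
    const closeIndex = source.indexOf(closeDelimiter, openDelimiter.length);
    if (closeIndex === -1) return null;

    // The raw expression sits between the two delimiters.
    const rawContent = source.slice(openDelimiter.length, closeIndex);

    return {
        // Surrounding whitespace is not part of the expression.
        content: rawContent.trim(),
        // Everything after "}}" still has to be parsed.
        rest: source.slice(closeIndex + closeDelimiter.length)
    };
}

// For "{{ message }}</div>": closeIndex is 11, the raw content is
// " message ", and "</div>" is left over for the next parsing step.
const interpolationDemo = readInterpolation("{{ message }}</div>");
console.log(interpolationDemo);
```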

Parsing text

function parseChildren(context, ancestors) {
    const nodes: any = []
    while (!isEnd(context, ancestors)) {
        let node
        const s = context.source;
        // Parse an interpolation
        if (s.startsWith('{{')) {
            node = parseInterpolation(context)
        }
        // Otherwise fall back to parsing text
        if (!node) {
            node = parseText(context);
        }

        nodes.push(node)
    }
    return nodes
}
function parseText(context) {
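    // Stop tokens: '<' covers both a nested start tag and an end tag. If several occur, take the leftmost (smallest) index
    const endToken = ['{{', '<']
    let endIndex = context.source.length // index at which the text stops
    for (let i = 0; i < endToken.length; i++) {
        const index = context.source.indexOf(endToken[i])
        if (index !== -1 && endIndex > index) {
            endIndex = index
        }
    }

    // Earlier the text was read all the way to the end of the source, but in a real template other node types follow the text, so the stop position must be given explicitly
    const content = parseTextData(context, endIndex)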

    return {
        type: NodeTypes.TEXT,
        content
    }
}
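
The "leftmost stop token wins" rule in parseText can be tried out on its own. In this sketch the stop tokens are a parameter, so different token lists can be compared; `findTextEnd` is an illustrative name, not a function from the article.

```typescript
// Find where a text run ends: the leftmost occurrence of any end token,
// or the end of the source when no token occurs at all.
function findTextEnd(source: string, endTokens: string[]): number {
    let endIndex = source.length;
    for (const token of endTokens) {
        const index = source.indexOf(token);
        // Keep the smallest index found so far.
        if (index !== -1 && index < endIndex) {
            endIndex = index;
        }
    }
    return endIndex;
}

// Text followed by an interpolation: stops at "{{" (index 3).
console.log(findTextEnd("hi,{{message}}</div>", ["{{", "</"])); // 3

// No token present: the whole string is text.
console.log(findTextEnd("plain text", ["{{", "</"])); // 10

// Text followed by a nested start tag: "</" only matches at index 6,
// while "<" already matches at index 2.
console.log(findTextEnd("hi<p>x</p>", ["{{", "</"])); // 6
console.log(findTextEnd("hi<p>x</p>", ["{{", "<"]));  // 2
```

The last two calls illustrate why the choice of stop tokens matters once text can be followed by a nested element: stopping only at "</" would swallow the `<p>` start tag into the text node.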

Parsing tags such as <div></div>

import { NodeTypes } from "./ast";

const enum TagType {
    Start,
    End
}

function parseChildren(context, ancestors) {
    const nodes: any = []
    while (!isEnd(context, ancestors)) {
        let node
        const s = context.source;
        if (s.startsWith('{{')) {...
        } else if (s[0] === "<") {
            // A regular expression decides whether this is a start tag
            // <div></div>
            // equivalent to testing the source against /^<[a-z]/i
            if (/[a-z]/i.test(s[1])) {
                node = parseElement(context, ancestors);
            }
        }
        nodes.push(node)
    }
    return nodes
}

function parseElement(context, ancestors) {
    // Parse the start tag
    const element: any = parseTag(context, TagType.Start)
    ancestors.push(element)
    // After the start tag, the inner nodes must be collected; recurse to walk the nested elements
    element.children = parseChildren(context, ancestors)
    ancestors.pop()

    // Check that the end tag matches the start tag; do not just consume it and return
    if (startsWithEndTagOpen(context.source, element.tag)) {
        parseTag(context, TagType.End)
    } else {
        throw new Error(`Missing end tag: ${element.tag}`)
    }

    return element
}

function parseTag(context, type) {
    // <div></div>
    // 1. Match the tag with a regular expression
    // 2. Advance past what was matched
    const match: any = /^<\/?([a-z]*)/i.exec(context.source)
    const tag = match[1]
    // After reading the tag name, advance past "<div" or "</div" ...
    advanceBy(context, match[0].length)
    // ... and past the closing ">"
    advanceBy(context, 1);

    if (type === TagType.End) return
    return {
        type: NodeTypes.ELEMENT,
        tag
    }
}

function startsWithEndTagOpen(source, tag) {
    // Only meaningful when the source starts with "</"; tag names are compared in lowercase
    return (
        source.startsWith("</") &&
        source.slice(2, 2 + tag.length).toLowerCase() === tag.toLowerCase()
    );
}
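
To see what the regular expression in parseTag captures and how far the source advances, the following standalone sketch applies the same pattern to plain strings. `readTag` and `tagPattern` are illustrative names; the article's parseTag works on the parser context instead.

```typescript
// The same pattern parseTag uses: "<", an optional "/", then the tag
// name (letters only) captured in group 1.
const tagPattern = /^<\/?([a-z]*)/i;

function readTag(
    source: string
): { tag: string; isEnd: boolean; rest: string } | null {
    const match = tagPattern.exec(source);
    if (!match) return null;
    return {
        tag: match[1],
        isEnd: source.startsWith("</"),
        // Skip the matched "<div" or "</div" plus the closing ">".
        rest: source.slice(match[0].length + 1)
    };
}

// Start tag: tag is "div", and "hi</div>" is left to parse.
console.log(readTag("<div>hi</div>"));

// End tag: the case of the name is preserved in the capture,
// which is why the comparison helper lowercases both sides.
console.log(readTag("</DIV>"));
```

Note that the pattern accepts letters only, so a tag name containing a digit, such as h1, is not handled by this minimal version.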

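A brief walkthrough of the steps

1. Start with source = <div></div> and maintain ancestors, a stack of the tags that are currently open.
2. Process the string recursively. isEnd returns a boolean that says whether the current level is finished (the source is empty or a matching end tag has been reached); once it returns true, that level stops processing the string.
3. Call parseElement, which calls parseTag to extract the tag name.
4. tag = div; the string then advances, so source = </div> and ancestors = [{tag:'div'}].
5. After this first pass, control returns to step 2 for the remaining characters, and isEnd finds an end tag.
6. isEnd then iterates over the ancestors array in reverse order.
7. Each entry is compared with the end tag. On a match isEnd returns true, the recursion stops, and parseElement pops the last element off ancestors.
8. Finally the start tag and end tag are checked for a match (the popped tag name against the name in the remaining source). If they match, the end tag is consumed; otherwise an error is thrown.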
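
The steps above can be traced with a compact, self-contained sketch that condenses the parser into one block. It is not the article's exact code: `advance` and `closes` are illustrative helpers, the `NodeTypes` values are arbitrary, text is assumed to stop at "{{" or any "<", and the text branch always consumes at least one character to avoid looping forever on malformed input.

```typescript
enum NodeTypes { ROOT, INTERPOLATION, SIMPLE_EXPRESSION, ELEMENT, TEXT }
interface Context { source: string }

// Consume `length` characters and return them.
function advance(context: Context, length: number): string {
    const consumed = context.source.slice(0, length);
    context.source = context.source.slice(length);
    return consumed;
}

// Does the source start with the end tag of `tag` (case-insensitive)?
function closes(source: string, tag: string): boolean {
    return source.startsWith("</") &&
        source.slice(2, 2 + tag.length).toLowerCase() === tag.toLowerCase();
}

// Steps 2, 5, 6, 7: stop on an end tag of any open ancestor, or on empty input.
function isEnd(context: Context, ancestors: any[]): boolean {
    const s = context.source;
    if (s.startsWith("</")) {
        for (let i = ancestors.length - 1; i >= 0; i--) {
            if (closes(s, ancestors[i].tag)) return true;
        }
    }
    return !s;
}

function parseChildren(context: Context, ancestors: any[]): any[] {
    const nodes: any[] = [];
    while (!isEnd(context, ancestors)) {
        const s = context.source;
        if (s.startsWith("{{")) {
            const close = s.indexOf("}}", 2);
            if (close === -1) throw new Error("Missing }}");
            advance(context, 2);
            const content = advance(context, close - 2).trim();
            advance(context, 2);
            nodes.push({
                type: NodeTypes.INTERPOLATION,
                content: { type: NodeTypes.SIMPLE_EXPRESSION, content }
            });
        } else if (/^<[a-z]/i.test(s)) {
            nodes.push(parseElement(context, ancestors));
        } else {
            // Text: stop at the leftmost token, searching from index 1
            // so that at least one character is always consumed.
            let end = s.length;
            for (const token of ["{{", "<"]) {
                const index = s.indexOf(token, 1);
                if (index !== -1 && index < end) end = index;
            }
            nodes.push({ type: NodeTypes.TEXT, content: advance(context, end) });
        }
    }
    return nodes;
}

// Steps 3, 4, 8: read the start tag, recurse, then verify the end tag.
function parseElement(context: Context, ancestors: any[]): any {
    const match = /^<([a-z]*)/i.exec(context.source)!;
    advance(context, match[0].length + 1); // "<div" plus ">"
    const element: any = { type: NodeTypes.ELEMENT, tag: match[1], children: [] };
    ancestors.push(element);
    element.children = parseChildren(context, ancestors);
    ancestors.pop();
    if (!closes(context.source, element.tag)) {
        throw new Error(`Missing end tag: ${element.tag}`);
    }
    advance(context, 2 + element.tag.length + 1); // "</div>"
    return element;
}

function baseParse(content: string): any {
    const context: Context = { source: content };
    return { type: NodeTypes.ROOT, children: parseChildren(context, []) };
}

console.log(JSON.stringify(baseParse("<div>hi,{{message}}</div>"), null, 2));
```

Parsing `<div><span></div>` with this sketch throws "Missing end tag: span": isEnd stops the inner level because `</div>` matches the outer ancestor, and the check in step 8 then finds that the end tag does not belong to span.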

END

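This article gave a simple implementation of the parse module of the compiler, covering interpolation, text, and tags. The full details are available through the link; if you find it useful, please give it a star. In my view, before studying the source code it helps to first implement some of the basic features yourself and only then read the source. The real source handles many edge cases, which makes it hard to read.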