Compilation, Part 2: parse

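The previous article, "Compilation, Part 1: Entry", briefly walked through how a template is compiled into a render function in three steps: parse, optimize, and generate. The following articles analyze each step in detail; this one covers parse.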

Overview

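Parsing a template is relatively involved. parse matches the template against regular expressions and invokes a different callback for each kind of token it finds (comment nodes, text nodes, start tags, end tags, and so on). After this processing the template has been converted into an AST (abstract syntax tree). The AST can be viewed as a plain JavaScript object that mirrors the abstract syntactic structure of the source.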

The main path of the parse process is summarized in the following flow diagram:

Analyzing `parse`

Because the parsing process is fairly complex, it is easier to follow with a simple example:

<template>
  <div>Test Parse</div>
</template>

As the previous article showed, parse is invoked from src/compiler/index.js:

export const createCompiler = createCompilerCreator(function baseCompile (
  template: string,
  options: CompilerOptions
): CompiledResult {
  const ast = parse(template.trim(), options)
  if (options.optimize !== false) {
    optimize(ast, options)
  }
  const code = generate(ast, options)
  return {
    ast,
    render: code.render,
    staticRenderFns: code.staticRenderFns
  }
})

The callback baseCompile is eventually executed and runs the three compilation steps: parse, optimize, and generate. So how is parse defined?

// src/compiler/parser/index.js
/**
 * Convert HTML string to AST.
 */
export function parse (
  template: string,
  options: CompilerOptions
): ASTElement | void {
  ...
  
  function warnOnce (msg, range) {}
  function closeElement (element) {}
  function trimEndingWhitespace (el) {}
  function checkRootConstraints (el) {}

  parseHTML(template, {
    ...,
    start (tag, attrs, unary, start, end) {},
    end (tag, start, end) {},
    chars (text: string, start: number, end: number) {},
    comment (text: string, start, end) {}
  })
  
  return root
}

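parse takes two parameters:

template: the template string

options: an object of type CompilerOptions holding the compiler configuration; the default options for the web platform are: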

// src/platforms/web/compiler/options.js
export const baseOptions: CompilerOptions = {
  expectHTML: true,
  modules,
  directives,
  isPreTag,
  isUnaryTag,
  mustUseProp,
  canBeLeftOpenTag,
  isReservedTag,
  getTagNamespace,
  staticKeys: genStaticKeys(modules)
}

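The core of parse's template handling is the call to parseHTML, which walks the template and invokes the callbacks from which parse builds the AST. Its implementation is fairly involved, so let's first look at the parameters it takes.

parseHTML takes two parameters:

html: the template string
options: an options object whose properties include warn, expectHTML, isUnaryTag, canBeLeftOpenTag, shouldDecodeNewlines, shouldDecodeNewlinesForHref, shouldKeepComment, and outputSourceRange, plus several callbacks:
- start: handles a start tag
- end: handles an end tag
- chars: handles text
- comment: handles a comment node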
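To make the division of labor between the tokenizer and its callbacks concrete, here is a minimal sketch. It is not Vue's implementation: miniParseHTML and miniParse are hypothetical names, attributes, comments, and unary tags are ignored, and the tag regular expressions are simplified assumptions. The tokenizer only reports tokens and positions, while the callbacks use a stack to assemble the tree.

```javascript
// Toy tokenizer (hypothetical): understands only <tag>, </tag> and text.
function miniParseHTML (html, handlers) {
  let index = 0
  const advance = (n) => { index += n; html = html.substring(n) }
  while (html) {
    const textEnd = html.indexOf('<')
    if (textEnd === 0) {
      const endMatch = html.match(/^<\/([a-zA-Z][\w-]*)>/)
      if (endMatch) {
        const start = index
        advance(endMatch[0].length)
        handlers.end(endMatch[1], start, index)
        continue
      }
      const startMatch = html.match(/^<([a-zA-Z][\w-]*)>/)
      if (startMatch) {
        const start = index
        advance(startMatch[0].length)
        handlers.start(startMatch[1], start, index)
        continue
      }
    }
    // Everything up to the next "<" (or the rest of the input) is text.
    const text = textEnd > 0 ? html.substring(0, textEnd) : html
    const start = index
    advance(text.length)
    handlers.chars(text, start, index)
  }
}

// Toy counterpart of parse: the callbacks build the tree with a stack.
function miniParse (template) {
  const stack = []
  let root, currentParent
  miniParseHTML(template, {
    start (tag, start, end) {
      const element = { type: 1, tag, children: [], parent: currentParent, start, end }
      if (!root) root = element
      currentParent = element
      stack.push(element)
    },
    end (tag, start, end) {
      const element = stack.pop()
      if (!element) return // stray end tag: nothing to close
      currentParent = stack[stack.length - 1]
      element.end = end
      if (currentParent) currentParent.children.push(element)
    },
    chars (text, start, end) {
      if (currentParent && text.trim()) {
        currentParent.children.push({ type: 3, text, start, end })
      }
    }
  })
  return root
}

const ast = miniParse('<div>Test Parse</div>')
console.log(ast.tag, ast.start, ast.end) // div 0 21
console.log(ast.children[0].text)        // Test Parse
```

The positions produced by this sketch (0, 5, 15, 21) are the same ones that appear in the walk-through of the real parser later in this article.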

The next section walks through the implementation of parseHTML in detail.

Analyzing parseHTML

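parseHTML uses a while loop to consume the template html. On each pass it matches the front of the remaining string against regular expressions and calls the appropriate handler for whatever matched, until the whole template has been consumed. The loop is implemented as follows: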

// src/compiler/parser/html-parser.js
export function parseHTML (html, options) {
  ...
  
  while (html) {
    last = html
    // Make sure we're not in a plaintext content element like script/style
    if (!lastTag || !isPlainTextElement(lastTag)) {
      let textEnd = html.indexOf('<')
      if (textEnd === 0) {
        // Comment:
        if (comment.test(html)) {
          const commentEnd = html.indexOf('-->')

          if (commentEnd >= 0) {
            if (options.shouldKeepComment) {
              options.comment(html.substring(4, commentEnd), index, index + commentEnd + 3)
            }
            advance(commentEnd + 3)
            continue
          }
        }

        // http://en.wikipedia.org/wiki/Conditional_comment#Downlevel-revealed_conditional_comment
        if (conditionalComment.test(html)) {
          const conditionalEnd = html.indexOf(']>')

          if (conditionalEnd >= 0) {
            advance(conditionalEnd + 2)
            continue
          }
        }

        // Doctype:
        const doctypeMatch = html.match(doctype)
        if (doctypeMatch) {
          advance(doctypeMatch[0].length)
          continue
        }

        // End tag:
        const endTagMatch = html.match(endTag)
        if (endTagMatch) {
          const curIndex = index
          advance(endTagMatch[0].length)
          parseEndTag(endTagMatch[1], curIndex, index)
          continue
        }

        // Start tag:
        const startTagMatch = parseStartTag()
        if (startTagMatch) {
          handleStartTag(startTagMatch)
          if (shouldIgnoreFirstNewline(startTagMatch.tagName, html)) {
            advance(1)
          }
          continue
        }
      }

      let text, rest, next
      if (textEnd >= 0) {
        rest = html.slice(textEnd)
        while (
          !endTag.test(rest) &&
          !startTagOpen.test(rest) &&
          !comment.test(rest) &&
          !conditionalComment.test(rest)
        ) {
          // < in plain text, be forgiving and treat it as text
          next = rest.indexOf('<', 1)
          if (next < 0) break
          textEnd += next
          rest = html.slice(textEnd)
        }
        text = html.substring(0, textEnd)
      }

      if (textEnd < 0) {
        text = html
      }

      if (text) {
        advance(text.length)
      }

      if (options.chars && text) {
        options.chars(text, index - text.length, index)
      }
    } else {
      let endTagLength = 0
      const stackedTag = lastTag.toLowerCase()
      const reStackedTag = reCache[stackedTag] || (reCache[stackedTag] = new RegExp('([\\s\\S]*?)(</' + stackedTag + '[^>]*>)', 'i'))
      const rest = html.replace(reStackedTag, function (all, text, endTag) {
        endTagLength = endTag.length
        if (!isPlainTextElement(stackedTag) && stackedTag !== 'noscript') {
          text = text
            .replace(/<!--([\s\S]*?)-->/g, '$1') // #7298
            .replace(/<!\[CDATA\[([\s\S]*?)]]>/g, '$1')
        }
        if (shouldIgnoreFirstNewline(stackedTag, text)) {
          text = text.slice(1)
        }
        if (options.chars) {
          options.chars(text)
        }
        return ''
      })
      index += html.length - rest.length
      html = rest
      parseEndTag(stackedTag, index - endTagLength, index)
    }

    if (html === last) {
      options.chars && options.chars(html)
      if (process.env.NODE_ENV !== 'production' && !stack.length && options.warn) {
        options.warn(`Mal-formatted tag at end of template: "${html}"`, { start: index + html.length })
      }
      break
    }
  }
  
  ...
}

Take the example above, that is,

<template>
  <div>Test Parse</div>
</template>

and see how parseHTML loops over it. The string that reaches parseHTML is <div>Test Parse</div>.

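On the first pass html starts with <, so textEnd === 0 and the corresponding if block runs. None of the comment, conditional comment, doctype, or end tag patterns match, so control reaches the start tag branch, which calls parseStartTag:

// src/compiler/parser/html-parser.js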

function parseStartTag () {
  const start = html.match(startTagOpen)
  if (start) {
    const match = {
      tagName: start[1],
      attrs: [],
      start: index
    }
    advance(start[0].length)
    let end, attr
    while (!(end = html.match(startTagClose)) && (attr = html.match(dynamicArgAttribute) || html.match(attribute))) {
      attr.start = index
      advance(attr[0].length)
      attr.end = index
      match.attrs.push(attr)
    }
    if (end) {
      match.unarySlash = end[1]
      advance(end[0].length)
      match.end = index
      return match
    }
  }
}

parseStartTag first matches the template against the start tag regular expression, which is defined as:

// src/compiler/parser/html-parser.js

import { unicodeRegExp } from 'core/util/lang'

const ncname = `[a-zA-Z_][\\-\\.0-9_a-zA-Z${unicodeRegExp.source}]*`
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)

The match result start is:

start = [
  "<div",
  "div",
  groups: undefined,
  index: 0,
  input: "<div>Test Parse</div>",
  length: 2
]
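The match can be reproduced outside Vue with a simplified version of the pattern. This is a sketch under one stated assumption: the Unicode range contributed by unicodeRegExp.source is left out, so only ASCII tag names are accepted here.

```javascript
// Simplified start tag pattern: the Unicode range that Vue appends via
// unicodeRegExp.source is omitted (assumption made for this sketch).
const ncname = '[a-zA-Z_][\\-\\.0-9_a-zA-Z]*'
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)

const start = '<div>Test Parse</div>'.match(startTagOpen)
console.log(start[0]) // "<div"
console.log(start[1]) // "div"

// The optional "prefix:" part lets namespaced tags match as one name.
const namespaced = '<svg:rect width="1">'.match(startTagOpen)
console.log(namespaced[1]) // "svg:rect"

// The pattern is anchored with ^, so text before a tag does not match.
console.log('Test Parse</div>'.match(startTagOpen)) // null
```

Note that the match stops at the tag name. The closing > and any attributes are consumed separately by parseStartTag.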

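While the template is being parsed, the variable index acts as a cursor that records the current position in the original template. As parsing proceeds the cursor keeps moving forward until the template is exhausted. The function advance moves it: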

function advance (n) {
  index += n;
  html = html.substring(n);
}

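Back in parseStartTag: after the regular expression produces start, advance is called to move the cursor, so index becomes 4 and html becomes >Test Parse</div>. The tag has no attributes, so the while loop body never runs and startTagClose matches immediately. The if (end) block then calls advance again, leaving index at 5 and html as Test Parse</div>, and match is returned: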

match = {
  attrs: [],
  end: 5,
  start: 0,
  tagName: "div",
  unarySlash: ""
}
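The cursor movement described above can be traced with a reduced version of parseStartTag. This is only a sketch: parseStartTagSketch is a hypothetical name, attribute handling is dropped entirely, and the tag name pattern is a simplified assumption.

```javascript
// Module-level state, mirroring the closure variables used by parseHTML.
let html = '<div>Test Parse</div>'
let index = 0

function advance (n) {
  index += n
  html = html.substring(n)
}

const startTagOpen = /^<([a-zA-Z_][\w\-.]*)/ // simplified tag name pattern
const startTagClose = /^\s*(\/?)>/

// Reduced parseStartTag: no attribute loop (assumption for this sketch).
function parseStartTagSketch () {
  const start = html.match(startTagOpen)
  if (!start) return
  const match = { tagName: start[1], attrs: [], start: index }
  advance(start[0].length) // index: 0 -> 4, html: ">Test Parse</div>"
  const end = html.match(startTagClose)
  if (end) {
    match.unarySlash = end[1] // "" for <div>, "/" for a tag such as <br/>
    advance(end[0].length)    // index: 4 -> 5, html: "Test Parse</div>"
    match.end = index
    return match
  }
}

const match = parseStartTagSketch()
console.log(match) // { tagName: 'div', attrs: [], start: 0, unarySlash: '', end: 5 }
console.log(index) // 5
console.log(html)  // Test Parse</div>
```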

Back in parseHTML, the return value of parseStartTag is assigned to startTagMatch. The condition if (startTagMatch) holds, so handleStartTag is called:

// src/compiler/parser/html-parser.js (shown as it appears in the built bundle)

function handleStartTag (match) {
    var tagName = match.tagName;
    var unarySlash = match.unarySlash;

    if (expectHTML) {
      if (lastTag === 'p' && isNonPhrasingTag(tagName)) {
        parseEndTag(lastTag);
      }
      if (canBeLeftOpenTag$$1(tagName) && lastTag === tagName) {
        parseEndTag(tagName);
      }
    }

    var unary = isUnaryTag$$1(tagName) || !!unarySlash;

    var l = match.attrs.length;
    var attrs = new Array(l);
    for (var i = 0; i < l; i++) {
      var args = match.attrs[i];
      var value = args[3] || args[4] || args[5] || '';
      var shouldDecodeNewlines = tagName === 'a' && args[1] === 'href'
        ? options.shouldDecodeNewlinesForHref
        : options.shouldDecodeNewlines;
      attrs[i] = {
        name: args[1],
        value: decodeAttr(value, shouldDecodeNewlines)
      };
      if (process.env.NODE_ENV !== 'production' && options.outputSourceRange) {
        attrs[i].start = args.start + args[0].match(/^\s*/).length;
        attrs[i].end = args.end;
      }
    }

    if (!unary) {
      stack.push({ tag: tagName, lowerCasedTag: tagName.toLowerCase(), attrs: attrs, start: match.start, end: match.end });
      lastTag = tagName;
    }

    if (options.start) {
      options.start(tagName, attrs, unary, match.start, match.end);
    }
  }

The key step here is the call to the start callback passed in through options, which is defined inside parse:

// src/compiler/parser/index.js

start (tag, attrs, unary, start, end) {
  // check namespace.
  // inherit parent ns if there is one
  const ns = (currentParent && currentParent.ns) || platformGetTagNamespace(tag)
  
  // handle IE svg bug
  /* istanbul ignore if */
  if (isIE && ns === 'svg') {
    attrs = guardIESVGBug(attrs)
  }
  
  let element: ASTElement = createASTElement(tag, attrs, currentParent)
  if (ns) {
    element.ns = ns
  }
  
  ...
  
  if (isForbiddenTag(element) && !isServerRendering()) {
    element.forbidden = true
    ...
  }

  // apply pre-transforms
  for (let i = 0; i < preTransforms.length; i++) {
    element = preTransforms[i](element, options) || element
  }

  if (!inVPre) {
    processPre(element)
    if (element.pre) {
      inVPre = true
    }
  }

  if (platformIsPreTag(element.tag)) {
    inPre = true
  }

  if (inVPre) {
    processRawAttrs(element)
  } else if (!element.processed) {
    // structural directives
    processFor(element)
    processIf(element)
    processOnce(element)
  }

  if (!root) {
    root = element
    if (process.env.NODE_ENV !== 'production') {
      checkRootConstraints(root)
    }
  }

  if (!unary) {
    currentParent = element
    stack.push(element)
  } else {
    closeElement(element)
  }
}

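The callback takes five parameters:

tag: the tag name
attrs: the list of attributes on the tag
unary: whether the tag is unary (self-closing)
start: the start position
end: the end position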

As the code shows, createASTElement is called to create the AST element. With its source positions filled in, element looks like this:

element = {
  attrsList: [],
  attrsMap: {},
  children: [],
  parent: undefined,
  rawAttrsMap: {},
  tag: "div",
  type: 1,
  start: 0,
  end: 5
}

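After element is created, its start and end positions are set (in the code elided above). Then, if the element carries v-pre, v-for, v-if, or v-once, the matching function is called: processPre, processFor, processIf, or processOnce. Inside a v-pre block, processRawAttrs is called instead, so the attributes are kept as raw values.

Finally, because div is not a unary tag, element becomes the current parent currentParent and is pushed onto stack. That completes the handling of the start tag, and the loop moves on to its next pass.

After the first pass html is Test Parse</div>. On the next pass textEnd is 10, so the condition textEnd >= 0 holds and that block runs.

The text Test Parse is then extracted with text = html.substring(0, textEnd).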
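The text branch can be isolated into a small function to see both the simple case and the "forgiving" handling of a stray < inside text. This is a sketch, not the original code: readText is a hypothetical name, the two regular expressions are simplified, and the comment and conditional comment checks from the real loop are left out.

```javascript
// Simplified patterns (assumptions for this sketch).
const endTag = /^<\/[a-zA-Z_][\w\-.]*[^>]*>/
const startTagOpen = /^<[a-zA-Z_][\w\-.]*/

// Returns the leading run of text in html, up to the next real tag.
function readText (html) {
  let textEnd = html.indexOf('<')
  if (textEnd < 0) return html // no "<" at all: everything is text
  let rest = html.slice(textEnd)
  while (!endTag.test(rest) && !startTagOpen.test(rest)) {
    // "<" in plain text: skip it and look for the next one
    const next = rest.indexOf('<', 1)
    if (next < 0) break
    textEnd += next
    rest = html.slice(textEnd)
  }
  return html.substring(0, textEnd)
}

console.log(readText('Test Parse</div>')) // "Test Parse" (textEnd is 10)
console.log(readText('a < b</p>'))        // "a < b"
```

In the second call the first < is followed by a space, so it matches neither a start tag nor an end tag and is kept as part of the text.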

advance then moves the cursor: index becomes 15 and html becomes </div>. Since there is text to handle, the chars callback is called:

// src/compiler/parser/index.js

chars (text: string, start: number, end: number) {
  ...
  // IE textarea placeholder bug
  /* istanbul ignore if */
  if (isIE &&
      currentParent.tag === 'textarea' &&
      currentParent.attrsMap.placeholder === text
     ) {
    return
  }
  
  const children = currentParent.children
  if (inPre || text.trim()) {
    text = isTextTag(currentParent) ? text : decodeHTMLCached(text)
  } else if (!children.length) {
    // remove the whitespace-only node right after an opening tag
    text = ''
  } else if (whitespaceOption) {
    if (whitespaceOption === 'condense') {
      // in condense mode, remove the whitespace node if it contains
      // line break, otherwise condense to a single space
      text = lineBreakRE.test(text) ? '' : ' '
    } else {
      text = ' '
    }
  } else {
    text = preserveWhitespace ? ' ' : ''
  }
  
  if (text) {
    if (!inPre && whitespaceOption === 'condense') {
      // condense consecutive whitespaces into single space
      text = text.replace(whitespaceRE, ' ')
    }
    let res
    let child: ?ASTNode
    if (!inVPre && text !== ' ' && (res = parseText(text, delimiters))) {
      child = {
        type: 2,
        expression: res.expression,
        tokens: res.tokens,
        text
      }
    } else if (text !== ' ' || !children.length || children[children.length - 1].text !== ' ') {
      child = {
        type: 3,
        text
      }
    }
    if (child) {
      if (process.env.NODE_ENV !== 'production' && options.outputSourceRange) {
        child.start = start
        child.end = end
      }
      children.push(child)
    }
  }
}

The callback takes three parameters:

text: the text
start: the start position
end: the end position

One line worth noting in the code is:

const children = currentParent.children

From the first pass we know that currentParent points to the <div> element, whose children is still []. Next, parseText is called on the text and its return value is assigned to res:

// src/compiler/parser/text-parser.js

export function parseText (
  text: string,
  delimiters?: [string, string]
): TextParseResult | void {
  const tagRE = delimiters ? buildRegex(delimiters) : defaultTagRE
  if (!tagRE.test(text)) {
    return
  }
  const tokens = []
  const rawTokens = []
  let lastIndex = tagRE.lastIndex = 0
  let match, index, tokenValue
  while ((match = tagRE.exec(text))) {
    index = match.index
    // push text token
    if (index > lastIndex) {
      rawTokens.push(tokenValue = text.slice(lastIndex, index))
      tokens.push(JSON.stringify(tokenValue))
    }
    // tag token
    const exp = parseFilters(match[1].trim())
    tokens.push(`_s(${exp})`)
    rawTokens.push({ '@binding': exp })
    lastIndex = index + match[0].length
  }
  if (lastIndex < text.length) {
    rawTokens.push(tokenValue = text.slice(lastIndex))
    tokens.push(JSON.stringify(tokenValue))
  }
  return {
    expression: tokens.join('+'),
    tokens: rawTokens
  }
}

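The text contains no {{ }} interpolation, so !tagRE.test(text) is true and parseText returns undefined. Back in chars, res is undefined, so the first condition fails and the else if branch runs: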

child = {
  type: 3,
  text: text
}
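For contrast, it helps to see what happens when the text does contain an interpolation. The following is a reduced sketch of parseText: parseTextSketch is a hypothetical name, filter parsing (parseFilters) and custom delimiters are omitted, only the expression string is returned, and the pattern is assumed to match the default {{ }} delimiters.

```javascript
// Assumed default interpolation pattern: {{ expression }}
const defaultTagRE = /\{\{((?:.|\r?\n)+?)\}\}/g

// Reduced parseText: returns only the expression string, or undefined
// when the text contains no interpolation.
function parseTextSketch (text) {
  if (!defaultTagRE.test(text)) return
  const tokens = []
  // test() on a global regex moves lastIndex, so reset it before exec()
  let lastIndex = defaultTagRE.lastIndex = 0
  let match
  while ((match = defaultTagRE.exec(text))) {
    if (match.index > lastIndex) {
      // literal text before the interpolation
      tokens.push(JSON.stringify(text.slice(lastIndex, match.index)))
    }
    tokens.push(`_s(${match[1].trim()})`)
    lastIndex = match.index + match[0].length
  }
  if (lastIndex < text.length) {
    // literal text after the last interpolation
    tokens.push(JSON.stringify(text.slice(lastIndex)))
  }
  return tokens.join('+')
}

console.log(parseTextSketch('Test Parse'))        // undefined
console.log(parseTextSketch('Hello {{ name }}!')) // "Hello "+_s(name)+"!"
```

A text with an interpolation would therefore produce a type 2 child carrying this expression, while plain text such as Test Parse produces a type 3 child.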

At this point element is:

element = {
  attrsList: [],
  attrsMap: {},
  children: [
    {
      type: 3,
      text: "Test Parse",
      start: 5,
      end: 15
    }
  ],
  parent: undefined,
  rawAttrsMap: {},
  tag: "div",
  type: 1,
  start: 0,
  end: 5
}

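With the text handled, html is </div> and the loop runs again. This time textEnd is 0, so the textEnd === 0 block runs.

Working down the regular expression checks, the end tag branch matches, with endTagMatch as follows: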

endTagMatch = [
  '</div>',
  'div',
  groups: undefined,
  index: 0,
  input: '</div>'
]

advance moves the cursor: index becomes 21 and html becomes "". parseEndTag is then called to handle the end tag:

// src/compiler/parser/html-parser.js

function parseEndTag (tagName, start, end) {
  let pos, lowerCasedTagName
  if (start == null) start = index
  if (end == null) end = index
  
  // Find the closest opened tag of the same type
  if (tagName) {
    lowerCasedTagName = tagName.toLowerCase()
    for (pos = stack.length - 1; pos >= 0; pos--) {
      if (stack[pos].lowerCasedTag === lowerCasedTagName) {
        break
      }
    }
  } else {
    // If no tag name is provided, clean shop
    pos = 0
  }
  
  if (pos >= 0) {
    // Close all the open elements, up the stack
    for (let i = stack.length - 1; i >= pos; i--) {
      if (process.env.NODE_ENV !== 'production' &&
          (i > pos || !tagName) &&
          options.warn
         ) {
        options.warn(
          `tag <${stack[i].tag}> has no matching end tag.`,
          { start: stack[i].start, end: stack[i].end }
        )
      }
      if (options.end) {
        options.end(stack[i].tag, start, end)
      }
    }
    
    // Remove the open elements from the stack
    stack.length = pos
    lastTag = pos && stack[pos - 1].tag
  } else if (lowerCasedTagName === 'br') {
    if (options.start) {
      options.start(tagName, [], true, start, end)
    }
  } else if (lowerCasedTagName === 'p') {
    if (options.start) {
      options.start(tagName, [], false, start, end)
    }
    if (options.end) {
      options.end(tagName, start, end)
    }
  }
}

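The function takes three parameters:

tagName: the tag name, here div
start: the start position, here 15
end: the end position, here 21

Each non-unary start tag was pushed onto parseHTML's own stack when it was handled. parseEndTag searches that stack from the top down for the closest open tag with the same name as tagName. If it finds one at position pos, it closes every element from the top of the stack down to pos, warning (outside production) about each element above pos because it has no matching end tag, and then truncates the stack. If no match is found, the end tag is ignored, except for </br> and </p>, which are treated as <br> and <p></p>.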
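The stack search can be sketched on its own. In this sketch closeTag is a hypothetical helper, not part of Vue: instead of calling options.end and options.warn it returns the tags it would close and the ones it would warn about, and the special handling of </br> and </p> is left out.

```javascript
// Hypothetical helper mirroring the stack search in parseEndTag.
function closeTag (stack, tagName) {
  const lower = tagName.toLowerCase()
  let pos
  // Find the closest open tag of the same type, from the top down.
  for (pos = stack.length - 1; pos >= 0; pos--) {
    if (stack[pos].lowerCasedTag === lower) break
  }
  const closed = []
  const unclosed = []
  if (pos >= 0) {
    for (let i = stack.length - 1; i >= pos; i--) {
      // Anything above pos was never closed explicitly.
      if (i > pos) unclosed.push(stack[i].tag)
      closed.push(stack[i].tag)
    }
    stack.length = pos // remove the closed elements from the stack
  }
  return { closed, unclosed }
}

// <div><span></div>: the </div> also closes the dangling <span>.
const stack = [
  { tag: 'div', lowerCasedTag: 'div' },
  { tag: 'span', lowerCasedTag: 'span' }
]
const result = closeTag(stack, 'div')
console.log(result.closed)   // [ 'span', 'div' ]
console.log(result.unclosed) // [ 'span' ]
console.log(stack.length)    // 0
```

In the article's example the stack holds only div, so pos is 0, nothing is reported as unclosed, and exactly one end callback fires.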

In our example the top of the stack is the matching div, so options.end is called:

// src/compiler/parser/index.js
end (tag, start, end) {
  const element = stack[stack.length - 1]
  // pop stack
  stack.length -= 1
  currentParent = stack[stack.length - 1]
  if (process.env.NODE_ENV !== 'production' && options.outputSourceRange) {
    element.end = end
  }
  closeElement(element)
}

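The callback takes three parameters:

tag: the tag name, div
start: the start position, 15
end: the end position, 21

It saves the element at the top of the stack in element, pops it, and resets currentParent to the new last element of stack, which is now undefined.

It then updates element's end property (outside production, when outputSourceRange is enabled) and calls closeElement to finish processing element:

// packages/vue-template-compiler/browser.js (built output; the source lives in src/compiler/parser/index.js)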

function closeElement (element) {
  trimEndingWhitespace(element);
  if (!inVPre && !element.processed) {
    element = processElement(element, options);
  }
  // tree management
  if (!stack.length && element !== root) {
    // allow root elements with v-if, v-else-if and v-else
    if (root.if && (element.elseif || element.else)) {
      {
        checkRootConstraints(element);
      }
      addIfCondition(root, {
        exp: element.elseif,
        block: element
      });
    } else {
      warnOnce(
        "Component template should contain exactly one root element. " +
        "If you are using v-if on multiple elements, " +
        "use v-else-if to chain them instead.",
        { start: element.start }
      );
    }
  }
  if (currentParent && !element.forbidden) {
    if (element.elseif || element.else) {
      processIfConditions(element, currentParent);
    } else {
      if (element.slotScope) {
        // scoped slot
        // keep it in the children list so that v-else(-if) conditions can
        // find it as the prev node.
        var name = element.slotTarget || '"default"'
        ;(currentParent.scopedSlots || (currentParent.scopedSlots = {}))[name] = element;
      }
      currentParent.children.push(element);
      element.parent = currentParent;
    }
  }
  
  // final children cleanup
  // filter out scoped slots
  element.children = element.children.filter(function (c) { return !(c).slotScope; });
  // remove trailing whitespace node again
  trimEndingWhitespace(element);
  
  // check pre state
  if (element.pre) {
    inVPre = false;
  }
  if (platformIsPreTag(element.tag)) {
    inPre = false;
  }
  // apply post-transforms
  for (var i = 0; i < postTransforms.length; i++) {
    postTransforms[i](element, options);
  }
}

closeElement first calls trimEndingWhitespace to remove trailing whitespace-only text nodes from the element's children, then calls processElement on element:

// packages/vue-template-compiler/browser.js (built output)

function processElement (element, options) {
  processKey(element);
  
  // determine whether this is a plain element after
  // removing structural attributes
  element.plain = (
    !element.key &&
    !element.scopedSlots &&
    !element.attrsList.length
  );
  
  processRef(element);
  processSlotContent(element);
  processSlotOutlet(element);
  processComponent(element);
  for (var i = 0; i < transforms.length; i++) {
    element = transforms[i](element, options) || element;
  }
  processAttrs(element);
  return element
}

processElement handles the attributes on the tag: key, ref, slot, slot-scope, template, inline-template, is, class, style, and any remaining attributes. The resulting element is:

element = {
  attrsList: [],
  attrsMap: {},
  children: [
    {
      type: 3,
      text: "Test Parse",
      start: 5,
      end: 15
    }
  ],
  parent: undefined,
  rawAttrsMap: {},
  tag: "div",
  type: 1,
  start: 0,
  end: 21
}

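That completes the third pass. html is now empty, so the parsing loop ends.

Finally, parseEndTag is called with no arguments to clean up any tags still left on the stack.

The template has now been parsed into an AST, which completes the parse step.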
