Generating the AST (Part 3): Code Walkthrough

html = '<div id="app"><span disabled="false">{{message}}</span></div>'

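Inside parseHTML, the work is done in a while loop. Once the loop finishes, parseEndTag() is called with no arguments to close any tags still left on the stack. The rest of the function body is helper-function declarations, so the core logic lives in the while loop.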

Logic flowchart (draw.io diagram)

Background

The parseHTML function

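parseHTML is declared in compiler/parser/html-parser.js, but it is actually called from compiler/parser/index.js.
Its options argument contains several configuration options plus four callback functions: start, end, chars and comment.
1. start handles start (opening) tags
2. end handles end (closing) tags
3. chars handles text
4. comment handles comments
Why call these passed-in functions instead of handling the logic directly inside parseHTML? My understanding is that it is for extensibility: parseHTML only recognizes tokens, and the caller decides what to do with each one, so if some handling is not needed later, the corresponding callback can simply be left out.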
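To make the callback design concrete, here is a toy sketch. tinyParse and its regexes are hypothetical and far simpler than Vue's parseHTML (no attributes, comments or error handling); the only point is that the tokenizer reports each token through options.start / options.end / options.chars and leaves it to the caller to decide what to build.

```javascript
// A toy tokenizer (hypothetical, not Vue's code) that only recognizes
// simple start tags, end tags and text, and reports each token to callbacks.
function tinyParse(html, options) {
  const startTagRe = /^<([a-zA-Z][\w-]*)[^>]*>/
  const endTagRe = /^<\/([a-zA-Z][\w-]*)>/
  while (html) {
    let m
    if ((m = html.match(endTagRe))) {
      if (options.end) options.end(m[1])
    } else if ((m = html.match(startTagRe))) {
      if (options.start) options.start(m[1])
    } else {
      // Text runs up to the next "<" (or to the end of the input)
      const next = html.indexOf('<', 1)
      const text = next < 0 ? html : html.slice(0, next)
      if (options.chars) options.chars(text)
      html = html.slice(text.length)
      continue
    }
    html = html.slice(m[0].length)
  }
}

// The caller decides what each callback does; a missing callback is simply skipped.
const events = []
tinyParse('<div><span>hi</span></div>', {
  start: tag => events.push('start:' + tag),
  end: tag => events.push('end:' + tag),
  chars: text => events.push('chars:' + text)
})
// events → ['start:div', 'start:span', 'chars:hi', 'end:span', 'end:div']
```

Swapping the callbacks (for example, building AST nodes instead of strings) requires no change to tinyParse itself, which is the extensibility argument made above.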

process.env.NODE_ENV !== 'production'

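Personally, I treat any code guarded by this condition as rule validation: for example, if a tag is invalid, the check runs only in the development build and prints a warning. So logic wrapped in this condition can be skipped while reading (my own understanding and habit).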

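The stack variable

After reading the code, you will find a variable named stack both in the file that declares parseHTML (compiler/parser/html-parser.js) and in the file that calls it (compiler/parser/index.js).

One is a local variable inside parseHTML; the other is declared in the calling code (inside parse() in index.js) and shared by the callbacks passed to parseHTML.
One stores lightweight tag information; the other stores the generated AST elements.
The stack inside parseHTML exists mainly to match start and end tags. The stack in the calling code mainly serves as the container for the AST elements being built.
Both stacks change in essentially the same way: empty at the start --> push when a start tag is found --> pop when an end tag is found --> empty at the end.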
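A minimal sketch of how the two stacks move in lockstep, assuming the sample html has already been split into a hypothetical token list. The tokens array and the tagStack / astStack names are illustrative, not Vue's; trace records both stack depths after each token.

```javascript
// Hypothetical token list for '<div id="app"><span disabled="false">{{message}}</span></div>'
const tokens = [
  { kind: 'start', tag: 'div' },
  { kind: 'start', tag: 'span' },
  { kind: 'text', text: '{{message}}' },
  { kind: 'end', tag: 'span' },
  { kind: 'end', tag: 'div' }
]

const tagStack = [] // like parseHTML's local stack: lightweight tag records
const astStack = [] // like the caller's stack: AST elements under construction
const trace = []    // "tagStack depth/astStack depth" after each token
let root

for (const t of tokens) {
  if (t.kind === 'start') {
    tagStack.push({ tag: t.tag, lowerCasedTag: t.tag.toLowerCase() })
    const el = { type: 1, tag: t.tag, children: [], parent: astStack[astStack.length - 1] }
    if (!root) root = el
    astStack.push(el)
  } else if (t.kind === 'end') {
    tagStack.pop()
    const el = astStack.pop()
    if (el.parent) el.parent.children.push(el)
  } else {
    // text never touches either stack; it is attached to the current top element
    // (type 2 because it contains {{ }})
    astStack[astStack.length - 1].children.push({ type: 2, text: t.text })
  }
  trace.push(tagStack.length + '/' + astStack.length)
}
// trace → ['1/1', '2/2', '2/2', '1/1', '0/0']
```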

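DOM node type values

In the DOM (Document Object Model), a node's nodeType property tells you what kind of node it is. The common DOM node types are:

Element node, nodeType 1 (Element): an HTML element such as <div>, <p>, <a>.
Text node, nodeType 3 (Text): text content, such as the text inside a tag or a standalone text node.
Comment node, nodeType 8 (Comment): a comment, such as <!-- comment -->.
Document node, nodeType 9 (Document): the whole HTML document, i.e. the root node.
DocumentType node, nodeType 10 (DocumentType): the document type declaration, such as <!DOCTYPE html>.
Attribute node, nodeType 2 (Attribute): an attribute of an element node, such as class, id, src.

Vue's AST uses its own type field, which only partly matches these numbers: 1 is an element, 2 is text containing an expression such as {{message}}, and 3 is plain text.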

① Loop condition

if (!lastTag || !isPlainTextElement(lastTag)) {
    ....
}

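lastTag records the most recently parsed start tag. It is assigned in handleStartTag, i.e. every time a start tag is parsed (parseEndTag also updates it when tags are popped).

!lastTag holds on the first iteration, when lastTag does not exist yet.

isPlainTextElement(lastTag) checks whether lastTag is a plain-text element: the content of script, style and textarea tags is treated as raw text. So !isPlainTextElement(lastTag) means the last tag is not one of these; in this article's html string, neither div nor span is a plain-text element.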
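A minimal sketch of the loop condition, reusing the makeMap idea from the simplified implementation at the end of this article. takesNormalBranch is a hypothetical name; unlike the original makeMap, this version coerces the lookup result to a boolean with !!.

```javascript
// Build a lookup table once and return a fast membership test
function makeMap(str, expectsLowerCase) {
  const map = Object.create(null)
  for (const key of str.split(',')) map[key] = true
  return expectsLowerCase ? val => !!map[val.toLowerCase()] : val => !!map[val]
}

const isPlainTextElement = makeMap('script,style,textarea', true)

// The loop's first branch runs when there is no lastTag yet, or when lastTag
// is an ordinary element whose content must be parsed as HTML
function takesNormalBranch(lastTag) {
  return !lastTag || !isPlainTextElement(lastTag)
}
```

For this article's html, takesNormalBranch is true for both div and span; it is false only while the parser is inside script, style or textarea.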

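② Check whether html contains <

let textEnd = html.indexOf('<')

indexOf tells us whether html contains < and where the first one is. There are three cases:

It exists at the very beginning, e.g. <div>111</div>
It exists but not at the beginning, e.g. test</div>
It does not exist: the loop has already stripped every tag and only text is left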
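The three cases map directly onto the value of textEnd. A tiny sketch, where classify and its return strings are hypothetical labels for illustration:

```javascript
// Name the three cases of textEnd = html.indexOf('<')
function classify(html) {
  const textEnd = html.indexOf('<')
  if (textEnd === 0) return 'starts with <' // a tag, comment, doctype...
  if (textEnd > 0) return 'text before <'   // some text first, then a tag
  return 'no <'                             // only text is left
}
```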

⑤ Cases where < is at the start

if (textEnd === 0) {
...
}

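When < appears at the very beginning of the string, there are again several cases:

A comment, such as <!-- comment -->
A conditional comment. Conditional comments only work in specific browsers and are not standard HTML comments; they were mainly used in older versions of Internet Explorer to apply styles or scripts only for that browser, for example: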

 <!--[if IE]>
    <p>This content is only visible in Internet Explorer.</p>
 <![endif]-->

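A DOCTYPE declaration. <!DOCTYPE html> is the HTML document type declaration: it tells the browser which version of HTML the document uses so that it is parsed and rendered correctly, and it gives developers a reference for validating the document structure. In HTML5, <!DOCTYPE html> is the recommended declaration.
An end tag, such as </span></div>, when the original html string has been cut down to only its closing tags
A start tag, such as <div>test</div>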
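The checks below run in the same order as the loop: comment, conditional comment, doctype, end tag, start tag. The comment, conditionalComment, startTagOpen and endTag regexes are copied from the simplified implementation at the end of this article; the doctype regex is Vue 2's html-parser.js version as I recall it, so treat it as an assumption. kindAtLt is a hypothetical helper. One consequence worth noticing: the <!--[if IE]> example above starts with <!--, so it is caught by the plain comment check; conditionalComment only matches text beginning with <![, such as <![endif]>.

```javascript
const ncname = '[a-zA-Z_][\\-\\.0-9_a-zA-Z]*'
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)
const endTag = new RegExp(`^<\\/${qnameCapture}[^>]*>`)
const doctype = /^<!DOCTYPE [^>]+>/i
const comment = /^<!\--/
const conditionalComment = /^<!\[/

// Classify what a string starting with "<" begins with, in the loop's order
function kindAtLt(html) {
  if (comment.test(html)) return 'comment'
  if (conditionalComment.test(html)) return 'conditional comment'
  if (doctype.test(html)) return 'doctype'
  if (endTag.test(html)) return 'end tag'
  if (startTagOpen.test(html)) return 'start tag'
  return 'text' // e.g. a stray "<" that starts nothing recognizable
}
```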

⑦ Handling the special cases that start with <

        // Comment
        if (comment.test(html)) {
          const commentEnd = html.indexOf('-->')
          if (commentEnd >= 0) {
            if (options.shouldKeepComment) {
              options.comment(html.substring(4, commentEnd), index, index + commentEnd + 3)
            }
            advance(commentEnd + 3)
            continue
          }
        }

        // Conditional comment
        if (conditionalComment.test(html)) {
          const conditionalEnd = html.indexOf(']>')
          if (conditionalEnd >= 0) {
            advance(conditionalEnd + 2)
            continue
          }
        }

        // Doctype
        const doctypeMatch = html.match(doctype)
        if (doctypeMatch) {
          advance(doctypeMatch[0].length)
          continue
        }

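As the code shows, all three special cases are basically skipped: advance() simply cuts them off the string. The only exception is comments: if options.shouldKeepComment is set, the comment text is first passed to the options.comment callback.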
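A minimal sketch of that skipping, assuming a string cursor with an advance() like the one in parseHTML. skipLeadingComment, keepComment and the kept array are hypothetical stand-ins for the real options.shouldKeepComment / options.comment pair.

```javascript
// Skip a leading comment the way step ⑦ does
function skipLeadingComment(input, keepComment) {
  let html = input
  let index = 0
  const kept = []
  function advance(n) {
    index += n
    html = html.substring(n)
  }
  if (/^<!\--/.test(html)) {
    const commentEnd = html.indexOf('-->')
    if (commentEnd >= 0) {
      // substring(4, commentEnd) drops the leading "<!--"
      if (keepComment) kept.push(html.substring(4, commentEnd))
      advance(commentEnd + 3) // + 3 also skips the "-->"
    }
  }
  return { html, index, kept }
}
```

For '<!-- note --><div></div>' the result is html '<div></div>' and index 13 either way; kept is [' note '] only when keepComment is true.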

Key step ⑧: handling a start tag that begins with <

        // Start tag
        const startTagMatch = parseStartTag() // check whether this is a start tag
        if (startTagMatch) {
          handleStartTag(startTagMatch) // handle the start tag
          if (shouldIgnoreFirstNewline(startTagMatch.tagName, html)) { advance(1) }
          continue
        }

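8.1 Parsing the start tag: parseStartTag

parseStartTag collects everything about the current start tag into a match object: the tag name, its attributes, and its start and end indices. Once it finishes, the start tag has also been cut off the front of html.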

  // Parse a start tag
  function parseStartTag () {
    const start = html.match(startTagOpen)
    if (start) {
      const match = { tagName: start[1], attrs: [],  start: index  }
      // cut off "<tagName"
      advance(start[0].length)
      let end, attr
      // Until the closing > is matched, try to match a dynamic attribute or a normal attribute;
      // every matched attribute is stored in match.attrs
      while (!(end = html.match(startTagClose)) && (attr = html.match(dynamicArgAttribute) || html.match(attribute))) {
        attr.start = index
        advance(attr[0].length)
        attr.end = index
        match.attrs.push(attr)
      }
      // If the closing > was found, return match as the description of this start tag
      if (end) {
        match.unarySlash = end[1] // "/" if the tag is self-closing (<x/>), otherwise ""
        advance(end[0].length)
        match.end = index
        return match
      }
    }
  }
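To see what parseStartTag produces for this article's html, here is a self-contained copy that runs outside Vue. It reuses the attribute, startTagOpen and startTagClose regexes from the simplified implementation at the end of the article and, for brevity, drops the dynamicArgAttribute alternative, so dynamic attributes such as v-bind:[key] are not handled.

```javascript
const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/
const ncname = '[a-zA-Z_][\\-\\.0-9_a-zA-Z]*'
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)
const startTagClose = /^\s*(\/?)>/

let html = '<div id="app"><span disabled="false">{{message}}</span></div>'
let index = 0

// Cut n characters off the front of html and move index forward
function advance(n) {
  index += n
  html = html.substring(n)
}

// Same structure as parseStartTag above, minus dynamic attributes
function parseStartTag() {
  const start = html.match(startTagOpen)
  if (!start) return
  const match = { tagName: start[1], attrs: [], start: index }
  advance(start[0].length)
  let end, attr
  while (!(end = html.match(startTagClose)) && (attr = html.match(attribute))) {
    attr.start = index
    advance(attr[0].length)
    attr.end = index
    match.attrs.push(attr)
  }
  if (end) {
    match.unarySlash = end[1]
    advance(end[0].length)
    match.end = index
    return match
  }
}

const div = parseStartTag()  // div.start === 0, div.end === 14, div.attrs[0][3] === 'app'
const span = parseStartTag() // span.start === 14, span.end === 37
// html is now '{{message}}</span></div>'
```

Each entry in match.attrs is the raw regex match array (with start/end added), which is why handleStartTag later reads args[1] and args[3..5].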

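8.2 Handling the start tag: handleStartTag

This function mainly handles unary (self-closing) tags, turns the matched attributes into name/value pairs, and calls the start callback passed in through options.

expectHTML and isUnaryTag(tagName) both come from the options argument.

This is where parseHTML's local stack gets pushed.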

  // Handle a start tag
  function handleStartTag (match) {
    const tagName = match.tagName
    const unarySlash = match.unarySlash
    // expectHTML: whether HTML-specific parsing rules apply
    if (expectHTML) {
      // While inside a <p>, a non-phrasing tag (isNonPhrasingTag) implicitly closes that <p> first,
      // because per the HTML spec a <p> cannot contain block-level elements
      if (lastTag === 'p' && isNonPhrasingTag(tagName)) {
        parseEndTag(lastTag)
      }
      // For tags whose end tag may be omitted (canBeLeftOpenTag, e.g. <li>, <p>), a new tag of the
      // same type closes the previous one; this keeps the nesting correct when end tags are left out
      if (canBeLeftOpenTag(tagName) && lastTag === tagName) {
        parseEndTag(tagName)
      }
    }
    // Is this a unary (self-closing) tag?
    const unary = isUnaryTag(tagName) || !!unarySlash
    // Process the attributes
    const l = match.attrs.length
    const attrs = new Array(l)
    for (let i = 0; i < l; i++) {
      const args = match.attrs[i]
      const value = args[3] || args[4] || args[5] || '' // why 3/4/5: see the note below
      const shouldDecodeNewlines = tagName === 'a' && args[1] === 'href' ? options.shouldDecodeNewlinesForHref  : options.shouldDecodeNewlines
      attrs[i] = { name: args[1],   value: decodeAttr(value, shouldDecodeNewlines)  } // decode HTML entities in the value
      // development-only source-range info; can be ignored
      if (process.env.NODE_ENV !== 'production' && options.outputSourceRange) {
        attrs[i].start = args.start + args[0].match(/^\s*/).length
        attrs[i].end = args.end
      }
    }
    // Not unary: push onto the stack
    if (!unary) {
      stack.push({ tag: tagName, lowerCasedTag: tagName.toLowerCase(), attrs: attrs, start: match.start, end: match.end })
      lastTag = tagName
    }
    // Run the start-tag callback
    if (options.start) {
      options.start(tagName, attrs, unary, match.start, match.end)
    }
  }
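The part of handleStartTag that decides whether a tag is pushed can be isolated as below. This is a sketch under stated assumptions: unaryTags is a hypothetical, trimmed-down set (in Vue, isUnaryTag comes from options and is far more complete), and pushIfNotUnary is an illustrative name.

```javascript
// Hypothetical stand-in for options.isUnaryTag
const unaryTags = new Set(['img', 'br', 'input', 'hr'])
const isUnaryTag = tag => unaryTags.has(tag)

const stack = []
let lastTag

// Push the tag only if it is neither a known void element nor written as <x/>
function pushIfNotUnary(tagName, unarySlash) {
  const unary = isUnaryTag(tagName) || !!unarySlash
  if (!unary) {
    stack.push({ tag: tagName, lowerCasedTag: tagName.toLowerCase() })
    lastTag = tagName
  }
  return unary
}

pushIfNotUnary('div', '')      // false: pushed, waits for </div>
pushIfNotUnary('img', '')      // true: void element, never pushed
pushIfNotUnary('my-comp', '/') // true: written as <my-comp/>
```

Because unary tags never reach the stack, no end tag is ever expected for them, which is exactly what the two sources of unary above guarantee.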

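Why does const value = args[3] || args[4] || args[5] || '' read indices 3, 4 and 5?

Look at the attribute regex: const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/

First parenthesis ([^\s"'<>\/=]+): the attribute name, one or more characters that are not whitespace, quotes (single or double), <, >, / or =.

Second parenthesis (?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+))): the whole "= value" part. Because it starts with ?:, it is a non-capturing group and does not appear in the result array; the ? after it makes it optional, so a bare attribute such as disabled also matches.

Third parenthesis (?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)): the value in its three alternative forms; it is also non-capturing (?:), so it does not appear in the result array.

Fourth parenthesis (=): captures the equals sign.

Fifth parenthesis ([^"]*): the value inside double quotes.

Sixth parenthesis ([^']*): the value inside single quotes.

Seventh parenthesis ([^\s"'=<>`]+): an unquoted value.

So the result array is, in order: [full match, 1st, 4th, 5th, 6th, 7th].

The 5th, 6th and 7th parentheses sit at indices 3, 4 and 5. At most one of them is filled for a given attribute, so args[3] || args[4] || args[5] picks whichever form of the value was used, and falls back to '' when the attribute has no value.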
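A quick way to confirm the index layout is to run the same regex on each value style. valueOf is a hypothetical helper wrapping the expression from handleStartTag; the sample strings are illustrative.

```javascript
const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/

// Same expression as in handleStartTag
const valueOf = args => args[3] || args[4] || args[5] || ''

const dq = ' disabled="false"'.match(attribute) // value lands in dq[3]
const sq = " id='app'".match(attribute)        // value lands in sq[4]
const bare = ' title=hello'.match(attribute)   // value lands in bare[5]
const noValue = ' disabled>'.match(attribute)  // no "=": indices 2..5 are all undefined
```

Every result array has length 6 (full match plus five capture groups), whichever form matched.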

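8.3 The options.start callback

Apart from some special cases (such as the v-pre directive), the core logic of this function is: bind the first tag encountered to the root variable as the root node; if the tag is not unary, push it onto the stack and wait for its end tag; if it is unary, finish parsing it right away.

closeElement is covered in detail in the end-tag section.

This is where the AST-element stack gets pushed.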

// Secondary code: validation or compatibility handling

      const ns = (currentParent && currentParent.ns) || platformGetTagNamespace(tag) // get the tag's namespace and store it in ns
      if (isIE && ns === 'svg') {...} // works around an SVG bug in IE
      if (isForbiddenTag(element) && !isServerRendering()) {...} // flags forbidden tags such as <style> and <script>

// Main code: the core logic

      // Create an AST (Abstract Syntax Tree) element from the tag name, the attribute list and the current parent
      let element: ASTElement = createASTElement(tag, attrs, currentParent)
      // If there is no root yet, this element becomes the root, and root constraints are checked in development
      if (!root) {
        root = element
        if (process.env.NODE_ENV !== 'production') { checkRootConstraints(root) } // can be ignored
      }
      // Not unary: make it the current parent and push it; unary: close it immediately
      if (!unary) {
        currentParent = element
        stack.push(element)
      } else {
        closeElement(element) // finish parsing this element
      }
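A minimal runnable sketch of this core logic, keeping only root, currentParent and the AST stack. createASTElement, closeElement and start here are simplified hypothetical versions (no namespaces, v-pre, forbidden-tag checks or root constraints).

```javascript
let root
let currentParent
const stack = []

function createASTElement(tag, attrs, parent) {
  return { type: 1, tag, attrsList: attrs, parent, children: [] }
}

// Attach a finished element to its parent
function closeElement(element) {
  if (currentParent) {
    currentParent.children.push(element)
    element.parent = currentParent
  }
}

function start(tag, attrs, unary) {
  const element = createASTElement(tag, attrs, currentParent)
  if (!root) root = element  // only the first element becomes root
  if (!unary) {
    currentParent = element  // later tags nest inside this one
    stack.push(element)
  } else {
    closeElement(element)    // void element: attach to its parent right away
  }
}

start('div', [{ name: 'id', value: 'app' }], false)
start('img', [], true)
```

After these two calls, root is the div, it is still on the stack waiting for </div>, and the img is already in root.children because unary elements never wait for an end tag.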

Key step ⑨: handling an end tag that begins with <

        // End tag
        const endTagMatch = html.match(endTag) // check whether this is an end tag
        if (endTagMatch) {
          const curIndex = index
          advance(endTagMatch[0].length) // cut off the end tag
          parseEndTag(endTagMatch[1], curIndex, index) // handle the end tag
          continue
        }

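9.1 The parseEndTag function

parseEndTag pops parseHTML's local stack and calls the options.end callback for every tag it closes.

Main code:
    if (tagName) {
      lowerCasedTagName = tagName.toLowerCase()
      // Look for the matching start tag in the stack; if it is found, that tag (and anything opened after it) is finished and can be popped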
      for (pos = stack.length - 1; pos >= 0; pos--) {
        if (stack[pos].lowerCasedTag === lowerCasedTagName) {
          break
        }
      }
    } else {
      // If no tag name is provided, clean shop
      pos = 0
    }
    if (pos >= 0) {
        ...
        // Run the end-tag callback for every tag from the top of the stack down to pos
        for (let i = stack.length - 1; i >= pos; i--) {
            if (options.end) {
              options.end(stack[i].tag, start, end)
            }
        }
        // Pop
        stack.length = pos
        lastTag = pos && stack[pos - 1].tag
    }
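To see how pos controls what gets closed, here is a self-contained copy of the main parseEndTag logic. makeParser and the closed array are hypothetical: closed records the tag names the options.end callback would receive.

```javascript
function makeParser(tags) {
  const stack = tags.map(tag => ({ tag, lowerCasedTag: tag.toLowerCase() }))
  const closed = []
  let lastTag = stack.length ? stack[stack.length - 1].tag : undefined

  function parseEndTag(tagName) {
    let pos
    if (tagName) {
      const lowerCasedTagName = tagName.toLowerCase()
      // Search from the top; pos ends at -1 if the tag is not on the stack
      for (pos = stack.length - 1; pos >= 0; pos--) {
        if (stack[pos].lowerCasedTag === lowerCasedTagName) break
      }
    } else {
      pos = 0 // no tag name: close everything that is left
    }
    if (pos >= 0) {
      for (let i = stack.length - 1; i >= pos; i--) closed.push(stack[i].tag)
      stack.length = pos
      lastTag = pos && stack[pos - 1].tag
    }
  }

  return { parseEndTag, closed, stack, getLastTag: () => lastTag }
}

const parser = makeParser(['div', 'p', 'span'])
parser.parseEndTag('a') // not on the stack: nothing happens
parser.parseEndTag('P') // closes span (left unclosed) and p; lastTag becomes 'div'
parser.parseEndTag()    // final cleanup: closes div
```

Note that after the last call lastTag is 0 rather than undefined, because pos && ... short-circuits on pos === 0; the loop only tests it for truthiness, so this is harmless.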

9.2 The options.end callback

Pops the AST stack and finishes parsing that element.

    end (tag, start, end) {
      const element = stack[stack.length - 1]
      // pop the stack
      stack.length -= 1
      // the new top of the stack becomes the current parent
      currentParent = stack[stack.length - 1]
      // finish parsing this element
      closeElement(element)
    },

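9.3 closeElement

The core of this function is really just two statements; the rest handles special attributes such as v-if/v-else/slot-scope (not covered here).

  // Add the current element to its parent's children and link it back to the parent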
        currentParent.children.push(element)
        element.parent = currentParent

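Key step ⑩: handling text

When the current chunk is text, it is added to the children of the current parent element (currentParent). Vue gives the resulting AST node type 2 if the text contains an interpolation such as {{message}}, and type 3 if it is plain text.

        // Outside v-pre, text in which parseText() finds an interpolation becomes an expression node (type 2)
        // Anything else (plain text, or any text inside v-pre) becomes a plain text node (type 3)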
        if (!inVPre && text !== ' ' && (res = parseText(text, delimiters))) {
          child = {
            type: 2,
            expression: res.expression,
            tokens: res.tokens,
            text
          }
        } else if (text !== ' ' || !children.length || children[children.length - 1].text !== ' ') {
          child = {
            type: 3,
            text
          }
        }
        // attach the parsed text node to its parent
        if (child) {
          if (process.env.NODE_ENV !== 'production' && options.outputSourceRange) {
            child.start = start
            child.end = end
          }
          children.push(child)
        }
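The type 2 / type 3 decision hinges on parseText. Below is a simplified sketch of it, assuming only the default {{ }} delimiters: the regex matches Vue 2's default tag regex, and the _s(...) expression format and '@binding' tokens follow Vue's output shape, but filters and custom delimiters are ignored. parseTextSketch and makeTextNode are hypothetical names.

```javascript
const tagRE = /\{\{((?:.|\r?\n)+?)\}\}/g

// Returns undefined for plain text (which then becomes a type 3 node)
function parseTextSketch(text) {
  if (!tagRE.test(text)) return
  tagRE.lastIndex = 0 // reset: test() on a /g regex advances lastIndex
  const tokens = []
  const rawTokens = []
  let lastIndex = 0
  let match
  while ((match = tagRE.exec(text))) {
    if (match.index > lastIndex) {
      const plain = text.slice(lastIndex, match.index)
      tokens.push(JSON.stringify(plain))
      rawTokens.push(plain)
    }
    const exp = match[1].trim()
    tokens.push(`_s(${exp})`)
    rawTokens.push({ '@binding': exp })
    lastIndex = match.index + match[0].length
  }
  if (lastIndex < text.length) {
    const plain = text.slice(lastIndex)
    tokens.push(JSON.stringify(plain))
    rawTokens.push(plain)
  }
  return { expression: tokens.join('+'), tokens: rawTokens }
}

// Mirrors the branch above (ignoring v-pre and the whitespace rules)
function makeTextNode(text) {
  const res = parseTextSketch(text)
  return res
    ? { type: 2, expression: res.expression, tokens: res.tokens, text }
    : { type: 3, text }
}
```

For the article's {{message}}, makeTextNode yields a type 2 node with expression '_s(message)'; for 'Hi {{ name }}!' the expression is '"Hi "+_s(name)+"!"'.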

Simplified implementation

const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/
const ncname = `[a-zA-Z_][\\-\\.0-9_a-zA-Z]*` // tag name; may contain characters such as _, - and .
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)
const endTag = new RegExp(`^<\\/${qnameCapture}[^>]*>`)  // matches an end tag
const startTagClose = /^\s*(\/?)>/ // matches the closing > (or />) of a start tag; whitespace is allowed before it
const dynamicArgAttribute = /^\s*((?:v-[\w-]+:|@|:|#)\[[^=]+?\][^\s"'<>\/=]*)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/
let html = '<div id="app"><span disabled="false">{{message}}</span></div>'
const encodedAttr = /&(?:lt|gt|quot|amp|#39);/g
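const encodedAttrWithNewLines = /&(?:lt|gt|quot|amp|#39|#10|#9);/g
// entity -> character table used by decodeAttr (same entries as Vue's html-parser.js)
const decodingMap = {
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&amp;': '&',
  '&#10;': '\n',
  '&#9;': '\t',
  '&#39;': "'"
}
const comment = /^<!\--/ // matches a comment
const conditionalComment = /^<!\[/ // matches a conditional comment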
const isPlainTextElement = makeMap('script,style,textarea', true)
const reCache = {}
let root
let currentParent
const stackA = []

function parserHTML(html) {
  let index = 0
  let last, lastTag
  const stack = []
  // main loop
  while(html){
    last = html
    if(!lastTag || !isPlainTextElement(lastTag)) {
       // parse a tag or text: find the first "<" (textEnd === 0 means html starts with "<")
       let textEnd = html.indexOf('<');
      if(textEnd == 0){
        //  End tag:
        const endTagMatch = html.match(endTag)
        if (endTagMatch) {
          const curIndex = index
          advance(endTagMatch[0].length)
          parseEndTag(endTagMatch[1], curIndex, index)
          continue
        }
        // Start tag:
        const startTagMatch = parseStartTag()
        if (startTagMatch) {
          handleStartTag(startTagMatch)
          continue
        }
        // Nothing matched at position 0 (e.g. a stray "<"): fall through and treat it as text
      }  
      let text, rest, next
      if (textEnd >= 0) {
        rest = html.slice(textEnd)
        while (  !endTag.test(rest) && !startTagOpen.test(rest) && !comment.test(rest) &&  !conditionalComment.test(rest) ) {
          next = rest.indexOf('<', 1)
          if (next < 0) break
          textEnd += next
          rest = html.slice(textEnd)
        }
        text = html.substring(0, textEnd)
      }
       // no "<" left at all: the rest of html is plain text
       if (textEnd < 0) {
        text = html
      }

      if (text) {
        advance(text.length)
      }

      if ( text) {
        charFunc(text, index - text.length, index)
      }
    }
   else{
        let endTagLength = 0
        const stackedTag = lastTag.toLowerCase()
        const reStackedTag = reCache[stackedTag] || (reCache[stackedTag] = new RegExp('([\\s\\S]*?)(</' + stackedTag + '[^>]*>)', 'i'))
        const rest = html.replace(reStackedTag, function (all, text, endTag) {
          endTagLength = endTag.length
          if (!isPlainTextElement(stackedTag) && stackedTag !== 'noscript') {
            text = text
              .replace(/<!\--([\s\S]*?)-->/g, '$1') // #7298
              .replace(/<!\[CDATA\[([\s\S]*?)]]>/g, '$1')
          }
          // if (shouldIgnoreFirstNewline(stackedTag, text)) {
          //   text = text.slice(1)
          // }
          if (charFunc) {
            charFunc(text)
          }
          return ''
        })
        index += html.length - rest.length
        html = rest
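        parseEndTag(stackedTag, index - endTagLength, index)
        }
    // Safety net, as in Vue's own parseHTML: if this iteration consumed nothing (e.g. a lone "<"
    // at the end), hand the rest to charFunc as text and stop instead of looping forever
    if (html === last) {
      charFunc(html)
      break
    }
  }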
  parseEndTag()

  // cut the first n characters off html and move index forward
  function advance (n) {
    index += n
    html = html.substring(n)
  }

  // parse a start tag
  function parseStartTag () {
    const start = html.match(startTagOpen)
    if (start) {
      const match = {
        tagName: start[1],
        attrs: [],
        start: index
      }
      advance(start[0].length)
      let end, attr
      while (!(end = html.match(startTagClose)) && (attr = html.match(dynamicArgAttribute) || html.match(attribute))) {
        attr.start = index
        advance(attr[0].length)
        attr.end = index
        match.attrs.push(attr)
      }
      if (end) {
        match.unarySlash = end[1]
        advance(end[0].length)
        match.end = index
        return match
      }
    }
  }
  
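  // handle a start tag
  function handleStartTag(match) {
    const tagName = match.tagName
    // simplified: only an explicit "/>" marks a tag as unary (Vue also checks options.isUnaryTag)
    const unary = !!match.unarySlash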
    const l = match.attrs.length
    const attrs = new Array(l)
    for (let i = 0; i < l; i++) {
      const args = match.attrs[i]
      const value = args[3] || args[4] || args[5] || ''
      attrs[i] = {  name: args[1],  value: decodeAttr(value, false) }
    }
    if (!unary) {
      stack.push({ tag: tagName, lowerCasedTag: tagName.toLowerCase(), attrs: attrs, start: match.start, end: match.end })
      lastTag = tagName
    }
    if (startFunc) {
      startFunc(tagName, attrs, unary, match.start, match.end)
    }
    console.log('lastTag====',lastTag)
  }

  // parse an end tag
  function parseEndTag (tagName, start, end) {
    let pos, lowerCasedTagName
    if (start == null) start = index
    if (end == null) end = index
    if (tagName) {
      lowerCasedTagName = tagName.toLowerCase()
      for (pos = stack.length - 1; pos >= 0; pos--) {
        if (stack[pos].lowerCasedTag === lowerCasedTagName) {
          break
        }
      }
    } else {
      // If no tag name is provided, clean shop
      pos = 0
    }
    if (pos >= 0) {
      for (let i = stack.length - 1; i >= pos; i--) {
        if (endFunc) {
          endFunc(stack[i].tag, start, end)
        }
      }
      stack.length = pos
      lastTag = pos && stack[pos - 1].tag
    }
  }

  // End-tag logic: corresponds to the end callback in the source
  function endFunc(tag, start, end) {
    const element = stackA[stackA.length - 1]
    // pop stack
    stackA.length -= 1
    currentParent = stackA[stackA.length - 1]
    closeElement(element)
  }

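  // Start-tag logic: corresponds to the start callback in the source
  function startFunc(tag, attrs, unary, start, end) {
    let element = createASTElement(tag, attrs, currentParent)
    // If there is no root yet, this element becomes the root (the source also runs root constraint checks here)
    if (!root) {
      root = element
    }
    // Not unary: make it the current parent and push it; unary: close it immediately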
    if (!unary) {
      currentParent = element
      stackA.push(element)
    } else {
      closeElement(element)
    }
  }

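  // Text logic: corresponds to the chars callback in the source
  function charFunc(text, start, end) {
    // like Vue's chars(), ignore text that has no parent element (text outside the root)
    if (!text || !currentParent) return
    let child
    const children = currentParent.children
    if (text !== ' ' && /\{\{((?:.|\r?\n)+?)\}\}/.test(text)) {
      // simplified: text containing {{ }} becomes an expression node; the real code
      // also fills in expression and tokens via parseText()
      child = { type: 2, text }
    } else if (text !== ' ' || !children.length || children[children.length - 1].text !== ' ') {
      // plain text node; avoids adding consecutive single-space nodes
      child = { type: 3, text }
    }
    if (child) {
      children.push(child)
    }
  }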

  // finish the current element: attach it to its parent
  function closeElement (element) {
    if(currentParent) {
      currentParent.children.push(element)
      element.parent = currentParent
    }
  }

  // the resulting AST
  console.log('====---root',root)
}

  // other helper functions
  function makeMap ( str, expectsLowerCase) {
    const map = Object.create(null)
    const list = str.split(',')
    for (let i = 0; i < list.length; i++) {
      map[list[i]] = true
    }
    return expectsLowerCase
      ? val => map[val.toLowerCase()]
      : val => map[val]
  }
  function decodeAttr (value, shouldDecodeNewlines) {
    const re = shouldDecodeNewlines ? encodedAttrWithNewLines : encodedAttr
    return value.replace(re, match => decodingMap[match])
  }
  function createASTElement ( tag, attrs, parent ) {
    return {
      type: 1,
      tag,
      attrsList: attrs,
      // attrsMap: makeAttrsMap(attrs),
      rawAttrsMap: {},
      parent,
      children: []
    }
  }



parserHTML(html)