Preface

Writing out babel's compilation pipeline is my way of consolidating what I know about babel. If you pick up something from it too, all the better.

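Basic introduction

babel is a transpiler: it converts code written in newer syntax into syntax the target environment supports, and it can automatically polyfill APIs that the target environment lacks.

babel is named after the Tower of Babel. Humanity once joined forces to build a tower tall enough to reach heaven. To stop the plan, God made people speak different languages so that they could no longer communicate; the plan failed and people scattered across the world. The story is offered as an explanation for why the world has different languages and peoples. That tower is the Tower of Babel.
- a story from the Bible

Carrying this story over to the front-end world, babel has the following uses: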

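1. Transpiling esnext/typescript so that the code runs in the target browsers

Typical tools are @babel/preset-env, polyfills and so on.

2. Transforming code for special purposes

For example automatic event tracking, or adding arguments to function calls.

3. Analyzing code

Typical examples are linting in the style of eslint, or code obfuscation.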

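Overall flow

babel's compilation flow has three main stages:

1 parse stage: @babel/parser turns the source code into an AST
2 transform stage: @babel/traverse walks the AST and modifies it
3 generate stage: @babel/generator turns the AST back into source code

parse stage

The parse stage turns the source string into an AST that a program can work with. It consists of lexical analysis followed by syntactic analysis.

Take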

let name = 'feng';

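For a piece of source like this, we first split it into tokens that cannot be divided any further, namely let, name, =, 'feng' and ;. This is lexical analysis: the string is split into tokens according to the rules for how each kind of token is formed.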
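As a rough illustration of lexical analysis, the sketch below splits a source string into tokens with a single sticky regular expression. The tokenize function and the token type names are made up for this example; the tokenizer inside @babel/parser is hand-written and handles far more cases.

```javascript
// A toy tokenizer: NOT how @babel/parser works internally, just the idea.
// Token type names (Keyword, Identifier, ...) are illustrative assumptions.
const KEYWORDS = new Set(['let', 'const', 'var']);

function tokenize(source) {
    // Alternatives, in order: whitespace, identifier/keyword, number,
    // single-quoted string, punctuator.
    const pattern = /\s+|([A-Za-z_$][\w$]*)|(\d+)|('[^']*')|([=;])/y;
    const tokens = [];
    let pos = 0;
    while (pos < source.length) {
        pattern.lastIndex = pos;
        const match = pattern.exec(source);
        if (!match) {
            throw new SyntaxError(`Unexpected character at ${pos}: ${source[pos]}`);
        }
        const [, word, number, string, punctuator] = match;
        if (word !== undefined) {
            tokens.push({ type: KEYWORDS.has(word) ? 'Keyword' : 'Identifier', value: word });
        } else if (number !== undefined) {
            tokens.push({ type: 'Numeric', value: number });
        } else if (string !== undefined) {
            tokens.push({ type: 'String', value: string });
        } else if (punctuator !== undefined) {
            tokens.push({ type: 'Punctuator', value: punctuator });
        }
        // Whitespace matches no capture group and is simply skipped.
        pos = pattern.lastIndex;
    }
    return tokens;
}

console.log(tokenize("let name = 'feng';").map(token => token.value));
```

Each token keeps its raw text, which is what the syntactic analysis step consumes next.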

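The tokens are then assembled recursively into an AST. This is syntactic analysis: different objects are built for different syntactic structures, and together they form the syntax tree.

In this stage, @babel/parser turns the source code into an AST.

parse takes the source code (sourceCode) as its first argument and an options object as its second. sourceType sets the mode the code is parsed in; passing 'unambiguous' lets the parser decide between module and script automatically. plugins is an array of the plugins to enable. See the babel documentation for details.

The resulting AST looks like this: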
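For let name = 'feng'; the tree has roughly the following shape. This is an abridged sketch written by hand as a plain object: position fields (start, end, loc) and other metadata are left out, so the real parser output is larger.

```javascript
// Abridged shape of the AST for: let name = 'feng';
// Hand-written for illustration; start/end/loc and extra metadata are omitted.
const abridgedAst = {
    type: 'File',
    program: {
        type: 'Program',
        body: [
            {
                type: 'VariableDeclaration',
                kind: 'let',
                declarations: [
                    {
                        type: 'VariableDeclarator',
                        id: { type: 'Identifier', name: 'name' },
                        init: { type: 'StringLiteral', value: 'feng' },
                    },
                ],
            },
        ],
    },
};

// Walk down to the declarator the same way a visitor would reach it.
const declarator = abridgedAst.program.body[0].declarations[0];
console.log(declarator.id.name, '=', declarator.init.value);
```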

I also recommend this site 👉 astexplorer.net/, where you can view the AST of any piece of source code online.

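transform stage

This stage traverses the AST produced by the parse stage and operates on the nodes it is interested in.

Here, @babel/traverse is used to operate on the AST.

Suppose we want to add three pieces of information to every console.log in a project: the current file name, and the line and column of the log call.

We can parse the source into an AST with @babel/parser, walk it with @babel/traverse, find each matching log call, and prepend the file name and the line and column of that call to its arguments. To keep the example short, the code below packs all three into a single string argument and uses the literal text filename as a placeholder.

The code: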

const traverse = require('@babel/traverse').default;
const types = require('@babel/types');

const targetCalleeName = ['log'].map(item => `console.${item}`);
traverse(ast, {
    CallExpression(path, state) {
        // const calleeName = generate(path.node.callee).code; // alternative: print the callee (a MemberExpression) with @babel/generator
        const calleeName = path.get('callee').toString(); // source text of the callee, e.g. "console.log"
        if (targetCalleeName.includes(calleeName)) {
            const { line, column } = path.node.loc.start;
            path.node.arguments.unshift(types.stringLiteral(`filename: (${line}, ${column})`));
        }
    } 
})

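Looking at the code on astexplorer.net/ shows that a console.log(...) call is represented in the AST as a CallExpression, that is, a function call expression.

traverse takes two arguments: the first is the AST and the second is a visitor. A visitor can contain a method that runs when any node is entered (enter), methods keyed by node type that operate on matching nodes, and a method that runs when any node is exited (exit). In the example above, whenever the traversal meets a CallExpression node it calls the visitor's CallExpression method; if the call turns out to be console.log, an AST node carrying the file name and the line and column is inserted into its arguments.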
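To see the order in which enter, the type-specific method and exit fire, here is a toy depth-first walk over a hand-built AST. The walk function and the CHILD_KEYS table are invented for this sketch, and it hands raw nodes to the visitor, whereas @babel/traverse passes path objects.

```javascript
// Toy depth-first walk showing when enter / type handlers / exit fire.
// `walk` and CHILD_KEYS are illustrative, not part of any babel package.
const CHILD_KEYS = {
    ExpressionStatement: ['expression'],
    CallExpression: ['callee', 'arguments'],
    MemberExpression: ['object', 'property'],
};

function walk(node, visitor) {
    if (visitor.enter) visitor.enter(node);
    if (typeof visitor[node.type] === 'function') visitor[node.type](node);
    for (const key of CHILD_KEYS[node.type] || []) {
        const child = node[key];
        const children = Array.isArray(child) ? child : [child];
        children.filter(Boolean).forEach(item => walk(item, visitor));
    }
    if (visitor.exit) visitor.exit(node);
}

// Hand-built AST for: console.log(1)
const callAst = {
    type: 'ExpressionStatement',
    expression: {
        type: 'CallExpression',
        callee: {
            type: 'MemberExpression',
            object: { type: 'Identifier', name: 'console' },
            property: { type: 'Identifier', name: 'log' },
        },
        arguments: [{ type: 'NumericLiteral', value: 1 }],
    },
};

const events = [];
walk(callAst, {
    enter(node) { events.push(`enter ${node.type}`); },
    CallExpression() { events.push('visit CallExpression'); },
    exit(node) { events.push(`exit ${node.type}`); },
});
console.log(events.join('\n'));
```

A node's exit only fires after all of its children have been entered and exited, which is why transforms that depend on already-processed children are usually placed in exit.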

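AST nodes can be created with the helper functions from @babel/types; for example, types.stringLiteral creates a string literal node.

When many AST nodes need to be created at once, @babel/template is the recommended tool.

generate stage

This stage turns the AST back into source code and can also produce a source map.

Here, @babel/generator turns the modified AST into source code.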

const parser = require('@babel/parser');
const traverse = require('@babel/traverse').default;
const generate = require('@babel/generator').default;
const types = require('@babel/types');


const sourceCode = `
    console.log(1);

    function info() {
        console.info(2);
    }

    export default class People {
        say () {
            console.debug(3);
        }
        render () {
            return <div>{console.error(4)}</div>
        }
    }
`
const ast = parser.parse(sourceCode, {
    sourceType: 'unambiguous', // let the parser decide between module and script
    plugins: ['jsx']
});

// console.log(ast);
const targetCalleeName = ['log', 'info', 'error', 'debug'].map(item => `console.${item}`);
// traverse(ast, {
//     CallExpression(path, state) {
//         if (types.isMemberExpression(path.node.callee) 
//             && path.node.callee.object.name === 'console'
//             && ['log', 'info', 'error', 'debug'].includes(path.node.callee.property.name)) {
//                 const { line, column } = path.node.loc.start;
//                 path.node.arguments.unshift(types.stringLiteral(`filename: (${line}, ${column})`));
//             }
//     }
// })
traverse(ast, {
    CallExpression(path, state) {
        // const calleeName = generate(path.node.callee).code; // alternative: print the callee (a MemberExpression) with @babel/generator
        const calleeName = path.get('callee').toString(); // source text of the callee, e.g. "console.log"
        if (targetCalleeName.includes(calleeName)) {
            const { line, column } = path.node.loc.start;
            path.node.arguments.unshift(types.stringLiteral(`filename: (${line}, ${column})`));
        }
    } 
})

const {code} = generate(ast);
console.log(code);

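The transformed result looks like this:

With that, we have a rough picture of babel's compilation flow and have also implemented a transform that adds arguments to a function call.

Common AST nodes

To operate on an AST, you need to know the common node types.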

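Literal - literals

In let name = 'feng';, the 'feng' part is a string literal.

Common literals include strings, numbers, booleans, null and regular expressions.

babel models each of them with a node type named xxLiteral.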
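The literal node types babel defines include StringLiteral, NumericLiteral, BooleanLiteral, NullLiteral, BigIntLiteral, RegExpLiteral and TemplateLiteral. As a quick illustration, the hypothetical helper below maps a JavaScript value to the literal node type that would represent it; it is not part of @babel/types.

```javascript
// Hypothetical helper (not a babel API): which xxLiteral node type represents a value?
function literalTypeOf(value) {
    if (value === null) return 'NullLiteral';
    if (value instanceof RegExp) return 'RegExpLiteral';
    switch (typeof value) {
        case 'string': return 'StringLiteral';
        case 'number': return 'NumericLiteral';
        case 'boolean': return 'BooleanLiteral';
        case 'bigint': return 'BigIntLiteral';
        default: return undefined; // objects, functions, etc. are not literals
    }
}

console.log(literalTypeOf('feng'));
console.log(literalTypeOf(42));
console.log(literalTypeOf(/a+/));
```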

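Identifier - identifiers

An identifier is made up of letters, digits, underscores and dollar signs, and cannot start with a digit. Variable names, property names, parameter names and every other declared or referenced name are identifiers.

Statement - statements

A statement is a unit that can be executed on its own; it is the smallest unit of code execution.

Declaration - declaration statements

A declaration is a special kind of statement that declares something: a variable, function, class, import, export and so on.

Expression - expressions

An expression produces a value when evaluated, which is one of the things that separates it from a statement.

When an expression is used as a statement, it is wrapped in an ExpressionStatement.

Class - classes

The whole body of a class is a ClassBody. Properties are ClassProperty nodes and methods are ClassMethod nodes; the kind property tells a constructor apart from an ordinary method.

import - imports

There are three kinds of specifier: ImportSpecifier, ImportDefaultSpecifier and ImportNamespaceSpecifier.

All of them live inside an ImportDeclaration (a declaration statement).

export - exports

There are three kinds of export declaration: ExportNamedDeclaration, ExportDefaultDeclaration and ExportAllDeclaration.

Program & Directive - root node & directives

Program is the node that represents the whole program. Its body property holds the program body, an array of the statements to execute. Its directives property holds Directive nodes; a directive such as "use strict" is represented by a Directive node.

File - outermost node

The outermost node of babel's AST is File. Its program, comments and tokens properties hold the Program node, the comments and the tokens respectively.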

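Implementing a simple babel

Having covered babel's overall compilation flow and the common AST node types, we will now implement a simplified babel to deepen that understanding.

Implementing a parser

How to use @babel/parser

According to the official documentation, parse takes two arguments: the source code and the parsing options, for example:

sourceType: the mode to parse in
plugins: additional plugins to load

What we know

babel's parser was not written entirely from scratch: it is heavily based on acorn and is extended through plugins built on inheritance. So to write a parser plugin of our own, we only need to subclass acorn's Parser and override what we need.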
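To make the inheritance mechanism concrete, here is a minimal sketch with a stand-in base class. ToyParser, its parseLiteral method and literalPlugin are invented for the example; they only mimic the shape of acorn's Parser.extend(plugin), where a plugin is a function that receives a parser class and returns a subclass.

```javascript
// Stand-in for acorn's Parser; only the extension mechanism is modeled here.
class ToyParser {
    // Like acorn's Parser.extend: feed the class through each plugin in turn.
    static extend(...plugins) {
        return plugins.reduce((ParserClass, plugin) => plugin(ParserClass), this);
    }

    static parse(code) {
        return new this().parseLiteral(code);
    }

    // The toy "grammar" is a single JSON literal.
    parseLiteral(code) {
        return { type: 'Literal', value: JSON.parse(code) };
    }
}

// A plugin receives a parser class and returns a subclass that overrides some methods.
// This one refines the generic Literal node into StringLiteral / NumericLiteral.
function literalPlugin(ParserClass) {
    return class extends ParserClass {
        parseLiteral(code) {
            const node = super.parseLiteral(code);
            if (typeof node.value === 'string') node.type = 'StringLiteral';
            if (typeof node.value === 'number') node.type = 'NumericLiteral';
            return node;
        }
    };
}

const ExtendedParser = ToyParser.extend(literalPlugin);
console.log(ToyParser.parse('"feng"').type);
console.log(ExtendedParser.parse('"feng"').type);
```

Because each plugin wraps the class produced by the previous one, plugins compose: a later plugin can call super and build on what earlier plugins already changed.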

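Approach

1 Read the options passed in by the user and load the matching parser plugins
2 Extend acorn's Parser through inheritance

/**
 * babel's parser extends acorn's node set; for example Literal becomes StringLiteral, NumericLiteral and so on.
 * It also implements syntax plugins for jsx, typescript, flow and others.
 */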
const acorn = require("acorn");

const syntaxPlugins = {
    'literal': require('./plugins/literal'),
    'cvteKeyword': require('./plugins/cvteKeyword')
}

const defaultOptions = {};

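/**
 * Extend the parser with plugins through inheritance, as babel's parser does
 * @param {string} code - source code to parse
 * @param {{ plugins?: string[] }} [options] - plugins lists the syntax plugins to enable
 * @returns the AST produced by the extended parser
 */
function parse(code, options) {
    const resolvedOptions = Object.assign({}, defaultOptions, options);
    
    const newParser = (resolvedOptions.plugins || []).reduce((Parser, pluginName) => {
        const plugin = syntaxPlugins[pluginName];
        // Parser.extend calls plugin(Parser) and returns the subclass; unknown names keep the current Parser
        return plugin ? Parser.extend(plugin) : Parser;
    }, acorn.Parser);

    return newParser.parse(code, {
        ecmaVersion: 2020, // acorn 8 expects an explicit ecmaVersion
        locations: true, // keep source positions on each node's loc property; needed later for the source map
    })
}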

module.exports = {
    parse
}

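Implementing a traverse

How to use @babel/traverse

traverse takes two arguments: the AST and a visitor object. While walking the AST depth-first, it checks whether the visitor has a function for the current node's type and, if so, calls it.

What we know

traverse walks the AST and, along the way, checks whether the visitor has a function for each node's type; those functions add, remove or modify AST nodes.

Approach

1 Traverse the AST depth-first
2 During the traversal, call the visitor function that matches each node's type

During the traversal we also need to know which child properties of the current node should be traversed further, so we keep a Map that records, for each node type, the child properties to descend into.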
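Such a Map might look like the sketch below. The entry shape, a visitor array of child keys plus an isBlock flag, is an assumption inferred from how the traversal code in this article reads it (the visitor and isBlock fields); the list of node types is deliberately incomplete.

```javascript
// Which child properties should be traversed for each node type.
// The entry shape { visitor, isBlock } is assumed from how the traversal code consumes it.
const visitorKeys = new Map([
    ['Program', { visitor: ['body'], isBlock: true }],
    ['VariableDeclaration', { visitor: ['declarations'] }],
    ['VariableDeclarator', { visitor: ['id', 'init'] }],
    ['FunctionDeclaration', { visitor: ['id', 'params', 'body'], isBlock: true }],
    ['BlockStatement', { visitor: ['body'] }],
    ['ExpressionStatement', { visitor: ['expression'] }],
    ['CallExpression', { visitor: ['callee', 'arguments'] }],
    ['MemberExpression', { visitor: ['object', 'property'] }],
    ['Identifier', {}],
    ['NumericLiteral', {}],
]);

// Leaf nodes have no `visitor` array, so the traversal stops there.
function childKeysOf(node) {
    const definition = visitorKeys.get(node.type);
    return (definition && definition.visitor) || [];
}

console.log(childKeysOf({ type: 'VariableDeclarator' }));
console.log(childKeysOf({ type: 'Identifier' }));
```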

Once we know which properties to descend into, we can traverse the AST depth-first and call the visitor function whenever one matches the current node.

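// Module paths are assumed from the require calls in NodePath; adjust them to your layout
const { visitorKeys } = require('./types');
const NodePath = require('./path/NodePath');

/**
 * Depth-first traversal that calls the matching visitor function for each node
 * example:
    traverse(ast, {
        enter: xxx,
        Identifier(path) {
            path.node.name = 'b';
        },
        exit: xxx,
    });
 * @param {*} node - AST node
 * @param {*} visitor - the user-defined visitor object
 * @param {*} parent - parent node, used to build the path for this node
 * @param {*} parentPath - path of the parent node
 * @param {*} key - the property of the parent that holds this node
 * @param {*} listKey - index of this node when that property is an array
 */
const traverse = function(node, visitor = {}, parent, parentPath, key, listKey) {
    if (!node || typeof node.type !== 'string') {
        return; // optional children, such as a missing init, are null
    }
    const definition = visitorKeys.get(node.type);

    const visitorFn = visitor[node.type];
    const path = new NodePath(node, parent, parentPath, key, listKey);

    visitor.enter && visitor.enter(path);

    if (typeof visitorFn === 'function') {
        visitorFn(path);
    }

    if (node.__shouldSkip) { // set by path.skip(): do not descend into the children
        delete node.__shouldSkip;
        return;
    }
    
    if (definition && definition.visitor) {
        definition.visitor.forEach(key => {
            const prop = node[key];
            if (Array.isArray(prop)) {
                prop.forEach((childNode, childIndex) => {
                    traverse(childNode, visitor, node, path, key, childIndex);
                })
            } else {
                traverse(prop, visitor, node, path, key);
            }
        })
    }

    visitor.exit && visitor.exit(path);
}

module.exports = traverse;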

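In babel, each callback receives two arguments, path and state. path stores how nodes relate to each other and also offers helper methods: path.isIdentifier checks whether the node is an identifier, and there are methods to remove or replace a node, skip the traversal of its children, find a parent node and so on. So we need to implement a path.

Approach for path:

Premise: a path can be thought of as a linked list; during the depth-first traversal each path is linked to its parent path.

Method: during the traversal, create a NodePath for each node and link it to the path of its parent.
The code: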

const types = require('../types');
const Scope = require('./Scope');

/**
 * The path that corresponds to each node
 */
module.exports = class NodePath {
    constructor(node, parent, parentPath, key, listKey) {
        this.node = node;
        this.parent = parent; // parent AST node
        this.parentPath = parentPath; // NodePath of the parent
        this.key = key;
        this.listKey = listKey;

        Object.keys(types).forEach(key => { // bind the node type checks (isXxx) to this path
            if (key.startsWith('is')) {
                this[key] = types[key].bind(this, node);
            }
        })
    }

    // The scope is created lazily, the first time it is needed
    get scope () {
        if (this._scope) {
            return this._scope;
        }

        const isBlock = this.isBlock();
        const parentScope = this.parentPath && this.parentPath.scope;
        this._scope = isBlock ? new Scope(parentScope, this) : parentScope;
        return this._scope;
    }
    
    replaceWith(node) {
        if (this.listKey !== undefined) {
            this.parent[this.key].splice(this.listKey, 1, node);
        } else {
            this.parent[this.key] = node; // single child: replace the property itself
        }
    }
    remove() {
        if (this.listKey !== undefined) {
            this.parent[this.key].splice(this.listKey, 1);
        } else {
            this.parent[this.key] = null; // single child: clear the property itself
        }
    }
    findParent(callback) {
        let curPath = this.parentPath;
        while(curPath && !callback(curPath)) {
            curPath = curPath.parentPath;
        }
        return curPath;
    }
    find(callback) {
        let curPath = this;
        while(curPath && !callback(curPath)) {
            curPath = curPath.parentPath
        }
        return curPath;
    }
    skip() {
        this.node.__shouldSkip = true; // flag read by traverse: skip this node's children
    }
    traverse (visitor) {
        const traverse = require('../index'); // required lazily to avoid a circular import
        const definition = types.visitorKeys.get(this.node.type); // child properties to descend into

        if (definition && definition.visitor) {
            definition.visitor.forEach(key => {
                const prop = this.node[key];
                if (Array.isArray(prop)) { // e.g. the params of a function are an array
                    prop.forEach((childNode, childIndex) => {
                        traverse(childNode, visitor, this.node, this, key, childIndex)
                    })
                } else {
                    traverse(prop, visitor, this.node, this, key)
                }
            })
        }
    }
    /**
     * Whether this node is one that creates a scope
     */
    isBlock() {
        return types.visitorKeys.get(this.node.type)?.isBlock;
    }
}

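A path has a scope property. path.scope records the whole scope chain, including the declared variables and the references to those declarations.

An AST node that creates a scope is called a block; FunctionDeclaration, for example, is a block.

A scope records bindings, that is, declarations. Each binding records where it was declared and where it is referenced.

Approach for scope: when a block node is reached and its scope is created, walk all declarations inside it (VariableDeclaration, FunctionDeclaration) and register each one as a binding on the scope.
The code: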

// A scope records bindings, i.e. declarations; each binding records where it was declared and where it is referenced
class Binding {
    constructor(id, path) {
        this.id = id;
        this.path = path;
        this.referenced = false; // whether this declaration is referenced anywhere
        this.referencePaths = []; // the paths that reference it
    }
}

// Scope represents one scope; the declarations inside it, with their references, are stored as bindings
class Scope {
    constructor(parentScope, path) {
        this.parent = parentScope;
        this.binding = {};
        this.path = path;
      
        // First pass: register a binding for every declaration in this scope
        path.traverse({
            VariableDeclarator: (childPath) => {
                this.registerBinding(childPath.node.id.name, childPath);
            },
            FunctionDeclaration: (childPath) => {
                this.registerBinding(childPath.node.id.name, childPath);
                childPath.skip(); // a nested function's body belongs to its own scope
            }
        })

        // Second pass: mark the bindings that are referenced
        path.traverse({
            Identifier: childPath => {
                // Simplification: every identifier under a VariableDeclarator is treated as part of the declaration
                if (!childPath.findParent(p => p.isVariableDeclarator())) {
                    const id = childPath.node.name;
                    const binding = this.getBinding(id);
                    if (binding) { // a declaration with this name is visible from here
                        binding.referenced = true; // the declaration is referenced
                        binding.referencePaths.push(childPath); // where it is referenced
                    }
                }
            }
        })
    }

    registerBinding(id, path) {
        this.binding[id] = new Binding(id, path);
    }

    getOwnBinding(id) {
        return this.binding[id];
    }
    
    getBinding(id) {
        let res = this.getOwnBinding(id);
        if (res === undefined && this.parent) {
            res = this.parent.getBinding(id); // walk up the whole scope chain, not just one level
        }
        return res;
    }
    hasBinding(id) {
        return !!this.getBinding(id);
    }
}

module.exports = Scope;

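This also gives a rough idea of how unused code is detected, which is the idea behind tree-shaking: while traversing the AST, each VariableDeclarator node records a declared variable; each Identifier node is matched to its declaration, which is then marked as used; declarations that are never marked as used can be removed. Bundlers apply the same idea to module exports rather than to local variables.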
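The mark-and-report idea can be sketched in a few lines. To stay self-contained, the example works on a flat list of toy nodes instead of a real AST and ignores nested scopes; findUnusedDeclarations and the node shape are made up for the illustration.

```javascript
// Toy two-pass analysis over a flat list of nodes (a real implementation walks the AST).
// Pass 1 records declarations, pass 2 marks references, then unused names are reported.
function findUnusedDeclarations(nodes) {
    const bindings = new Map();
    for (const node of nodes) {
        if (node.type === 'VariableDeclarator') {
            bindings.set(node.name, { referenced: false });
        }
    }
    for (const node of nodes) {
        if (node.type === 'Identifier' && bindings.has(node.name)) {
            bindings.get(node.name).referenced = true;
        }
    }
    return [...bindings]
        .filter(([, binding]) => !binding.referenced)
        .map(([name]) => name);
}

// Models: const a = 1; const b = 2; console.log(a);
const flatNodes = [
    { type: 'VariableDeclarator', name: 'a' },
    { type: 'VariableDeclarator', name: 'b' },
    { type: 'Identifier', name: 'console' },
    { type: 'Identifier', name: 'a' },
];
console.log(findUnusedDeclarations(flatNodes));
```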

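Implementing a generator

The generator turns the AST into code and produces a source map.

Approach

1 Print a different string for each AST node type
2 While printing, track the line and column in the generated code, and build the source map from the relation between generated positions and source positions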

const { SourceMapGenerator } = require('source-map');

// The Printer class holds the printing logic for each kind of AST node
class Printer {
    constructor(source, fileName) {
        this.buf = '';
        this.printLine = 1; // current output line
        this.printColumn = 0; // current output column

        this.sourceMapGenerator = new SourceMapGenerator({
            file: fileName + ".map.json"
        });

        this.fileName = fileName;
        this.sourceMapGenerator.setSourceContent(fileName, source); // the source map needs the source file name and content
    }
    /**
     * When the source is parsed, every AST node records its line/column in the source.
     * 
     * When the AST is printed, the printer knows the line/column it is writing to.
     * 
     * Together they give the mapping between generated positions and source positions.
     * @param {*} node
     * @memberof Printer
     */
    addMapping(node) {
        if (node.loc) { // nodes created during transform have no loc and are skipped
            this.sourceMapGenerator.addMapping({
                generated: { // position in the generated code
                    line: this.printLine,
                    column: this.printColumn
                },
                source: this.fileName,
                original: node.loc.start // position in the original source
            })
        }
    }

    // Append text that contains no line break and keep the column counter in sync
    write(text) {
        this.buf += text;
        this.printColumn += text.length;
    }

    space() {
        this.write(' ');
    }

    nextLine() {
        this.buf += '\n';
        this.printLine++;
        this.printColumn = 0;
    }

    Program(node) {
        this.addMapping(node);
        node.body.forEach(item => {
            this[item.type](item);
            this.nextLine();
        });
    }

    // expression statement
    ExpressionStatement(node) {
        this.addMapping(node);
        this[node.expression.type](node.expression);
        this.write(';');
    }

    VariableDeclaration(node) {
        if (!node.declarations.length) {
            return;
        }

        this.addMapping(node);
        
        this.write(node.kind); // let/const/var
        this.space();
        node.declarations.forEach((declaration, index) => {
            if (index != 0) {
                this.write(',');
            }
            this[declaration.type](declaration);
        });
        this.write(';');
    }

    VariableDeclarator(node) {
        this.addMapping(node);
        this[node.id.type](node.id); // identifier
        if (node.init) { // `let a;` has no initializer
            this.write('=');
            this[node.init.type](node.init); // literal or other expression
        }
    }

    Identifier(node) {
        this.addMapping(node);
        this.write(node.name);
    }

    FunctionDeclaration(node) {
        this.addMapping(node);

        this.write(`function ${node.id.name}(${node.params.map(item => item.name).join(',')}){`);
        this.nextLine();
        this[node.body.type](node.body);
        this.write('}');
        this.nextLine();
    }

    // function call
    CallExpression(node) {
        this.addMapping(node);
        this[node.callee.type](node.callee);
        this.write('(');
        node.arguments.forEach((item, index) => {
            if (index > 0) this.write(', ');
            this[item.type](item);
        })
        this.write(')');
    }

    // return statement
    ReturnStatement(node) {
        this.addMapping(node);
        this.write('return ');
        node.argument && this[node.argument.type](node.argument);
    }

    BlockStatement(node) {
        this.addMapping(node);
        
        node.body.forEach(item => {
            this.write('    ');
            this[item.type](item);
            this.nextLine();
        })
    }

    BinaryExpression(node) {
        this.addMapping(node);
        this[node.left.type](node.left);
        this.write(node.operator);
        this[node.right.type](node.right);
    }

    NumericLiteral(node) {
        this.addMapping(node);
        this.write(String(node.value));
    }

    // Assumes the parser's literal plugin emits StringLiteral nodes for strings
    StringLiteral(node) {
        this.addMapping(node);
        this.write(JSON.stringify(node.value));
    }

    MemberExpression(node) {
        this.addMapping(node);
        this[node.object.type](node.object);
        this.write('.');
        this[node.property.type](node.property);
    }
}

class Generator extends Printer {
    constructor(source, fileName) {
        super(source, fileName);
    }

    generate(node) {
        this[node.type](node); // print the tree recursively, starting from the root

        return {
            code: this.buf,
            map: this.sourceMapGenerator.toString(), // serialize the source map
        }
    }
}

function generate(node, source, fileName) {
    return new Generator(source, fileName).generate(node);
}

module.exports = generate;

Implementing a core package

@babel/core ties parser, traverse and generate together. In babel, plugins run before presets; plugins run from left to right, while presets run from right to left.
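The ordering rule is easy to get backwards, so here it is as a tiny sketch. resolveOrder is a made-up helper that works on names only; it just shows the order in which entries would be applied.

```javascript
// Execution order: plugins first (left to right), then presets (right to left).
// `resolveOrder` is a hypothetical helper; entries are plain names here.
function resolveOrder({ plugins = [], presets = [] }) {
    // Copy before reversing so the caller's presets array is left untouched.
    return [...plugins, ...[...presets].reverse()];
}

console.log(resolveOrder({
    plugins: ['pluginA', 'pluginB'],
    presets: ['presetC', 'presetD'],
}));
```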

const types = require('@babel/types');
const parser = require('../parser');
const traverse = require('../traverse');
const generate = require('../generator');
const template = require('@babel/template').default;

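/**
 * A simplified transformSync
 * Note: 
 *   1 plugins run before presets
 *   2 plugins run front to back
 *   3 presets run back to front
 * @param {string} code - source code
 * @param {object} options - parserOpts, plugins, presets, fileName
 */
function transformSync(code, options) {
    const ast = parser.parse(code, options.parserOpts); // the parser is extended through inheritance

    const pluginAPI = {
        types,
        template
    }

    const visitors = {};

    // Each entry is [plugin, pluginOptions]
    options.plugins && options.plugins.forEach(([plugin, pluginOptions]) => {
        const res = plugin(pluginAPI, pluginOptions);
        Object.assign(visitors, res.visitor); // naive merge: a later visitor for the same node type overwrites an earlier one
    });

    // Simplification: a preset is treated like a plugin here; copy before reversing so the caller's array is not mutated
    options.presets && [...options.presets].reverse().forEach(([preset, presetOptions]) => {
        const res = preset(pluginAPI, presetOptions);
        Object.assign(visitors, res.visitor);
    });

    traverse(ast, visitors);

    return generate(ast, code, options.fileName);
}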

module.exports = {
    transformSync
}

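As the plugin handling above shows, a babel plugin is just a function. Its first argument is a set of utilities such as @babel/types and @babel/template, and its second argument is the options configured for the plugin. The earlier example that inserts arguments into console.log calls can be written in this form as a babel plugin.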
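A sketch of that plugin form follows. The fileName option is an assumption made for the example, not something babel passes by itself, and the visitor checks path.node.callee directly so that it also works with the simplified NodePath from this article. The stand-ins fakeTypes and fakePath only exist so that the snippet runs without any babel package.

```javascript
// A plugin is a function: (api, options) => ({ visitor }).
// `fileName` is an assumed plugin option used only for this illustration.
function insertLocationPlugin({ types }, options = {}) {
    const methods = ['log', 'info', 'error', 'debug'];
    return {
        visitor: {
            CallExpression(path) {
                const { callee, loc } = path.node;
                const isConsoleCall = callee.type === 'MemberExpression'
                    && callee.object.name === 'console'
                    && methods.includes(callee.property.name);
                if (!isConsoleCall || !loc) return;
                const { line, column } = loc.start;
                const label = `${options.fileName || 'unknown'}: (${line}, ${column})`;
                path.node.arguments.unshift(types.stringLiteral(label));
            },
        },
    };
}

// Exercise the visitor with stand-ins for `types` and `path`.
const fakeTypes = { stringLiteral: value => ({ type: 'StringLiteral', value }) };
const fakePath = {
    node: {
        type: 'CallExpression',
        callee: {
            type: 'MemberExpression',
            object: { type: 'Identifier', name: 'console' },
            property: { type: 'Identifier', name: 'log' },
        },
        arguments: [{ type: 'NumericLiteral', value: 1 }],
        loc: { start: { line: 2, column: 4 } },
    },
};
const plugin = insertLocationPlugin({ types: fakeTypes }, { fileName: 'demo.js' });
plugin.visitor.CallExpression(fakePath);
console.log(fakePath.node.arguments[0].value);
```

With the transformSync above, such a plugin would be registered as an entry of the form [insertLocationPlugin, { fileName: 'demo.js' }] in options.plugins.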

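Summary

Writing out babel's overall compilation process has consolidated my understanding of babel. I imagine that in the future babel may be combined with technologies such as image recognition and speech recognition to free programmers' hands and raise development efficiency even further.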