Getting Started with Webpack: From Entry File to Bundle Output

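For most developers who are new to front-end work, webpack is a black box: it is unclear what happens between the input and the output. Diving straight into the source code is a poor choice, because it is easy to get lost in unfamiliar code. A better approach is to first learn how webpack's bundling works through a demo, and then read the source code with the insights and questions you have gathered. That way the effort pays off twice over, and you can discuss the topic comfortably with an interviewer.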

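Process overview:

1. Starting from the entry file, recursively traverse the dependencies between modules and convert each module into an AST, ending up with every module the entry file depends on
2. Walk each module's AST and process it
3. Convert the processed ASTs into function strings and write them to the bundle file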

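Setting up the example

The example consists of four files: A, B, C, and D. File A imports functions from files B and C, calls them, and prints the results. File C imports a function from file D and uses it in its own result. The dependencies are therefore A → B, A → C, and C → D.

Before building the example, here is a preview of the format of the final bundle file, which makes the later code easier to follow. The bundle is an immediately invoked function whose argument is an array of modules (the entry file plus everything it depends on); each element is the function that the corresponding module was transformed into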

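(function(modules){
    // runtime that loads the modules, starting from the entry
})
/* the argument is the array of modules */
([
    // the function generated for one module
    (function(module, _ourRequire){
        
    })
])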

Main flow

import fs from "fs";
import crypto from "crypto";
import { depsGraph } from "./deps_graph.mjs";
import { transform } from "./transform.mjs";

// 1. Walk the dependency graph
const entry = "./A.mjs"; // entry file
const depsArray = depsGraph(entry);
console.log("dependency array:", depsArray);
// depsArray.forEach((item) => {
//   console.log(JSON.stringify(item, null, 2));
// });

// 2. Transform the modules into the bundle's string content
const vendorString = transform(depsArray);

// 3. Write the bundle file and the manifest file
// Compute a hash of the bundle content
const sum = crypto.createHash("md5");
sum.update(vendorString);
const hash = sum.digest("hex");
// Make sure the output directory exists before writing
fs.mkdirSync("./build", { recursive: true });
// Write the bundle file
fs.writeFileSync(`./build/bundle-${hash}.js`, vendorString, "utf8");
// Write the manifest file, which maps "bundle" to the hashed file name
fs.writeFileSync(
  "./build/manifest.json",
  JSON.stringify({ bundle: `bundle-${hash}.js` }),
  "utf8"
);

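Traversing and compiling the dependency graph

import fs from "fs";
import path from "path";
// The AST helper comes from the abstract-syntax-tree package (default import assumed)
import ast from "abstract-syntax-tree";

// Every module reachable from the entry file, in the order it was discovered
const depsArray = [];

/**
 * Starting from the entry file, recursively walk the modules, parse each one and save it
 * @param file path of a module file (the entry file on the first call)
 * @returns the array of all modules saved so far
 */
export const depsGraph = (file) => {
  // Note: every import is resolved relative to ./src/, which is enough for this demo
  const fullPath = path.resolve("./src/", file);

  // If this module has already been saved, stop here
  if (depsArray.some((item) => item.name === fullPath)) return depsArray;

  // Read the module's content, parse it into an AST, and save the module's path and AST
  const fileContents = fs.readFileSync(fullPath, "utf8");
  const source = ast.parse(fileContents);
  const module = {
    name: fullPath,
    source,
  };
  depsArray.push(module);

  // Check whether this module depends on other modules; if so, recurse into each of them
  source.body.forEach((current) => {
    // An import declaration means this module depends on another module
    if (current.type === "ImportDeclaration") {
      depsGraph(current.source.value);
    }
  });

  return depsArray;
};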

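depsArray: stores every module the entry file depends on. Each module has two properties, name and source:
- name: the path of the module file
- source: the AST the module was converted into. You can paste a module's code (file C, for example) into an AST viewer tool to inspect the AST it produces
  - In that AST, ImportDeclaration marks an import statement and ExportNamedDeclaration marks an export statement (both are used later)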
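To make the shape of source concrete, here is a hand-written, abbreviated ESTree-style AST for an assumed version of file C. The exact content of file C is not shown in this article, so the source in the comment is a guess based on the final bundle, and a real parser adds more fields (positions, the full arrow function, and so on).

```javascript
// Assumed content of file C (a guess, not the original file):
//   import { funcD } from "./D.mjs";
//   const funcC = () => `file C result & ${funcD()}`;
//   export { funcC };
const astOfC = {
  type: "Program",
  sourceType: "module",
  body: [
    {
      type: "ImportDeclaration",
      specifiers: [
        {
          type: "ImportSpecifier",
          imported: { type: "Identifier", name: "funcD" },
          local: { type: "Identifier", name: "funcD" },
        },
      ],
      source: { type: "Literal", value: "./D.mjs" },
    },
    {
      type: "VariableDeclaration",
      kind: "const",
      declarations: [
        /* the funcC arrow function, omitted in this abbreviated sketch */
      ],
    },
    {
      type: "ExportNamedDeclaration",
      declaration: null,
      specifiers: [
        {
          type: "ExportSpecifier",
          local: { type: "Identifier", name: "funcC" },
          exported: { type: "Identifier", name: "funcC" },
        },
      ],
      source: null,
    },
  ],
};

// The types of the top-level statements are what the bundler inspects
const statementTypes = astOfC.body.map((node) => node.type);
console.log(statementTypes);
```

The bundler only looks at the type of each top-level statement, plus source.value for imports and the specifier names.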

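depsGraph takes the path of a module file as its argument; the first call passes the path of the entry file. Node's path library resolves it to the full file path fullPath, and the module array is searched by that path to decide whether the file has already been processed. If an entry exists, the file has already been handled; if not, processing begins
The fs library reads the file's content, which is passed to ast (imported from abstract-syntax-tree) and parsed into the file's AST, the source variable
An object is created whose name key holds the file's full path and whose source key holds the file's AST, and this object is added to the module dependency array
Once the current module has been processed, we still need to check whether it depends on other module files. If it does, depsGraph is called recursively so that those modules are processed and added to the module array as well
The question is how to tell whether the current module depends on other modules. The answer lies in the AST: iterate over the AST's body, and, as noted above, a node of type ImportDeclaration is an import statement, which means the module depends on another file and gives us the path of that file
Finally, printing the processed depsArray shows one { name, source } entry per module, in the order A, B, C, D for this example. For more detail, iterate over the array and print each element with JSON.stringify to see the deeper levels of the objects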
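The traversal described above can be tried without touching the file system. The following is a minimal sketch, assuming hand-written AST bodies that keep only the import declarations; fakeAsts, importOf, and collectModules are hypothetical names used for illustration and are not part of the demo's source.

```javascript
// Hypothetical in-memory "files": each maps to a hand-written AST body
// that keeps only the import declarations relevant to the traversal
const importOf = (file) => ({
  type: "ImportDeclaration",
  source: { type: "Literal", value: file },
});
const fakeAsts = {
  "./A.mjs": { body: [importOf("./B.mjs"), importOf("./C.mjs")] },
  "./B.mjs": { body: [] },
  "./C.mjs": { body: [importOf("./D.mjs")] },
  "./D.mjs": { body: [] },
};

const collectModules = (file, found = []) => {
  // Skip modules that were already saved
  if (found.some((item) => item.name === file)) return found;

  const source = fakeAsts[file];
  found.push({ name: file, source });

  // Recurse into every import declaration of this module
  source.body
    .filter((node) => node.type === "ImportDeclaration")
    .forEach((node) => collectModules(node.source.value, found));

  return found;
};

const order = collectModules("./A.mjs").map((item) => item.name);
console.log(order); // [ './A.mjs', './B.mjs', './C.mjs', './D.mjs' ]
```

The position of each module in this array is what later serves as its module id in the bundle.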

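Processing the dependency tree

This step walks the dependency tree, adds and removes nodes in each module's AST, and finally returns the code string that goes into the bundle file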

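// Needed by transform and the helpers below (default import of abstract-syntax-tree assumed)
import path from "path";
import ast from "abstract-syntax-tree";

/**
 * @param depsArray the modules the entry file depends on
 * @returns the string content of the bundle file
 */
export const transform = (depsArray) => {
  // Process every module
  const updatedModules = depsArray.reduce((acc, dependency, index) => {
    /**
     * Walk the current module's AST and replace import and export statements
     */
    const updatedAst = dependency.source.body.map((item) => {
      if (item.type === "ImportDeclaration") {
        // Replace the import with the processed import statement
        item = getImport(item, depsArray);
      }
      if (item.type === "ExportNamedDeclaration") {
        // Replace the export with the processed export statement
        item = getExport(item);
      }
      return item;
    });
    // Replace the module's AST body with the processed body
    dependency.source.body = updatedAst;

    // Convert the AST back into code
    const updatedSource = ast.generate(dependency.source);

    // Wrap the module's code in the module template string
    const updatedTemplate = buildModuleTemplateString(updatedSource, index);
    acc.push(updatedTemplate);
    return acc;
  }, []);

  // Join all module template strings with "," to build the final bundle template string
  const bundleString = buildRuntimeTemplateString(updatedModules.join(","));

  return bundleString;
};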

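1. First iterate over the module array and process the body of each module's AST (the array of top-level statements):

1.1 If the current statement in body is an import statement, replace it with the processed import statement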

/**
 * Turns an ESM import into an AST node for our own require call
 * `const importSome = _ourRequire({ID})` (the id is the index of the imported module in the module array)
 */
const getImport = (item, allDeps) => {
  // Name the import is bound to in this module (local also covers default and renamed imports)
  const importFunctionName = item.specifiers[0].local.name;
  // Resolve the imported file path to a full path
  const fullFile = path.resolve("./src/", item.source.value);
  // Find the index of the imported module in the dependency array by its full path
  const itemId = allDeps.findIndex((dep) => dep.name === fullFile);
  // Return the AST statement that replaces the import
  return {
    type: "VariableDeclaration",
    kind: "const",
    declarations: [
      {
        type: "VariableDeclarator",
        init: {
          type: "CallExpression",
          callee: {
            type: "Identifier",
            name: "_ourRequire",
          },
          arguments: [
            {
              type: "Literal",
              value: itemId,
            },
          ],
        },
        id: {
          type: "Identifier",
          name: importFunctionName,
        },
      },
    ],
  };
};

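item is one element of the AST body array, and allDeps is the array of modules the entry file depends on. The function first reads the imported variable name importFunctionName and the import path from the AST, resolves the path to the full path fullFile of the imported file, and then searches the module array by that path (as mentioned earlier, each element has the keys name and source, where name is the module's full path) to find the module's index itemId. It then rewrites the item's AST, replacing the original import { x } from "b" statement with const x = _ourRequire(itemId). (In webpack the corresponding function is called __webpack_require__.)

For example, in file C the statement import { funcD } from "./D.mjs" becomes const funcD = _ourRequire(3) after the transformation, where 3 is the index of module D in the dependency array depsArray. So the main job of getImport is to replace import statements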

1.2 In the same way, the getExport function replaces export statements
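The code of getExport is not shown here, so the following is only a sketch of what such a function could look like. It assumes the modules use the form export { name } and that, as in the final bundle, the export becomes module.exports = name; only the first specifier is handled.

```javascript
// Hypothetical getExport: turns `export { name }` into `module.exports = name`
const getExport = (item) => {
  // Name of the local binding being exported (first specifier only)
  const exportedName = item.specifiers[0].local.name;
  // AST for the statement `module.exports = <exportedName>`
  return {
    type: "ExpressionStatement",
    expression: {
      type: "AssignmentExpression",
      operator: "=",
      left: {
        type: "MemberExpression",
        computed: false,
        object: { type: "Identifier", name: "module" },
        property: { type: "Identifier", name: "exports" },
      },
      right: { type: "Identifier", name: exportedName },
    },
  };
};

// Hand-written AST node for: export { funcD };
const exportNode = {
  type: "ExportNamedDeclaration",
  declaration: null,
  source: null,
  specifiers: [
    {
      type: "ExportSpecifier",
      local: { type: "Identifier", name: "funcD" },
      exported: { type: "Identifier", name: "funcD" },
    },
  ],
};

const replaced = getExport(exportNode);
console.log(replaced.expression.right.name); // funcD
```

A form such as export const funcD = ... keeps its declaration in item.declaration instead of item.specifiers and would need separate handling.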

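2. After the import and export statements are processed, the module's AST is updated and converted back into code with the AST library's generate method

Does replacing import and export statements here suggest anything? If you wanted to implement a loader that removes console.log calls at bundle time, could you not do it right here, while walking the AST, by replacing each console.log statement with nothing?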
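As a rough illustration of that idea, the sketch below filters console.log statements out of an AST body. isConsoleLog and stripConsoleLog are hypothetical helpers, the AST is hand-written, and only top-level statements are checked; calls nested inside functions would need a full tree walk.

```javascript
// Returns true for a statement of the form `console.log(...)`
const isConsoleLog = (node) =>
  node.type === "ExpressionStatement" &&
  node.expression.type === "CallExpression" &&
  node.expression.callee.type === "MemberExpression" &&
  node.expression.callee.object.type === "Identifier" &&
  node.expression.callee.object.name === "console" &&
  node.expression.callee.property.type === "Identifier" &&
  node.expression.callee.property.name === "log";

// Hypothetical helper: drop console.log statements from an AST body
const stripConsoleLog = (body) => body.filter((node) => !isConsoleLog(node));

// Hand-written AST body for: console.log("debug"); main();
const sampleBody = [
  {
    type: "ExpressionStatement",
    expression: {
      type: "CallExpression",
      callee: {
        type: "MemberExpression",
        computed: false,
        object: { type: "Identifier", name: "console" },
        property: { type: "Identifier", name: "log" },
      },
      arguments: [{ type: "Literal", value: "debug" }],
    },
  },
  {
    type: "ExpressionStatement",
    expression: {
      type: "CallExpression",
      callee: { type: "Identifier", name: "main" },
      arguments: [],
    },
  },
];

const stripped = stripConsoleLog(sampleBody);
console.log(stripped.length); // 1
```

In the demo, a filter like this could run on dependency.source.body before the AST is converted back into code.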

3. Convert each module's code into a template string with the buildModuleTemplateString function

/*
 * Wraps the module's code in a function that runs in strict mode
 */
const buildModuleTemplateString = (moduleCode, index) => `
/* index/id ${index} */
(function(module, _ourRequire) {
  "use strict";
  ${moduleCode}
})
`;

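Here moduleCode is the code generated from the module's modified AST. It is embedded in a function that uses strict mode, and the function string is returned. This function string is one element of the argument array of the immediately invoked function in the final bundle file

4. In the end, updatedModules holds the function strings for all modules. The array's join method concatenates them into a single string separated by ",", which is passed to buildRuntimeTemplateString to generate the code of the final bundle file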
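The wrapping and joining can be checked in isolation. This sketch repeats the wrapper under the hypothetical name wrapModule so it runs on its own, and the two module code strings are made up for illustration. Evaluating the joined string inside array brackets shows that it really is a list of functions.

```javascript
// Same wrapper as buildModuleTemplateString, repeated so this sketch runs on its own
const wrapModule = (moduleCode, index) => `
/* index/id ${index} */
(function(module, _ourRequire) {
  "use strict";
  ${moduleCode}
})
`;

// Made-up module code, as it might look after the AST was turned back into code
const moduleCodes = [
  'const greet = _ourRequire(1); module.exports = greet("webpack");',
  'module.exports = (name) => "hello " + name;',
];

// Wrap each module, then join the wrappers with ","
const joined = moduleCodes.map(wrapModule).join(",");

// Evaluating "[wrapper0, wrapper1]" yields an array of functions
const moduleFunctions = new Function(`return [${joined}];`)();
console.log(moduleFunctions.map((fn) => typeof fn)); // [ 'function', 'function' ]
```

new Function is used here only to inspect the generated string; the demo itself writes the string to a file instead of evaluating it.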

// Generates the content of the bundle file
const buildRuntimeTemplateString = (allModules) => `
(function(modules) {
  const installedModules = {};
  function _our_require_(moduleId) {
    // The cache stores the export value itself, so return it directly
    if (moduleId in installedModules) {
       return installedModules[moduleId];
    }

    const module = {
       i: moduleId,
       exports: {},
    }

    modules[moduleId](module, _our_require_);

    const exports = module.exports;
    installedModules[moduleId] = exports;

    return exports;
  }

  return _our_require_(0);

})
([
 ${allModules}
]); 
`;

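As you can see, allModules ends up inside the array passed as the function's argument. This may be hard to picture from the template alone, so here is the code of the final generated bundle file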


(function(modules) {
  // Modules that have already been loaded: the key is the module's index in the module array, the value is the module's export value
  const installedModules = {};
  // Module loading function
  function _our_require_(moduleId) {
    // Already loaded: return the cached export value directly
    if (moduleId in installedModules) {
       return installedModules[moduleId];
    }

    const module = {
       i: moduleId, // moduleId is the module's index in the module array
       exports: {}, // holds the module's exported value
    }

    // Run the module
    // import was replaced by an _ourRequire call; _ourRequire is the _our_require_ function passed in here
    // export was replaced by module.exports = variable
    modules[moduleId](module, _our_require_);

    // Save the module's export value
    const exports = module.exports;
    installedModules[moduleId] = exports;

    // Return the module's export value
    return exports;
  }

  // Start loading modules from the entry file
  // The return value is the entry file's export value
  return _our_require_(0);

})
([
 // The values below are the functions that run each module's code
/* index/id 0 */
(function(module, _ourRequire) {
  "use strict";
  const funcB = _ourRequire(1);
const funcC = _ourRequire(2);
const main = () => {
  console.log("This is file A:");
  console.log("Result of calling the function imported from file B:", funcB());
  console.log("Result of calling the function imported from file C:", funcC());
};
main();

})
,
/* index/id 1 */
(function(module, _ourRequire) {
  "use strict";
  const funcB = () => {
  return "result of file B's function";
};
module.exports = funcB;

})
,
/* index/id 2 */
(function(module, _ourRequire) {
  "use strict";
  const funcD = _ourRequire(3);
const funcC = () => {
  return `file C result & ${funcD()}`;
};
module.exports = funcC;

})
,
/* index/id 3 */
(function(module, _ourRequire) {
  "use strict";
  const funcD = () => {
  return "result of file D's function";
};
module.exports = funcD;

})

]);

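The argument of this immediately invoked function is an array, and each element of the array is a function. Each function takes two parameters: module (when the module runs, its exported value is stored on this object) and _ourRequire (used to load the modules that the current module depends on). The function uses strict mode, and the rest of its body is the module's own code

Inside the immediately invoked function, the keys of the installedModules object are the modules' indexes in the dependency array, and the values are the modules' export values

_our_require_ is the module loading function; its parameter moduleId is the index of the module to load in the module array

1. First check whether the module has already been loaded. If so, return its export value directly; otherwise continue

2. Build the module object: the property i is the current module's id, and the property exports is the current module's export value, initialized to an empty object {}

3. Call the module's code function, which is the function element in the argument array of the immediately invoked function, passing in the module object and the loading function _our_require_. Inside the module's function, any dependency on another module triggers a call to _our_require_ to load that module. If the current module exports something, the exported value is assigned to the exports property of module, so once loading finishes, module.exports contains the current module's export value

The immediately invoked function starts loading from the module at index 0 (the module of the entry file) and finally returns the entry file's export value.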

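This is the final bundle file. Of course, this is only a demo: without going into webpack's real source code, it reproduces the bundle output process with a small amount of hands-on code to help beginners understand it. For a real implementation, there are third-party libraries available for each of these steps.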

Follow the WeChat official account 《小前日记》 to get the source code
