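Frontend Encoding "Three Musketeers" (Part 1): Base64

Why can Base64 be used as one of the encodings for data transmitted in URLs? How is Base64 implemented under the hood?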

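Frontend Encoding Series

Frontend Encoding "Three Musketeers" (Part 0): Prerequisites

Frontend Encoding "Three Musketeers" (Part 1): Base64

Frontend Encoding "Three Musketeers" (Part 2): URI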

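A Base64-encoded string contains only ASCII characters, so it can be safely transmitted through, or stored in, places that do not support binary data. The drawback is that encoding inflates the size by about 33%, because every 3 bytes become 4 characters.

Base64 is often used to encode and decode URL query parameter values, and its URL-safe variant is a safe way to encode them. It has many other use cases, but this article focuses on its use as a query parameter encoding.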

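Why use Base64 encoding

URL-safe (urisafe) Base64 output, with the = padding removed, contains no URI reserved characters at all: every character is an ASCII letter, a digit, - or _. This means that carrying a URI parameter value as a URL-safe Base64 string is completely harmless to the URL. Such a string is also a fixed point of URI and URIComponent encoding and decoding: no matter how many times encodeURI/encodeURIComponent or their decoders are applied, its content does not change. That makes it a safe encoding for URL query parameters.
Base64 avoids putting certain special characters directly into a URL, which goes some way toward keeping the URL safe.
Base64 turns binary data into plain text, so binary data can be stored and transmitted in contexts that cannot handle binary.
Base64 is a widely accepted standard and is highly portable.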
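
To make the "unchanged by repeated URI encoding" claim concrete, here is a minimal sketch. The helper name encodeTimes is made up for this illustration; it only relies on the standard encodeURIComponent function.

```typescript
// URL-safe Base64 without padding: only A-Z, a-z, 0-9, "-" and "_".
const urlSafe = "6KW_5b-rZGFua29nYWk";
// The same bytes in standard Base64: contains "/", "+" and "=".
const standard = "6KW/5b+rZGFua29nYWk=";

// Hypothetical helper: apply encodeURIComponent several times in a row,
// as happens when a value is accidentally encoded more than once.
const encodeTimes = (value: string, times: number): string => {
  let out = value;
  for (let i = 0; i < times; i++) out = encodeURIComponent(out);
  return out;
};

console.log(encodeTimes(urlSafe, 3) === urlSafe); // true
console.log(encodeTimes(standard, 1)); // 6KW%2F5b%2BrZGFua29nYWk%3D
console.log(encodeTimes(standard, 2)); // 6KW%252F5b%252BrZGFua29nYWk%253D
```

The standard form changes on every pass, so the number of decoding passes has to match the number of encoding passes exactly; the unpadded URL-safe form has no such constraint.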

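How the encoding works

Base64 picks 64 printable characters from the ASCII character set as its alphabet and uses them to represent arbitrary data. In addition, = is used as the padding character.

The standard alphabet includes + and /, both of which are URI reserved characters and can be ambiguous inside a URL. A variant therefore replaces + with - and / with _; it is known as URL-safe (urisafe) Base64.

Standard Base64 index table: indexes 0-25 map to A-Z, 26-51 to a-z, 52-61 to 0-9, 62 to + and 63 to /.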

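[Encoding process]

Split the string's byte stream into groups of 3 bytes, 24 bits per group.
Split those 24 bits into 4 groups of 6 bits.
Prepend two 0 bits to each 6-bit group to form an 8-bit value. If the last group has fewer than 6 bits, first pad it with 0 bits on the right.
Look up each value in the Base64 index table to get the corresponding character; together these form the Base64 output. Note: if the number of output characters is not a multiple of 4, append = until it is. Base64 decoding processes the input in groups of 4 characters, so a decoder that insists on this will fail when the length is not a multiple of 4.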
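
The steps above can be sketched in a few lines of TypeScript. This is only an illustration of the algorithm, not the code of any library; the names STD_ALPHABET and encodeBase64Sketch are invented here, and the input is assumed to be an array of byte values (0-255).

```typescript
const STD_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode bytes: 3-byte group -> 24 bits -> four 6-bit indexes -> characters.
function encodeBase64Sketch(bytes: number[]): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    // Number of real bytes in this group (1, 2 or 3).
    const remaining = Math.min(3, bytes.length - i);
    const b0 = bytes[i];
    // Missing bytes are treated as zeros: this is the right-hand zero padding.
    const b1 = remaining > 1 ? bytes[i + 1] : 0;
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    const bits = (b0 << 16) | (b1 << 8) | b2;
    for (let j = 0; j < 4; j++) {
      // Take the j-th 6-bit slice, counted from the most significant end.
      const index = (bits >>> (18 - 6 * j)) & 63;
      // n real bytes produce n + 1 data characters; the rest is "=" padding.
      out += j <= remaining ? STD_ALPHABET.charAt(index) : "=";
    }
  }
  return out;
}

console.log(encodeBase64Sketch([104, 105])); // aGk=
```

Masking with 63 plays the role of "prepending two 0 bits": the result is a 6-bit value stored in an ordinary number.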

Where does the roughly 33% size increase come from?
Before encoding: 3 * 8 = 24 bits
After encoding: 4 * (6 + 2) = 32 bits, and (32 - 24) / 24 ≈ 0.33
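
The same arithmetic can be written as a small function. The name paddedBase64Length is made up for this sketch, and it assumes padded output, where every started 3-byte group costs 4 characters.

```typescript
// Length of padded Base64 output for a given number of input bytes.
const paddedBase64Length = (byteLength: number): number =>
  Math.ceil(byteLength / 3) * 4;

console.log(paddedBase64Length(3)); // 4
console.log(paddedBase64Length(2)); // 4 (one of the characters is "=" padding)
console.log(paddedBase64Length(300)); // 400
// Relative overhead for 300 bytes: roughly 0.33.
console.log(paddedBase64Length(300) / 300 - 1);
```

For very short inputs the overhead is larger than 33%, since a 1-byte input still produces 4 characters.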

The following uses hi as an example to walk through Base64 encoding.

Text	h	i
ASCII (decimal)	104	105
ASCII (binary)	01101000	01101001
Grouped and zero-padded	00011010	00000110	00100100
Index (decimal)	26	6	36
Character	a	G	k	=

Non-ASCII characters are first converted to UTF-8 bytes and then Base64-encoded in the same way.

Character	你			好
UTF-8 (hex)	E4	BD	A0	E5	A5	BD
UTF-8 (binary)	11100100	10111101	10100000	11100101	10100101	10111101
Grouped and zero-padded	00111001	00001011	00110110	00100000	00111001	00011010	00010110	00111101
Index (decimal)	57	11	54	32	57	26	22	61
Character	5	L	2	g	5	a	W	9
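
To check the UTF-8 row of the table, here is one way to list the UTF-8 bytes of a string using only standard ECMAScript functions. It is a sketch built on the fact that encodeURIComponent percent-encodes the UTF-8 bytes of each character it escapes; the function name utf8BytesSketch is invented here, and the input is assumed to contain no lone surrogates (encodeURIComponent throws on those).

```typescript
// Collect the UTF-8 bytes of a string by reading back the "%XX" escapes
// that encodeURIComponent produces.
function utf8BytesSketch(text: string): number[] {
  const bytes: number[] = [];
  const encoded = encodeURIComponent(text);
  for (let i = 0; i < encoded.length; i++) {
    if (encoded.charAt(i) === "%") {
      // "%E4" -> 0xE4: the two hex digits after "%" are one byte.
      bytes.push(parseInt(encoded.slice(i + 1, i + 3), 16));
      i += 2;
    } else {
      // Unescaped characters are ASCII, so the code unit is the byte.
      bytes.push(encoded.charCodeAt(i));
    }
  }
  return bytes;
}

const hexBytes = utf8BytesSketch("你好").map((b) => b.toString(16).toUpperCase());
console.log(hexBytes.join(" ")); // E4 BD A0 E5 A5 BD
```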

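[Decoding process]

Split the Base64 string into groups of 4 characters.
Within each group, map every character back to its index value.
Write each index value in binary as 8 bits: two leading 0 bits followed by the 6-bit value.
Strip the two leading 0 bits from each value, concatenate the remaining bits into one string, and split it into groups of 8 bits; leftover bits that do not fill a byte are discarded. Each byte can then be written in hexadecimal.
Decode the resulting bytes as UTF-8 to obtain the characters.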

Base64	S	G	V	s	b	G	8	=
Index (decimal)	18	6	21	44	27	6	60	-
Grouped and zero-padded	00010010	00000110	00010101	00101100	00011011	00000110	00111100
ASCII (binary)	01001000	01100101	01101100	01101100	01101111
ASCII (hex)	48	65	6c	6c	6f
Character	H	e	l	l	o
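
The decoding steps can be sketched the same way. Again this is an illustration under simple assumptions (standard alphabet, input already validated apart from unknown characters); DECODE_ALPHABET and decodeBase64Sketch are names made up for this example.

```typescript
const DECODE_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode: characters -> 6-bit indexes -> one bit stream -> 8-bit bytes.
function decodeBase64Sketch(base64: string): number[] {
  const bytes: number[] = [];
  let buffer = 0; // bits collected so far
  let bitCount = 0; // how many bits in buffer are still unused
  for (let i = 0; i < base64.length; i++) {
    const ch = base64.charAt(i);
    if (ch === "=") break; // padding carries no data
    const index = DECODE_ALPHABET.indexOf(ch);
    if (index < 0) throw new Error("Invalid Base64 character: " + ch);
    buffer = (buffer << 6) | index; // append 6 bits
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      bytes.push((buffer >>> bitCount) & 255); // emit the oldest 8 bits
      buffer &= (1 << bitCount) - 1; // keep only the leftover bits
    }
  }
  return bytes; // leftover bits (fewer than 8) are dropped
}

console.log(String.fromCharCode(...decodeBase64Sketch("SGVsbG8="))); // Hello
```

Because the loop stops at the first = and simply drops leftover bits, this sketch also accepts input whose padding has been removed.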

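atob() and btoa()

These are the built-in APIs for Base64 decoding and encoding respectively. Note that btoa() only accepts characters whose code points are in the range 0-255 (ASCII plus Latin-1); any character outside that range makes it throw.

console.log(btoa("hello world")); // prints: aGVsbG8gd29ybGQ=
console.log(atob("aGVsbG8gd29ybGQ="));  // prints: hello world
console.log(btoa("中")); // throws an exception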
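
A common way around this limit is to convert the text to UTF-8 bytes first and hand btoa() one character per byte. The sketch below assumes an environment where TextEncoder, TextDecoder, btoa and atob are all available as globals (modern browsers, recent Node.js); the function names utf8ToBase64 and base64ToUtf8 are made up for this example.

```typescript
// Encode any string: text -> UTF-8 bytes -> "binary string" -> Base64.
function utf8ToBase64(text: string): string {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  bytes.forEach((b) => {
    binary += String.fromCharCode(b); // one character (0-255) per byte
  });
  return btoa(binary);
}

// Reverse: Base64 -> "binary string" -> bytes -> UTF-8 text.
function base64ToUtf8(base64: string): string {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new TextDecoder().decode(bytes);
}

console.log(utf8ToBase64("中")); // 5Lit
console.log(base64ToUtf8("5Lit")); // 中
```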

The js-base64 library

How can you conveniently Base64-encode and decode in JavaScript? One option is the third-party library js-base64, available on npm.

Syntax

/**
 * converts a UTF-8-encoded string to a Base64 string.
 * @param {boolean} [urlsafe] if `true` make the result URL-safe
 * @returns {string} Base64 string
 */
declare const encode: (src: string, urlsafe?: boolean) => string;

/**
 * converts a UTF-8-encoded string to URL-safe Base64 RFC4648 §5.
 * @returns {string} Base64 string
 */
declare const encodeURI: (src: string) => string;

/**
 * converts a Base64 string to a UTF-8 string.
 * @param {String} src Base64 string.  Both normal and URL-safe are supported
 * @returns {string} UTF-8 string
 */
declare const decode: (src: string) => string;

Example

import { Base64 } from 'js-base64';

const utf8 = "西快dankogai";
Base64.encode(utf8);        //6KW/5b+rZGFua29nYWk=
Base64.encode(utf8, true);  //6KW_5b-rZGFua29nYWk
Base64.encodeURI(utf8);     //6KW_5b-rZGFua29nYWk 
Base64.encodeURL(utf8);     //6KW_5b-rZGFua29nYWk  (encodeURL is an alias of encodeURI)

Base64.decode("6KW/5b+rZGFua29nYWk="); //西快dankogai
Base64.decode("6KW_5b-rZGFua29nYWk");  //西快dankogai

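Caveats

Be aware that js-base64 currently has 2 known issues:

(1) To obtain the raw byte stream (Uint8Array) of the original string, it uses the Encoding API (TextEncoder/TextDecoder), which is only supported on iOS 10.3+. Older versions will run into compatibility problems.

(2) URL-safe encoding (calling encodeURI or encode('xx', true)) strips the padding character = (because = is a URI reserved character). This can cause Base64 decoding to fail, depending on how robust the decoding API is. In my experience, the problem occurs on iOS.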

Implementation

The code below works around both of the js-base64 issues described above.


const base64ToUint6 = (nChr: number) =>
  nChr > 64 && nChr < 91
    ? nChr - 65
    : nChr > 96 && nChr < 123
    ? nChr - 71
    : nChr > 47 && nChr < 58
    ? nChr + 4
    : nChr === 45 // -
    ? 62
    : nChr === 95 // _
    ? 63
    : 0;
export const base64UrlUint8Decode = (sBase64: string): Uint8Array => {
  const sB64Enc = sBase64.replace(/[^A-Za-z0-9-_]/g, '');
  const nInLen = sB64Enc.length;
  const nOutLen = (nInLen * 3 + 1) >> 2;
  const taBytes = new Uint8Array(nOutLen);

  for (let nMod3, nMod4, nUint24 = 0, nOutIdx = 0, nInIdx = 0; nInIdx < nInLen; nInIdx++) {
    nMod4 = nInIdx & 3;
    nUint24 |= base64ToUint6(sB64Enc.charCodeAt(nInIdx)) << (6 * (3 - nMod4));
    if (nMod4 === 3 || nInLen - nInIdx === 1) {
      for (nMod3 = 0; nMod3 < 3 && nOutIdx < nOutLen; nMod3++, nOutIdx++) {
        taBytes[nOutIdx] = (nUint24 >>> ((16 >>> nMod3) & 24)) & 255;
      }
      nUint24 = 0;
    }
  }

  return taBytes;
};
const Uint6ToBase64 = (uint6: number) =>
  uint6 < 26
    ? uint6 + 65
    : uint6 < 52
    ? uint6 + 71
    : uint6 < 62
    ? uint6 - 4
    : uint6 === 62
    ? 45 // -
    : uint6 === 63
    ? 95 // _
    : 65;
export const base64UrlUint8Encode = (bytes: Uint8Array): string => {
  let nMod3 = 2;
  const sB64Enc = [];
  const nLen = bytes.length;
  for (let nUint24 = 0, nIdx = 0; nIdx < nLen; nIdx++) {
    nMod3 = nIdx % 3;
    nUint24 |= bytes[nIdx] << ((16 >>> nMod3) & 24);
    if (nMod3 === 2 || nLen - nIdx === 1) {
      sB64Enc.push(
        String.fromCodePoint(
          Uint6ToBase64((nUint24 >>> 18) & 63),
          Uint6ToBase64((nUint24 >>> 12) & 63),
          Uint6ToBase64((nUint24 >>> 6) & 63),
          Uint6ToBase64(nUint24 & 63),
        ),
      );
      nUint24 = 0;
    }
  }
  const result = sB64Enc.join('');
  return result.substring(0, result.length - 2 + nMod3);
};
const u8ToString = (bytes: Uint8Array): string => {
  const length = bytes.length;
  const result: string[] = [];
  for (let nIdx = 0; nIdx < length; nIdx++) {
    const nPart = bytes[nIdx];
    result.push(
      String.fromCodePoint(
        nPart > 251 && nPart < 254 && nIdx + 5 < length /* six bytes */
          ? /* (nPart - 252 << 30) may be not so safe in ECMAScript! So…: */
            (nPart - 252) * 0x40000000 +
              ((bytes[++nIdx] - 128) << 24) +
              ((bytes[++nIdx] - 128) << 18) +
              ((bytes[++nIdx] - 128) << 12) +
              ((bytes[++nIdx] - 128) << 6) +
              bytes[++nIdx] -
              128
          : nPart > 247 && nPart < 252 && nIdx + 4 < length /* five bytes */
          ? ((nPart - 248) << 24) +
            ((bytes[++nIdx] - 128) << 18) +
            ((bytes[++nIdx] - 128) << 12) +
            ((bytes[++nIdx] - 128) << 6) +
            bytes[++nIdx] -
            128
          : nPart > 239 && nPart < 248 && nIdx + 3 < length /* four bytes */
          ? ((nPart - 240) << 18) +
            ((bytes[++nIdx] - 128) << 12) +
            ((bytes[++nIdx] - 128) << 6) +
            bytes[++nIdx] -
            128
          : nPart > 223 && nPart < 240 && nIdx + 2 < length /* three bytes */
          ? ((nPart - 224) << 12) + ((bytes[++nIdx] - 128) << 6) + bytes[++nIdx] - 128
          : nPart > 191 && nPart < 224 && nIdx + 1 < length /* two bytes */
          ? ((nPart - 192) << 6) + bytes[++nIdx] - 128
          : /* nPart < 127 ? */ /* one byte */
            nPart,
      ),
    );
  }
  return result.join('');
};
const stringToU8 = (str: string): Uint8Array => {
  const nStrLen = str.length;
  let nArrLen = 0;
  /* mapping… */
  for (let nMapIdx = 0; nMapIdx < nStrLen; nMapIdx++) {
    const nChr = str.codePointAt(nMapIdx) || 0;
    if (nChr >= 0x10000) {
      nMapIdx++;
    }
    nArrLen +=
      nChr < 0x80
        ? 1
        : nChr < 0x800
        ? 2
        : nChr < 0x10000
        ? 3
        : nChr < 0x200000
        ? 4
        : nChr < 0x4000000
        ? 5
        : 6;
  }
  const aBytes = new Uint8Array(nArrLen);
  /* transcription… */
  let nIdx = 0;
  let nChrIdx = 0;
  while (nIdx < nArrLen) {
    const nChr = str.codePointAt(nChrIdx) || 0;
    if (nChr < 128) {
      /* one byte */
      aBytes[nIdx++] = nChr;
    } else if (nChr < 0x800) {
      /* two bytes */
      aBytes[nIdx++] = 192 + (nChr >>> 6);
      aBytes[nIdx++] = 128 + (nChr & 63);
    } else if (nChr < 0x10000) {
      /* three bytes */
      aBytes[nIdx++] = 224 + (nChr >>> 12);
      aBytes[nIdx++] = 128 + ((nChr >>> 6) & 63);
      aBytes[nIdx++] = 128 + (nChr & 63);
    } else if (nChr < 0x200000) {
      /* four bytes */
      aBytes[nIdx++] = 240 + (nChr >>> 18);
      aBytes[nIdx++] = 128 + ((nChr >>> 12) & 63);
      aBytes[nIdx++] = 128 + ((nChr >>> 6) & 63);
      aBytes[nIdx++] = 128 + (nChr & 63);
      nChrIdx++;
    } else if (nChr < 0x4000000) {
      /* five bytes */
      aBytes[nIdx++] = 248 + (nChr >>> 24);
      aBytes[nIdx++] = 128 + ((nChr >>> 18) & 63);
      aBytes[nIdx++] = 128 + ((nChr >>> 12) & 63);
      aBytes[nIdx++] = 128 + ((nChr >>> 6) & 63);
      aBytes[nIdx++] = 128 + (nChr & 63);
      nChrIdx++;
    } /* if (nChr <= 0x7fffffff) */ else {
      /* six bytes */
      aBytes[nIdx++] = 252 + (nChr >>> 30);
      aBytes[nIdx++] = 128 + ((nChr >>> 24) & 63);
      aBytes[nIdx++] = 128 + ((nChr >>> 18) & 63);
      aBytes[nIdx++] = 128 + ((nChr >>> 12) & 63);
      aBytes[nIdx++] = 128 + ((nChr >>> 6) & 63);
      aBytes[nIdx++] = 128 + (nChr & 63);
      nChrIdx++;
    }
    nChrIdx++;
  }
  return aBytes;
};

/** Base64 (URL) decoding */
export const base64UrlDecode = (base64: string) => u8ToString(base64UrlUint8Decode(base64));

/** Decode Base64 (URL) encoded JSON data */
export const JSONUrlParse = (jsonUrl: string) => JSON.parse(base64UrlDecode(jsonUrl));

/** Base64 encoding (URL alphabet, no padding) */
export const base64UrlEncode = (str: string) => base64UrlUint8Encode(stringToU8(str));

/** Encode JSON data as Base64 (URL) */
export const JSONUrlStringify = (data: any) => base64UrlEncode(JSON.stringify(data));

/** Base64 (standard alphabet) to Base64 (URL alphabet) */
export const base64Std2Url = (base64: string) =>
  base64.replace(/[+/]/g, c => (c === '+' ? '-' : '_'));

/** Base64 (URL alphabet) to Base64 (standard alphabet) */
export const base64Url2Std = (base64: string) =>
  base64.replace(/[-_]/g, c => (c === '-' ? '+' : '/'));

/** Add padding to a Base64 string */
export const base64AddPadding = (base64: string) => {
  // Count data characters only. "-" must come last in the class: written as
  // "/-_" it is parsed as a range that includes "=" and excludes "-".
  const mod = base64.replace(/[^A-Za-z0-9+/_-]/g, '').length % 4;
  const base = base64.replace(/(\s|=)+$/, '');
  return mod < 1 ? base : `${base}${'='.repeat(4 - mod)}`;
};

Here is the same example again, this time using the code above.

import { base64AddPadding, base64UrlEncode, base64UrlDecode } from "./base64";

base64UrlEncode("西快dankogai");  //6KW_5b-rZGFua29nYWk
base64AddPadding(base64UrlEncode("西快dankogai")) //6KW_5b-rZGFua29nYWk=

base64UrlDecode("6KW_5b-rZGFua29nYWk");  //西快dankogai
base64UrlDecode("6KW_5b-rZGFua29nYWk="); //西快dankogai

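[Discussion]: When Base64 is used to encode a query parameter value, does keeping = in the Base64 string affect URL parsing?

Answer: No. Parsers that follow the WHATWG URL Standard separate the pairs at & and split each pair at its first =, so a trailing = stays part of the value.

Take the following link as an example: kwalive://krndialog?bundleId=Live&component=main&data=6KW_5b-rZGFua29nYWk=&transparent=1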
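
To see why, here is a minimal query parser that splits pairs at & and splits each pair only at its first =, which is how the WHATWG URL Standard's query parsing (the one behind URLSearchParams) treats name/value pairs. The function name parseQuerySketch is made up for this sketch, and platform-specific parsers may behave differently.

```typescript
// Split a query string into name/value pairs. Pairs are separated by "&",
// and only the FIRST "=" in a pair separates the name from the value.
function parseQuerySketch(query: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const pair of query.split("&")) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    const name = eq < 0 ? pair : pair.slice(0, eq);
    const value = eq < 0 ? "" : pair.slice(eq + 1);
    result[decodeURIComponent(name)] = decodeURIComponent(value);
  }
  return result;
}

const link =
  "kwalive://krndialog?bundleId=Live&component=main&data=6KW_5b-rZGFua29nYWk=&transparent=1";
const params = parseQuerySketch(link.slice(link.indexOf("?") + 1));
console.log(params["data"]); // 6KW_5b-rZGFua29nYWk=
console.log(params["transparent"]); // 1
```

The trailing = ends up inside the data value, and the transparent parameter that follows it is parsed normally.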
