Chapter 5: Basic Reference Types

This is day 25 of my participation in the August writing challenge.

5.3.3 String

2. The normalize() method

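Some Unicode characters can be encoded in more than one way. The same visible character may correspond to different single BMP code points, or to a base character followed by a combining mark. For example: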

// U+00C5: Latin capital letter A with ring above
console.log(String.fromCharCode(0x00C5)); // Å
// U+212B: Angstrom sign (the unit of length)
console.log(String.fromCharCode(0x212B)); // Å
// U+0041: Latin capital letter A
// U+030A: combining ring above
console.log(String.fromCharCode(0x0041, 0x030A)); // Å
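To see that these three strings really differ underneath, the sketch below lists the UTF-16 code units of each one in hex. The helper `toCodeUnits` and the variable names are hypothetical, used only for illustration.

```javascript
// Hypothetical helper: list a string's UTF-16 code units as "U+XXXX" labels
function toCodeUnits(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    // charCodeAt() returns the 16-bit code unit at index i
    const hex = str.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0");
    units.push("U+" + hex);
  }
  return units;
}

const ringA = String.fromCharCode(0x00C5);            // precomposed Å
const angstrom = String.fromCharCode(0x212B);         // Angstrom sign
const combined = String.fromCharCode(0x0041, 0x030A); // A + combining ring

console.log(toCodeUnits(ringA).join(" "));    // U+00C5
console.log(toCodeUnits(angstrom).join(" ")); // U+212B
console.log(toCodeUnits(combined).join(" ")); // U+0041 U+030A
console.log(combined.length);                 // 2
```

The third string looks identical to the other two but is two code units long, which is why a plain comparison cannot treat them as equal.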

Comparison operators don't care how characters look, so these three strings are not equal to one another.

let a1 = String.fromCharCode(0x00C5), 
 a2 = String.fromCharCode(0x212B), 
 a3 = String.fromCharCode(0x0041, 0x030A); 
console.log(a1, a2, a3); // Å Å Å
console.log(a1 === a2); // false 
console.log(a1 === a3); // false 
console.log(a2 === a3); // false

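To solve this problem, Unicode provides four normalization forms, which convert characters like those above into a consistent format regardless of their underlying code points. The four forms are NFD (Normalization Form D), NFC (Normalization Form C), NFKD (Normalization Form KD), and NFKC (Normalization Form KC). The normalize() method applies one of these forms to a string; you pass it a string naming the form: "NFD", "NFC", "NFKD", or "NFKC".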

Note: The details of these four normalization forms are beyond the scope of this book. Interested readers can refer to Section 1.2, "Normalization Forms," of UAX #15: Unicode Normalization Forms.
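Two details of normalize() worth knowing, sketched below based on how ECMAScript specifies String.prototype.normalize: calling it with no argument uses "NFC", and passing any string other than the four form names throws a RangeError. The form name "NFX" is deliberately invalid and only there to trigger the error.

```javascript
const angstromSign = "\u212B"; // U+212B, Angstrom sign

// With no argument, normalize() defaults to "NFC"
const byDefault = angstromSign.normalize();
console.log(byDefault === angstromSign.normalize("NFC")); // true
console.log(byDefault === "\u00C5");                      // true

// Any other form name is rejected with a RangeError
let errorName = "";
try {
  angstromSign.normalize("NFX"); // "NFX" is intentionally not a valid form
} catch (e) {
  errorName = e.name;
}
console.log(errorName); // RangeError
```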

By comparing a string with the value returned by calling normalize() on it, you can tell whether the string is already normalized:

let a1 = String.fromCharCode(0x00C5), 
 a2 = String.fromCharCode(0x212B), 
 a3 = String.fromCharCode(0x0041, 0x030A); 
// U+00C5 is the result of applying NFC/NFKC normalization to U+212B
console.log(a1 === a1.normalize("NFD")); // false 
console.log(a1 === a1.normalize("NFC")); // true 
console.log(a1 === a1.normalize("NFKD")); // false 
console.log(a1 === a1.normalize("NFKC")); // true 
// U+212B is not normalized in any form
console.log(a2 === a2.normalize("NFD")); // false 
console.log(a2 === a2.normalize("NFC")); // false 
console.log(a2 === a2.normalize("NFKD")); // false 
console.log(a2 === a2.normalize("NFKC")); // false 
// U+0041/U+030A is the result of applying NFD/NFKD normalization to U+212B
console.log(a3 === a3.normalize("NFD")); // true 
console.log(a3 === a3.normalize("NFC")); // false 
console.log(a3 === a3.normalize("NFKD")); // true 
console.log(a3 === a3.normalize("NFKC")); // false
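The comparison pattern above can be wrapped in a small helper. `isNormalized` is a hypothetical name, not a built-in; it simply checks whether normalize() leaves the string unchanged for a given form, and defaults to "NFC" as an assumption for this sketch.

```javascript
// Hypothetical helper: true if str is already in the given normalization form
function isNormalized(str, form = "NFC") {
  return str === str.normalize(form);
}

const forms = ["NFD", "NFC", "NFKD", "NFKC"];
const samples = {
  "U+00C5": "\u00C5",
  "U+212B": "\u212B",
  "U+0041 U+030A": "\u0041\u030A",
};

// For each sample, print the forms it is already normalized in
for (const [label, str] of Object.entries(samples)) {
  const inForms = forms.filter((form) => isNormalized(str, form));
  console.log(label, inForms.join(", ") || "(none)");
}
// U+00C5 NFC, NFKC
// U+212B (none)
// U+0041 U+030A NFD, NFKD
```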

Normalizing both strings to the same form lets comparison operators return the correct result:

let a1 = String.fromCharCode(0x00C5), 
 a2 = String.fromCharCode(0x212B), 
 a3 = String.fromCharCode(0x0041, 0x030A); 
console.log(a1.normalize("NFD") === a2.normalize("NFD")); // true 
console.log(a2.normalize("NFKC") === a3.normalize("NFKC")); // true 
console.log(a1.normalize("NFC") === a3.normalize("NFC")); // true
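In practice, this usually means picking one agreed form and normalizing every string to it before comparing or deduplicating. The sketch below picks NFC, which is an assumption for illustration; any of the four forms works as long as all strings use the same one. Both helper names are hypothetical.

```javascript
// Hypothetical helper: compare two strings after normalizing both to one form
function equalsNormalized(x, y, form = "NFC") {
  return x.normalize(form) === y.normalize(form);
}

// Hypothetical helper: drop duplicates that differ only in Unicode encoding
function uniqueNormalized(strings, form = "NFC") {
  return [...new Set(strings.map((s) => s.normalize(form)))];
}

const variants = ["\u00C5", "\u212B", "\u0041\u030A"];
console.log(variants[0] === variants[2]);                // false
console.log(equalsNormalized(variants[0], variants[2])); // true
console.log(uniqueNormalized(variants).length);          // 1
```

A Set compares strings with strict equality, so normalizing before inserting is what lets it collapse the three visually identical strings into one entry.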

3. String manipulation methods

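This section covers several methods for manipulating string values. The first is concat(), which concatenates one or more strings into a new string. Consider the following example: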

let stringValue = "hello "; 
let result = stringValue.concat("world"); 
console.log(result); // "hello world" 
console.log(stringValue); // "hello "

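In this example, calling concat() on stringValue produces "hello world", while the value of stringValue itself stays unchanged. concat() accepts any number of arguments, so it can concatenate several strings at once, as shown here: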

let stringValue = "hello "; 
let result = stringValue.concat("world", "!"); 
console.log(result); // "hello world!" 
console.log(stringValue); // "hello "
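concat() also converts non-string arguments to strings before joining them. The sketch below uses arbitrary example values to show this, and confirms once more that the original string is not modified.

```javascript
const greeting = "hello ";

// Numbers, booleans, and null are converted to strings before concatenation
const mixed = greeting.concat("world", 2, true, null);
console.log(mixed);    // "hello world2truenull"

// The original string is unchanged: strings are immutable
console.log(greeting); // "hello "
```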

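This modified example appends the strings "world" and "!" to "hello ". Although concat() can concatenate strings, the addition operator (+) is used far more often, and in most cases it is the more convenient way to join multiple strings.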
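For comparison, here is the same concatenation written with the addition operator, plus += for building a string step by step. Like concat(), both produce a new string and leave the original operand unchanged.

```javascript
const base = "hello ";

// Same result as base.concat("world", "!")
const withPlus = base + "world" + "!";
console.log(withPlus); // "hello world!"

// += creates a new string and rebinds the variable; base is untouched
let built = base;
built += "world";
built += "!";
console.log(built);    // "hello world!"
console.log(base);     // "hello "
console.log(withPlus === base.concat("world", "!")); // true
```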

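ECMAScript provides three methods for extracting substrings from a string: slice(), substr(), and substring(). All three return a substring of the string they are called on, and all accept one or two arguments. The first argument is the position where the substring starts. For slice() and substring(), the second argument is the position where extraction stops (characters up to, but not including, that position are extracted). For substr(), the second argument is the number of characters to return. In every case, omitting the second argument means extracting to the end of the string. Like concat(), slice(), substr(), and substring() do not modify the string they are called on; they only return the extracted substring as a new primitive string value. Consider the following example: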

let stringValue = "hello world"; 
console.log(stringValue.slice(3)); // "lo world" 
console.log(stringValue.substring(3)); // "lo world" 
console.log(stringValue.substr(3)); // "lo world" 
console.log(stringValue.slice(3, 7)); // "lo w" 
console.log(stringValue.substring(3, 7)); // "lo w"
console.log(stringValue.substr(3, 7)); // "lo worl"

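In this example, slice(), substr(), and substring() are called in the same way, and in most cases they return the same value. When only the argument 3 is passed, all three methods return "lo world", because the second "l" in "hello" is at position 3. When the two arguments 3 and 7 are passed, slice() and substring() return "lo w" (the "o" in "world" is at position 7 and is not included), whereas substr() returns "lo worl", because for substr() the second argument is the number of characters to return.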
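To check the positions mentioned above, the sketch below lists every character of "hello world" with its index. The variable names are for illustration only; because the string is plain ASCII, each character is exactly one code unit, so the iteration index matches the string index.

```javascript
const text = "hello world";

// Pair each character with its index, e.g. "3:l"
const indexed = Array.from(text, (ch, i) => `${i}:${ch}`);
console.log(indexed.join(" "));
// 0:h 1:e 2:l 3:l 4:o 5:  6:w 7:o 8:r 9:l 10:d

// slice() stops before index 7; substr() takes 7 characters
console.log(text.slice(3, 7).length);  // 4
console.log(text.substr(3, 7).length); // 7
```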

The three methods also behave differently when an argument is negative. slice() treats every negative argument as the string's length plus that argument.

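substr() treats a negative first argument as the string's length plus that value, and converts a negative second argument to 0. substring() converts every negative argument to 0. Consider the following example: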

let stringValue = "hello world"; 
console.log(stringValue.slice(-3)); // "rld" 
console.log(stringValue.substring(-3)); // "hello world" 
console.log(stringValue.substr(-3)); // "rld" 
console.log(stringValue.slice(3, -4)); // "lo w" 
console.log(stringValue.substring(3, -4)); // "hel" 
console.log(stringValue.substr(3, -4)); // "" (empty string)

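This example clearly shows how the three methods differ. Given a negative argument, slice() and substr() return the same result, because -3 is converted to 8 (the length 11 plus the negative argument), so the calls are effectively slice(8) and substr(8). substring(), on the other hand, returns the whole string, because -3 is converted to 0.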
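The conversion of -3 described above can be verified directly. This is only a check of the stated rules: with a length of 11, length + (-3) is 8, so the negative and positive calls agree for slice() and substr(), while substring() behaves as if it were given 0.

```javascript
const str = "hello world";
const start = -3;
const converted = str.length + start; // 11 + (-3) = 8

console.log(converted);                                   // 8
console.log(str.slice(start) === str.slice(converted));   // true
console.log(str.substr(start) === str.substr(converted)); // true
// substring() clamps the negative start to 0 instead
console.log(str.substring(start) === str.substring(0));   // true
```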

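When the second argument is negative, all three methods behave differently. slice() converts the second argument to 7 (11 + (-4)), so the call is equivalent to slice(3, 7) and returns "lo w". substring() converts the second argument to 0, so the call becomes substring(3, 0), which is equivalent to substring(0, 3), because this method always uses the smaller argument as the start and the larger one as the end. For substr(), the second argument is converted to 0, meaning the returned string contains zero characters, so it returns an empty string.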
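As a summary, the sketch below re-implements how each method turns its arguments into a [from, to) range and checks the results against the built-in methods. The `resolve*` function names are hypothetical, and the sketch assumes integer (or omitted) arguments; it skips edge cases such as NaN or fractional values that the real methods also handle. The extra case (7, 3) shows the swap rule: substring() reorders its arguments, while slice() returns an empty string.

```javascript
// Hypothetical re-implementations, assuming integer or omitted arguments

// slice(): negative values count back from the end; arguments are never swapped
function resolveSlice(len, start, end = len) {
  const from = start < 0 ? Math.max(len + start, 0) : Math.min(start, len);
  const to = end < 0 ? Math.max(len + end, 0) : Math.min(end, len);
  return [from, Math.max(to, from)]; // from > to yields an empty range
}

// substring(): negative values become 0; the smaller value is the start
function resolveSubstring(len, start, end = len) {
  const a = Math.min(Math.max(start, 0), len);
  const b = Math.min(Math.max(end, 0), len);
  return [Math.min(a, b), Math.max(a, b)];
}

// substr(): negative start counts from the end; negative length becomes 0
function resolveSubstr(len, start, length = len) {
  const from = start < 0 ? Math.max(len + start, 0) : Math.min(start, len);
  const count = Math.min(Math.max(length, 0), len - from);
  return [from, from + count];
}

const s = "hello world";
const methods = { slice: resolveSlice, substring: resolveSubstring, substr: resolveSubstr };
const results = [];

for (const args of [[3], [-3], [3, 7], [3, -4], [7, 3]]) {
  for (const [name, resolve] of Object.entries(methods)) {
    const [from, to] = resolve(s.length, ...args);
    const ours = s.slice(from, to);   // extract using the resolved range
    const builtIn = s[name](...args); // call the real method for comparison
    results.push({ call: `${name}(${args.join(", ")})`, ours, builtIn });
  }
}

for (const r of results) {
  console.log(r.call, JSON.stringify(r.ours), r.ours === r.builtIn);
}
```

Since substr() is specified only in Annex B (legacy features) of the ECMAScript specification, new code typically prefers slice() or substring().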

Understanding the Red Book in Depth (25)
