Prefix tree (Trie)

Node structure

Here I followed a solution write-up on LeetCode. The node structure of the prefix tree:

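class TrieNode {
  constructor() {
    this.children = {};       // maps a character to its child TrieNode
    this.isEndOfWord = false; // true if some word ends at this node
  }
  // insert, search and startsWith (shown later) are methods of this class
}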

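Each node corresponds to one character. isEndOfWord marks whether the node is the last character of some word, and children holds the links from the node to its child nodes, keyed by character.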
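
To make the shape concrete, here is a hand-built sketch of the nodes a trie would hold for the words "a" and "an". The makeNode helper is hypothetical and exists only for this illustration; it is not part of the class above.

```javascript
// Hypothetical helper for illustration: builds a node with the same two fields.
function makeNode(isEndOfWord = false) {
  return { children: {}, isEndOfWord };
}

// The root is the one node that corresponds to no character; it only holds children.
const root = makeNode();
root.children['a'] = makeNode(true);               // "a" is a complete word
root.children['a'].children['n'] = makeNode(true); // "an" is a complete word

console.log(Object.keys(root.children));     // characters reachable from the root
console.log(root.children['a'].isEndOfWord); // whether a word ends at "a"
```

Both words share the node for "a", which is what lets a trie store common prefixes only once.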

Insertion

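Starting from the root, iterate over each character of word. If the current node's children has no entry for that character, create a new node; then move the current node to that child. Finally, set isEndOfWord = true on the node for the last character.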

insert(word) {
  let node = this;
  for (const char of word) {
    if (!node.children[char]) {
      node.children[char] = new TrieNode(); // create the child node if it does not exist
    }
    node = node.children[char]; // move to the next node
  }
  node.isEndOfWord = true; // mark the last character as the end of a word
}
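
As a quick check of the insertion logic, the following self-contained sketch repeats the class with only insert and inspects the tree after adding two words that share a prefix. The sample words "car" and "cat" are my own choice, not from the original write-up.

```javascript
class TrieNode {
  constructor() {
    this.children = {};
    this.isEndOfWord = false;
  }

  insert(word) {
    let node = this;
    for (const char of word) {
      if (!node.children[char]) {
        node.children[char] = new TrieNode();
      }
      node = node.children[char];
    }
    node.isEndOfWord = true;
  }
}

const trie = new TrieNode();
trie.insert('car');
trie.insert('cat');

// The shared prefix "ca" is stored once and branches into "r" and "t".
const nodeA = trie.children['c'].children['a'];
console.log(Object.keys(nodeA.children));
// "ca" itself was never inserted, so its node is not marked as a word end.
console.log(nodeA.isEndOfWord);
```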

Search and prefix matching

Search and prefix matching follow almost the same logic: both walk the tree starting from the root.

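If a node can be found in the prefix tree for every character of the word, the word satisfies "startsWith". If, in addition, the last node reached has isEndOfWord set to true, it satisfies "search".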

  search(word) {
    let node = this;
    for (const char of word) {
      node = node.children[char];
      if (!node) {
        // a missing key yields undefined, not null
        return false;
      }
    }
    return node.isEndOfWord;
  }

  startsWith(prefix) {
    // No need to check isEndOfWord on the node of the last character
    let node = this;
    for (const char of prefix) {
      node = node.children[char];
      if (!node) {
        return false;
      }
    }
    return true;
  }
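
Since the two methods differ only in their final check, the traversal can be shared. Below is a minimal sketch, assuming a helper named findNode (a hypothetical name, not from the original solution) that returns the node reached by following a string, or null if the path breaks off.

```javascript
class TrieNode {
  constructor() {
    this.children = {};
    this.isEndOfWord = false;
  }

  insert(word) {
    let node = this;
    for (const char of word) {
      if (!node.children[char]) {
        node.children[char] = new TrieNode();
      }
      node = node.children[char];
    }
    node.isEndOfWord = true;
  }

  // Hypothetical helper: follow `str` from this node; return the final node or null.
  findNode(str) {
    let node = this;
    for (const char of str) {
      node = node.children[char];
      if (!node) {
        return null; // no child for this character
      }
    }
    return node;
  }

  search(word) {
    const node = this.findNode(word);
    return node !== null && node.isEndOfWord;
  }

  startsWith(prefix) {
    return this.findNode(prefix) !== null;
  }
}

const trie = new TrieNode();
trie.insert('apple');
console.log(trie.search('apple'));     // true
console.log(trie.search('app'));       // false: "app" is only a prefix
console.log(trie.startsWith('app'));   // true
console.log(trie.startsWith('apply')); // false: no "y" after "appl"
```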

Replacing sensitive words

Replacing sensitive words only requires the insert method of the prefix tree; search and startsWith are not used.

Assume the function takes two parameters: the original string and an array of sensitive words.

Steps:

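1. Create a prefix tree, iterate over the array of sensitive words, and call insert to add each of them to the tree.
2. Iterate over the original string. currentIndex is the start index of the substring being tested, and matchEnd is the end index of the longest match found (on reaching a node with isEndOfWord = true we do not break immediately, but keep probing forward).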

  function findLongestMatch(node, text, index) {
    let current = node; // current node
    let matchEnd = -1; // end index of the match; -1 means no match so far

    for (let i = index; i < text.length; i++) {
      const char = text[i];
      if (!current.children[char]) {
        break; // the character is not in the prefix tree, so stop
      }
      current = current.children[char]; // move to the next node
      if (current.isEndOfWord) {
        matchEnd = i; // a word ends here, so record the end index of the match
        // Do not break here: prefer the longest sensitive word, since one word may be a prefix of another
      }
    }
    return matchEnd; // end index of the longest match, or -1
  }

  function getSanitizedText(text, sensitiveWords) {
    let sanitizedText = '';
    let currentIndex = 0;
    const trie = new TrieNode();
    for (const word of sensitiveWords) {
      trie.insert(word);
    }

    while (currentIndex < text.length) {
      const matchEnd = findLongestMatch(trie, text, currentIndex);
      if (matchEnd !== -1) {
        sanitizedText += '*'.repeat(matchEnd - currentIndex + 1); // replace the sensitive word with *
        currentIndex = matchEnd + 1;
      } else {
        sanitizedText += text[currentIndex]; // keep the non-sensitive character
        currentIndex++;
      }
    }
    return sanitizedText;
  }

Test:

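  // The words share the prefix "数": "数学" (mathematics), "数" (number), "数据结构" (data structures)
  const sensitiveWords = ['数学', '数', '数据结构'];
  const text = '我不喜欢数学，也不喜欢做数据运算，但是数据结构学得还可以。';
  console.log(getSanitizedText(text, sensitiveWords));
  // Prints: 我不喜欢**，也不喜欢做*据运算，但是****学得还可以。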
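
For readers who prefer an ASCII example, the sketch below assembles the pieces into one self-contained script and exercises the prefix-overlap case. The sample words "ab" and "abcd" and the input string are my own assumptions, chosen so that a longer word fails partway and the match falls back to the shorter one.

```javascript
class TrieNode {
  constructor() {
    this.children = {};
    this.isEndOfWord = false;
  }

  insert(word) {
    let node = this;
    for (const char of word) {
      if (!node.children[char]) {
        node.children[char] = new TrieNode();
      }
      node = node.children[char];
    }
    node.isEndOfWord = true;
  }
}

// Returns the end index of the longest word starting at `index`, or -1.
function findLongestMatch(root, text, index) {
  let current = root;
  let matchEnd = -1;
  for (let i = index; i < text.length; i++) {
    const next = current.children[text[i]];
    if (!next) break;
    current = next;
    if (current.isEndOfWord) matchEnd = i; // remember the match, keep probing
  }
  return matchEnd;
}

function getSanitizedText(text, sensitiveWords) {
  const trie = new TrieNode();
  for (const word of sensitiveWords) trie.insert(word);

  let result = '';
  let i = 0;
  while (i < text.length) {
    const matchEnd = findLongestMatch(trie, text, i);
    if (matchEnd !== -1) {
      result += '*'.repeat(matchEnd - i + 1);
      i = matchEnd + 1;
    } else {
      result += text[i];
      i++;
    }
  }
  return result;
}

// "abcx": probing continues past "ab" toward "abcd", fails at "x", falls back to "ab".
// "abcd": the longer word matches in full.
const masked = getSanitizedText('abcx abcd', ['ab', 'abcd']);
console.log(masked); // **cx ****
```

Because matchEnd is only updated at word ends, a failed attempt at the longer word costs nothing: the function still returns the last complete match it saw.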

[Problem Exploration] Trie: Replacing Sensitive Words in a String
