Data Structures and Algorithms (Dart): HashMap (Part 18)

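Overview

In Dart, HashMap is a hash-table-based implementation of Map that provides efficient storage and retrieval of key-value pairs. A HashMap is unordered (iteration order is not guaranteed), and its keys must have consistent Object.== and Object.hashCode implementations: keys that compare equal must return the same hash code.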

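Core principles

1. Basic hash table structure

Internally, HashMap is backed by an array (the bucket array), and a hash function maps each key to an array index:

Hash function: hash(key) → array index
Array index = hash(key) & (capacity - 1)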

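2. Index computation

Dart computes the bucket index with a bitwise AND, which is cheaper than a modulo operation:

/// Computes the bucket index for a key's hash code.
///
/// Uses a bitwise AND instead of `%`. This only works when [capacity]
/// is a power of two, so that `capacity - 1` is a mask of the low bits.
int calculateIndex(int hashCode, int capacity) {
  return hashCode & (capacity - 1);
}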
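
As a quick check of the power-of-two requirement, the sketch below compares the mask against `%` over a handful of hash codes. The helper name `maskMatchesModulo` is invented for this illustration and is not part of the SDK.

```dart
/// Returns the bucket index for [hashCode] in a table of [capacity] buckets.
/// Only valid when [capacity] is a power of two.
int calculateIndex(int hashCode, int capacity) => hashCode & (capacity - 1);

/// Hypothetical helper: true when masking and `%` select the same bucket
/// for every hash code in [hashCodes].
bool maskMatchesModulo(Iterable<int> hashCodes, int capacity) =>
    hashCodes.every((h) => calculateIndex(h, capacity) == h % capacity);
```

With a capacity of 8, `calculateIndex(13, 8)` is 5, the same as `13 % 8`. With a capacity of 10 the mask selects 9 while `13 % 10` is 3, which is why the bucket array length has to stay a power of two.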

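3. Collision resolution

A hash collision occurs when several different keys map to the same bucket index. Dart's HashMap resolves collisions with **separate chaining**:

/// A hash table node.
/// Each node holds a key-value pair and a reference to the next node.
class HashMapEntry<K, V> {
  K key; // the key
  V value; // the value
  int hashCode; // cached hash code of the key
  HashMapEntry<K, V>? next; // next node in the chain

  HashMapEntry(this.key, this.value, this.hashCode, this.next);
}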
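
To make the chain structure concrete, here is a minimal sketch of walking one bucket's chain to find a key. `Entry` and `findInChain` are simplified names chosen for this example; they mirror the node described above but are not the SDK's private classes.

```dart
/// A simplified chain node, mirroring the structure described above.
class Entry<K, V> {
  final K key;
  V value;
  final int hash; // cached hash code of [key]
  Entry<K, V>? next;

  Entry(this.key, this.value, this.hash, this.next);
}

/// Walks the chain starting at [head] and returns the entry for [key],
/// or null when the chain does not contain it.
Entry<K, V>? findInChain<K, V>(Entry<K, V>? head, Object? key) {
  final hash = key.hashCode;
  for (var e = head; e != null; e = e.next) {
    if (e.hash == hash && e.key == key) return e;
  }
  return null;
}
```

The cost of a lookup is proportional to the chain length, which is why keeping chains short matters for performance.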

Core operation flow

base class _HashMap<K, V> extends MapBase<K, V> implements HashMap<K, V> {
  static const int _INITIAL_CAPACITY = 8;

  int _elementCount = 0;
  var _buckets = List<_HashMapEntry?>.filled(_INITIAL_CAPACITY, null);
  int _modificationCount = 0;

  int get length => _elementCount;
  bool get isEmpty => _elementCount == 0;
  bool get isNotEmpty => _elementCount != 0;

  Iterable<K> get keys => _HashMapKeyIterable<K, V>(this);
  Iterable<V> get values => _HashMapValueIterable<K, V>(this);
}

Each bucket is one slot in a fixed-length List, and the table starts with 8 buckets (_INITIAL_CAPACITY = 8). A bucket can hold several elements through a linked list of _HashMapEntry nodes.

Insertion (put)

/// HashMap's []= operator.
///
/// Average time: O(1), the bucket is located directly and holds few entries.
/// Worst-case time: O(n), every key maps to the same bucket (rare).
/// Space: O(n) for n stored key-value pairs.
void operator []=(K key, V value) {
  // 1. Compute the key's hash code.
  final int hashCode = key.hashCode;

  // 2. Take a local reference to the bucket array.
  final buckets = _buckets;
  final int length = buckets.length;

  // 3. Compute the bucket index with a bit mask.
  final int index = hashCode & (length - 1);

  // 4. Walk the chain looking for an existing key.
  var entry = buckets[index];
  while (entry != null) {
    if (entry.hashCode == hashCode && entry.key == key) {
      // Existing key found: update its value in place.
      entry.value = value;
      return;
    }
    entry = entry.next;
  }

  // 5. No existing key: add a new entry (see _addEntry).
  _addEntry(buckets, index, length, key, value, hashCode);
}

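Key optimizations

Bitwise indexing: uses & instead of the % modulo operator.
Precondition: buckets.length must be a power of two.
Benefit: a bitwise AND is cheaper than the integer division behind %.

// Mask instead of modulo
int index = hashCode & (buckets.length - 1); // fast
// rather than: hashCode % buckets.length;   // slower

Hash code caching:

class _HashMapEntry<K, V> {
  final K key;
  V value;
  final int hashCode; // cached so it is not recomputed
  _HashMapEntry<K, V>? next;

  _HashMapEntry(this.key, this.value, this.hashCode, this.next);
}

Two-step comparison:

// Compare the cached hash codes first (a cheap integer comparison),
// then compare the keys themselves with == (exact).
if (entry.hashCode == hashCode && entry.key == key) {
  // Matching key found
}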
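
The second half of that comparison is what keeps colliding keys apart. The sketch below forces every key to share one hash code, using a made-up `CollidingKey` class, and relies on `==` alone to tell the keys apart.

```dart
import 'dart:collection';

/// Hypothetical key type whose hashCode is deliberately constant, so
/// every instance lands in the same bucket.
class CollidingKey {
  final String name;
  CollidingKey(this.name);

  @override
  bool operator ==(Object other) =>
      other is CollidingKey && other.name == name;

  @override
  int get hashCode => 42;
}

/// Builds a map whose two keys share a single hash code.
HashMap<CollidingKey, int> buildCollidingMap() {
  final map = HashMap<CollidingKey, int>();
  map[CollidingKey('a')] = 1;
  map[CollidingKey('b')] = 2;
  return map;
}
```

Both entries survive and can be looked up independently, but every operation now has to walk the shared chain, which is the O(n) worst case mentioned earlier.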

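Lookup (get)

/**
 * Returns the value stored for the given key.
 * Average time: O(1), ideally a direct hit with no collision
 * Worst-case time: O(n), every key hashes to the same bucket (rare)
 * In practice: close to O(1), because a good hash function spreads keys evenly
 */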
V? operator [](Object? key) {
    final hashCode = key.hashCode;
    final buckets = _buckets;
    final index = hashCode & (buckets.length - 1);
    var entry = buckets[index];
    while (entry != null) {
      if (hashCode == entry.hashCode && entry.key == key) {
        return unsafeCast<V>(entry.value);
      }
      entry = entry.next;
    }
    return null;
  }

This is the core method for looking up a key and retrieving its value.

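Optimization tips

// Avoid redundant lookups.
// Less efficient:
if (map.containsKey(key)) {
  var value = map[key]; // second lookup for the same key
}

// Better, as long as the map never stores null values
// (otherwise a null result is ambiguous):
var value = map[key];
if (value != null) {
  // use value
}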
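
One caveat on the single-lookup pattern: when the value type is nullable, `map[key]` returns null both for a missing key and for a key stored with null. A small sketch of handling that case follows; `describeLookup` is a hypothetical helper written for this example.

```dart
import 'dart:collection';

/// Hypothetical helper: reports how [key] is stored in [map].
String describeLookup(HashMap<String, int?> map, String key) {
  final value = map[key];
  if (value != null) return 'present: $value';
  // A null result is ambiguous when values are nullable, so the second
  // lookup with containsKey is paid only on this path.
  return map.containsKey(key) ? 'present: null' : 'absent';
}
```

The common case (a non-null value) still costs one lookup; only null results need the extra `containsKey` call.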

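Iteration (forEach)

The core method for visiting every key-value pair and running an action on each one.

/*
- Time complexity: O(n), every element must be visited
- Space complexity: O(1), only constant extra space is used
- Iteration order: not guaranteed (HashMap is unordered)
*/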
void forEach(void action(K key, V value)) {
    final stamp = _modificationCount;
    final buckets = _buckets;
    final length = buckets.length;
    for (int i = 0; i < length; i++) {
      var entry = buckets[i];
      while (entry != null) {
        action(unsafeCast<K>(entry.key), unsafeCast<V>(entry.value));
        if (stamp != _modificationCount) {
          throw ConcurrentModificationError(this);
        }
        entry = entry.next;
      }
    }
  }

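Setting up concurrent modification detection

forEach records the current value of the modification counter before it starts.
_modificationCount is incremented on every structural modification (insertion or removal).
The saved value is compared with the counter after each callback to detect modifications made during iteration.
Updating the value of an existing key during iteration does not throw.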

import 'dart:collection';

/// Demonstrates how adding an entry during iteration is detected.
void concurrentModificationDemo() {
  var map = HashMap<String, int>();
  map['a'] = 1;
  map['b'] = 2;
  map['c'] = 3;

  try {
    map.forEach((key, value) {
      print('$key: $value');

      // Adding an entry while iterating is a structural modification.
      if (key == 'b') {
        map['d'] = 4; // forEach throws once this callback returns
      }
    });
  } on ConcurrentModificationError catch (e) {
    // The message starts with "Concurrent modification during iteration".
    print('Caught: $e');
  }
}
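
By contrast, overwriting the value of a key that already exists is not a structural modification, so it is safe inside forEach. A minimal sketch, with `doubleAllValues` as an illustrative name:

```dart
import 'dart:collection';

/// Doubles every value in place. Overwriting the value of an existing key
/// does not add or remove entries, so forEach completes normally.
void doubleAllValues(HashMap<String, int> map) {
  map.forEach((key, value) {
    map[key] = value * 2;
  });
}
```

This works because the []= operator returns as soon as it finds the existing key, before the modification counter is touched.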

Avoid structural modifications (adding or removing) during iteration

// ❌ Wrong: modifying the map inside forEach
map.forEach((key, value) {
  if (value < 0) {
    map.remove(key); // throws ConcurrentModificationError
  }
});

// ✅ Right: collect the keys to remove first
var keysToRemove = <String>[];
map.forEach((key, value) {
  if (value < 0) {
    keysToRemove.add(key);
  }
});
keysToRemove.forEach(map.remove);
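
The same filtering can also be expressed with `Map.removeWhere`, which is part of the standard Map interface and takes care of the removal safely. A short sketch, where `dropNegatives` is just a name chosen for the example:

```dart
import 'dart:collection';

/// Removes every entry with a negative value using Map.removeWhere.
void dropNegatives(HashMap<String, int> map) {
  map.removeWhere((key, value) => value < 0);
}
```

This avoids the temporary key list in user code and keeps the intent in a single call.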

Nested-loop traversal

Outer loop: iterates over all buckets.
Inner loop: walks the linked list in each bucket.

Removal (remove)

/*
Time complexity
- Average: O(1), assuming an even hash distribution and short chains
- Worst case: O(n), every element hashes to the same bucket
*/
  V? remove(Object? key) {
    final hashCode = key.hashCode;
    final buckets = _buckets;
    final index = hashCode & (buckets.length - 1);
    var entry = buckets[index];
    _HashMapEntry? previous = null;
    while (entry != null) {
      final next = entry.next;
      if (hashCode == entry.hashCode && entry.key == key) {
        _removeEntry(entry, previous, index);
        _elementCount--;
        _modificationCount =
            (_modificationCount + 1) & _MODIFICATION_COUNT_MASK;
        return unsafeCast<V>(entry.value);
      }
      previous = entry;
      entry = next;
    }
    return null;
  }
  
  void _removeEntry(
    _HashMapEntry entry,
    _HashMapEntry? previousInBucket,
    int bucketIndex,
  ) {
    if (previousInBucket == null) {
      _buckets[bucketIndex] = entry.next;
    } else {
      previousInBucket.next = entry.next;
    }
  }

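entry starts at the first entry in the bucket.
previous tracks the preceding entry so that the chain can be relinked.

Collision handling

Uses separate chaining: entries that map to the same bucket form a linked list.
A two-step comparison ensures correctness: hash codes are compared first, then the keys.

Performance optimizations

The index is computed with a bit mask: hashCode & (buckets.length - 1)
next is read once up front and reused to advance the loop, which avoids a second field access.
unsafeCast avoids a runtime type check.

Memory management

_removeEntry keeps the chain correctly linked.
The element count is updated immediately, so the load-factor check on later insertions works from an accurate count.

Removing a node

Removing the head of the chain: when previousInBucket is null, the entry being removed is the first node, so the bucket slot is pointed at entry.next.
Removing a middle or tail node: when previousInBucket is not null, the entry is in the middle or at the end of the chain, so the previous node's next is pointed at entry.next.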

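_addEntry

The core internal method that adds a new key-value entry to the hash table.

/*
- buckets: the bucket array; each bucket may hold the head node of a chain
- index: the bucket index to insert at
- length: the length of the bucket array
- key: the key to insert
- value: the value to insert
- hashCode: the key's hash code
*/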
  void _addEntry(
    List<_HashMapEntry?> buckets,
    int index,
    int length,
    K key,
    V value,
    int hashCode,
  ) {
    final entry = _HashMapEntry(key, value, hashCode, buckets[index]);
    buckets[index] = entry;
    final newElements = _elementCount + 1;
    _elementCount = newElements;
    // If we end up with more than 75% non-empty entries, we
    // resize the backing store.
    if ((newElements << 2) > ((length << 1) + length)) _resize();
    _modificationCount = (_modificationCount + 1) & _MODIFICATION_COUNT_MASK;
  }

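Core logic

Create the new entry and insert it at the head of the chain

final entry = _HashMapEntry(key, value, hashCode, buckets[index]);
buckets[index] = entry;

Key properties:

A new entry is always inserted at the head of the chain (head insertion).
The previous head becomes the new node's next.
This step takes O(1) time.

Update the element count

final newElements = _elementCount + 1;
_elementCount = newElements;

Maintains the total number of elements in the table.
Used by the load-factor check that follows.

Load-factor check and dynamic resizing

if ((newElements << 2) > ((length << 1) + length)) _resize();

Analysis of the resize condition:

newElements << 2 is equivalent to newElements * 4
(length << 1) + length is equivalent to length * 3
The condition simplifies to: newElements * 4 > length * 3
That is: newElements / length > 0.75, a load-factor threshold of 75%.
A resize is triggered when the load factor exceeds 75%.
0.75 is a commonly used threshold that balances space utilization against lookup performance.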
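
To see where the threshold falls for concrete sizes, the check can be pulled out into a small function. This is a sketch for working through the arithmetic; `exceedsLoadFactor` and `firstCountThatResizes` are names invented here.

```dart
/// True when [elementCount] entries in [capacity] buckets exceed a 75%
/// load factor, using the same shift-based comparison as the check above.
bool exceedsLoadFactor(int elementCount, int capacity) =>
    (elementCount << 2) > ((capacity << 1) + capacity);

/// Hypothetical helper: the smallest element count that triggers a resize
/// for a table of [capacity] buckets.
int firstCountThatResizes(int capacity) {
  var count = 0;
  while (!exceedsLoadFactor(count, capacity)) {
    count++;
  }
  return count;
}
```

With the initial 8 buckets, six entries (exactly 75%) do not trigger a resize because the comparison is strict; the seventh entry does. For 16 buckets the first count that resizes is 13.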

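Updating the modification counter

_modificationCount = (_modificationCount + 1) & _MODIFICATION_COUNT_MASK;

How modification detection works:

Every structural modification increments _modificationCount.
The _MODIFICATION_COUNT_MASK bit mask keeps the counter within a fixed range, so it wraps around instead of growing without bound.
Iteration code compares this counter against a previously saved value to detect modifications.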
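
The wrap-around behavior can be sketched in isolation. The value of the mask is not shown in this article, so the constant below (`0x3fffffff`) is an assumption made purely for illustration, as is the function name.

```dart
/// Assumed mask value for illustration only; the actual constant is not
/// shown in this article.
const int modificationCountMask = 0x3fffffff;

/// Advances a modification counter, wrapping to 0 once it passes the mask.
int nextModificationCount(int current) =>
    (current + 1) & modificationCountMask;
```

Because the check is an equality test against a saved value, wrapping is harmless in practice: a false negative would require the counter to cycle all the way back to the saved value during a single iteration.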

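Design highlights

Efficient insertion strategy

Head insertion gives O(1) insertion once the bucket is known.
Recently inserted entries sit at the front of the chain, which helps when they are the ones most likely to be accessed next (temporal locality).

Resize mechanism

The 75% load factor is a well-established threshold.
Bitwise operations keep the check cheap.
Automatic resizing keeps chains short, so performance does not degrade badly as the map grows.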
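
The _resize method itself is not shown in this article. As a rough sketch of what a doubling resize could look like, assuming the capacity doubles and each node is relinked by its cached hash, consider the following. `Node` and `resized` are invented for this example and do not correspond to the SDK source.

```dart
/// Simplified chain node used only for this sketch.
class Node {
  final Object? key;
  final int hash; // cached hash code, so keys are not rehashed
  Node? next;
  Node(this.key, this.hash, this.next);
}

/// Returns a bucket list twice as long as [oldBuckets], with every node
/// relinked into the bucket selected by its cached hash.
List<Node?> resized(List<Node?> oldBuckets) {
  final newLength = oldBuckets.length << 1;
  final newBuckets = List<Node?>.filled(newLength, null);
  for (final head in oldBuckets) {
    var node = head;
    while (node != null) {
      final next = node.next; // save before relinking
      final index = node.hash & (newLength - 1);
      node.next = newBuckets[index]; // head insertion into the new chain
      newBuckets[index] = node;
      node = next;
    }
  }
  return newBuckets;
}
```

Every node is visited exactly once, which is where the O(n) cost of a resize comes from, and the cached hash means no key has to be hashed again.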

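Modification detection during iteration

The modification counter detects changes made to the map while it is being iterated.
Such changes surface as a ConcurrentModificationError instead of silently producing inconsistent results. This is a fail-fast check, not thread safety.

Bitwise arithmetic

Uses left shifts in place of multiplication.
<< 2 replaces * 4, and << 1 replaces * 2.
This is a micro-optimization; the result is identical to the multiplication.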

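Relation to the earlier analysis

Insertion: _addEntry, head insertion, O(1) time.
Removal: _removeEntry, unlinks the node found by walking the chain, O(1) on average.
Lookup: locate the bucket from the hash code, then walk the chain.

Performance characteristics

Time complexity: O(1) on average.
Space complexity: O(1) per insertion.
Load-factor control: kept at or below 75% to preserve performance.
Resize cost: O(n) when a resize is triggered, but still O(1) amortized.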
