IndexScan比SeqScan返回的结果更少，索引损坏？给大家分享一个有趣的案例，同一个 sql，索引扫描比全表顺序

关于作者：

Nickyoung，数据库领域从业者。PostgreSQL ACE，IvorySQL专家顾问委员会成员。

公众号 “ 👉 PostgreSQL 运维之道 ”。

给大家分享一个有趣的案例，同一个 sql，索引扫描比全表顺序扫描获取的数据更少。本篇我们深入分析一起索引排序规则损坏的案例，并 debug 验证索引扫描的主要过程。

问题现象

走索引扫描查询到 1 条数据。

testidx=# explain analyze select *  from user_info where userid ='1230005998';
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Index Scan using index_userid on user_info  (cost=0.28..35.61 rows=9 width=57) (actual time=0.030..0.032 rows=1 loops=1)
   Index Cond: ((userid)::text = '1230005998'::text)
 Planning Time: 0.118 ms
 Execution Time: 0.057 ms
(4 rows)

testidx=# select ctid,userid,region_id from user_info where userid ='1230005998';
  ctid  | userid    | region_id 
--------+----------------------+-----------
 (4,39) | 1230005998 | abc
(1 row)

不走索引顺序扫描查询到 11 条数据。

testidx=# set enable_indexscan to off;
SET
testidx=# explain analyze select * from user_info where userid ='1230005998';
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Seq Scanon user_info  (cost=0.00..51.50rows=9 width=57) (actual time=0.093..0.460rows=11 loops=1)
   Filter: ((userid)::text = '1230005998'::text)
   Rows Removed by Filter: 1309
 Planning Time: 0.116 ms
 Execution Time: 0.478 ms
(5rows)

testidx=# select ctid,userid,region_id from user_info where userid ='1230005998';
  ctid   | userid    | region_id 
---------+----------------------+-----------
 (4,39)  | 1230005998 | abc
 (9,14)  | 1230005998 | abc
 (9,32)  | 1230005998 | abc 
 (10,32) | 1230005998 | abc
 (12,5)  | 1230005998 | abc
 (26,23) | 1230005998 | abc
 (27,4)  | 1230005998 | abc
 (27,9)  | 1230005998 | abc
 (27,11) | 1230005998 | abc
 (34,38) | 1230005998 | abc
 (34,39) | 1230005998 | abc
(11rows)

testidx=#

对比两次查询结果，可以看到走索引扫描时，仅查询到第一条匹配的数据，对应 ctid 为(4,39)。索引损坏了？

问题分析

当我们怀疑索引损坏时，可以使用 amcheck 插件对索引进行扫描分析，检查是否存在异常。

可以看到 leaf page 8 的 itemoffset 24 和 25 违反了条目顺序不变性规则。即按照升序原则 24 号索引槽位对应的键值要小于等于 25 槽位，但经检查是大于的，所以排序规则混乱了。

testidx=# select * from bt_index_check('index_userid',true);
DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
DEBUG:  verifying level 1 (true root level)
DEBUG:  verifying 7 items on internal block 3
DEBUG:  verifying level 0 (leaf level)
DEBUG:  verifying 207 items on leaf block 1
DEBUG:  verifying 204 items on leaf block 2
DEBUG:  verifying 204 items on leaf block 4
DEBUG:  verifying 204 items on leaf block 5
DEBUG:  verifying 204 items on leaf block 6
DEBUG:  verifying 235 items on leaf block 7
DEBUG:  verifying 78 items on leaf block 8
ERROR:  item order invariant violated for index "index_userid"
DETAIL:  Lower index tid=(8,24) (points to heap tid=(4,14)) higher index tid=(8,25) (points to heap tid=(9,14)) page lsn=1/331E9F98.
testidx=#

使用 pageinspect 扩展，查看 leaf page 8 有 78 条记录，其中 itemoffset 24 和 25 对应的键值，24 的键值为'31 09 xxx'，25 的键值为'2b 4c xxx'，前者大，确实是有问题的。

testidx=# select * from bt_page_stats('index_userid',8);
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags 
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
     8 | l    |         78 |          0 |            31 |      8192 |      5356 |         7 |         0 |    0 |          1
(1 row)

testidx=#

testidx=# select * from  bt_page_items('index_userid',8) where itemoffset in (22,23,24,25);
 itemoffset |  ctid  | itemlen | nulls | vars |                                  data                                   
------------+--------+---------+-------+------+-------------------------------------------------------------------------
         22 | (4,39) |      32 | f     | t    | 2b 4c 54 34 33 36 32 35 31 33 34 00 00 00 00 00 00 00 00 00 00 00 00 00
         23 | (4,9)  |      32 | f     | t    | 31 09 0d 0a 4c 54 34 33 36 32 35 31 33 34 37 33 36 30 30 37 39 30 30 32
         24 | (4,14) |      32 | f     | t    | 31 09 0d 0a 4c 54 34 33 36 32 35 31 33 34 37 33 36 30 30 37 39 30 30 32
         25 | (9,14) |      32 | f     | t    | 2b 4c 54 34 33 36 32 35 31 33 34 00 00 00 00 00 00 00 00 00 00 00 00 00
(4 rows)

testidx=#

明显的索引损坏了，怎么损坏的呢？

可能是 BUG 或者系统异常导致数据库 crash 等写坏，还有一个glibc 版本差异导致索引损坏的场景，特别是 glibc 2.28 之前和之后的版本。

经过排查这次异常就是 glibc 差异导致的，glibc 版本从 2.17 到 2.28。

当遇到这样的索引损坏场景时，建议 reindex 对应的索引来修复。

这个问题基本分析清楚了，不过老杨不打算到此为止。借此机会证实下索引扫描的逻辑，也搞清楚为什么仅扫描一条数据就结束。感兴趣的朋友可以继续往下看。

原理分析

btree 想必大家都很熟悉了（其实我很讨厌面试中对于 btree 的八股文，haha...）

再来回顾下结构，细节可以参考灿灿的书中btree 章节

检索的时候，从 root page 开始检索，在 leaf page 中找到键值匹配的 heap ctid，通过 ctid 去 heap 中 fetch 对应的数据。这里借用德哥画的图，来自github 博客。

另外 postgrespro 的博客btree 章节，对于检索过程描述的不错，推荐大家去看看。

例如查找等于 49 的数据，标黄部分及蓝色箭头描述了检索过程：从 root 节点出发，找到第一个匹配的 leaf 节点，顺着 leaf 节点的链表一直查找，直到检索完所有匹配的 leaf 节点。

简单回顾一些概念和原理后，我们上手 debug 来证实检索过程。

我们的检索条件为userid ='1230005998'

1. 先确定 first leaf page

btgettuple函数中首次扫描走_bt_first函数逻辑。

通常 leaf page 会有多个，扫描时通过二分查找，先找到键值匹配的目标 leaf page。在_bt_first 函数中，调用_bt_search 函数，再调用_bt_binsrch 函数进行二分查找。

初始的 low 为 1，high 为 8 对应 index_userid 这个索引的 leaf block 1 和 8

_bt_compare 函数进行 key 匹配，这里 userid 为 text 类型，因此使用的比较函数为 bttextcmp

我们省去二分查找的过程，最终 high=low=8，确定目标数据在 leaf page 8

2、确定 first item

开始扫描目标 leaf page，同样采用二分查找，找到第一条匹配的 item。

_bt_first 函数走到 offnum = _bt_binsrch(rel, &inskey, buf)，在_bt_binsrch 函数中初始 high 为 78，low 为 1（因为 leaf page 8 有 78 条 item）。

在多轮二分查找后，mid 为 22 时_bt_compare 匹配到了预期数据。bttextcmp 函数中可以看到 text_cmp 入参 arg1, arg2 相同，都为 1230005998，result 为 0。

因此，low 为 22，high 为 22，找到了 first item。

3、遍历页面元组，设置扫描边界

while (offnum <= maxoff)循环，offnum 为 22，maxoff 为 78。

从 first item 即 offnum=22 开始遍历，_bt_readpage 中调用_bt_checkkeys 首次比较结果相同，itemIndex++为 1，continuescan 为 true，offnum 延顺到 Next 即 23。

循环中再次调用_bt_checkkeys 进行比较，实际的比较函数为 texteq，offnum 为 23 时 key 值明显和检索条件的长度不同，值肯定是不同的，result 为 false。

result 传递给 test，因此*continuescan = false，_bt_checkkeys 返回 false。

continuescan 为 false，因此 so->currPos.moreRight=false，so->currPos.firstItem = 0， so->currPos.lastItem = 1 - 1， so->currPos.itemIndex = 0;

就是这几个属性决定了扫描边界。 firstItem 和 lastItem 相同都为 0，说明扫描的范围就是 first Item 这一条数据。

index_getnext_slot 函数中根据 ctid(4,39)调用 index_fetch_heap 获取 heap 数据。

4、获取 next Item

btgettuple 函数中，后续扫描调用_bt_next 函数。

so->currPos.moreRight 为 false，_bt_readnextpage 函数 return false，因此_bt_steppage 函数 return false

因此_bt_next 函数返回 false，btgettuple 返回 false

index_getnext_tid 函数返回 NULL

tid 为 NULL，index_getnext_slot 函数返回 NULL

至此扫描结束。

从这个过程中可以看到，itemoffset 22 即记录 ctid(4,39)这条索引键值和检索条件匹配，但 23 不匹配，因此导致索引扫描结束，只扫描了一条数据。

从 seqscan 结果看，ctid (4,39)下一条符合条件的数据为(9,14)，对应到索引 itemoffset 25。从 bt_page_items 的结果来看，23 和 24 的键值是一样的，都比 25 大，因此索引排序规则是错乱的。

小结

本篇我们深入分析了一起索引排序规则损坏的案例，当出现类似问题时，可以利用 amcheck 和 pageinspect 扩展来分析解决。同时也 debug 证实了下索引扫描的一些关键过程。

HOW 2026 议题招募中

2026 年 4 月 27-28 日，由 IvorySQL 社区联合 PGEU（欧洲 PG 社区）、PGAsia（亚洲 PG 社区）共同打造的 HOW 2026（IvorySQL & PostgreSQL 技术峰会）将再度落地济南。届时，PostgreSQL 联合创始人 Bruce Momjian 等顶级大师将亲临现场。

自开启征集以来，HOW 2026 筹备组已感受到来自全球 PostgreSQL 爱好者的澎湃热情。为了确保大会议题的深度与广度，我们诚邀您在 2026 年 2 月 27 日截止日期前，提交您的技术见解。

投递链接：jsj.top/f/uebqBc