问题
最近遇到问题,在5.7.43中查询i_s.innodb_trx的时候出现了输出数据和当前信息不符的情况,且检查show engine 出现大量的如下RW锁信息,
--Thread 140086215583488 has waited at trx0i_s.cc line 1495 for 0 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x3ca6f98 created in file trx0i_s.cc line 1417
a writer (thread id 140086215583488) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file trx0i_s.cc line 1465
Last time write locked in file /var/lib/pb2/sb_1-11862956-1687353311.91/mysql-5.7.43/storage/innobase/trx/trx0i_s.cc line 1495
检查数据库发现大量访问innodb_trx/innodb_locks/innodb_lock_waits的语句。
i_s.innodb_trx的缓存
当我们访问i_s.innodb_trx的时候,实际上是首先构建一个叫做trx_i_s_cache的cache缓存,这个缓存是全局的一个缓存,大概如下
trx_i_s_cache trx_i_s_cache_t
-- innodb_trx i_s_table_cache_t innodb_trx的内存
-- i_s_mem_chunk_t内存chunk39个 是存储cache行的主要空间,innodb_trx的每行都存在里面
i_s_mem_chunk_t chunks[0]
-- offset读取的offset
-- rows_allocd本chunk分配的行
-- void* base 内存所在位置
i_s_mem_chunk_t chunks[1]
...
i_s_mem_chunk_t chunks[MEM_CHUNKS_IN_TABLE_CACHE-1]
-- rows_used使用中的行
-- rows_allocd分配的行
-- row_size 行的大小
-- innodb_locks i_s_table_cache_t 如上,innodb_locks的内存
-- innodb_lock_waits i_s_table_cache_t 如上,innodb_lock_waits的内存
-- locks_hash hash_table_t* 10000个cell的hash查找表,应用于查找重复的冲突行
-- storage ha_storage_t* <------------------------------| 内存
-- heap mem_heap_t* ha_storage_t地址地址所在内存 ----| 主要用于存储innodb_trx中的query,innodb_locks中的lock_data等
-- ha_storage_t --->ha_storage_node_t --->ha_storage_node_t --->ha_storage_node_t
-- 1024字节内存
-- hash hash_table_t* 2048的cell的查找表 主要用于查找ha_storage_node_t结构
cell[0] --->*ha_storage_node_t
...
cell[2047] --->*ha_storage_node_t
-- mem_allocd 分配的内存大小
-- is_truncated 是否被截断了
这里的我们发现这个缓存包含了3个i_s_table_cache_t类型的结构,分别用于填充innodb_locks/innodb_lock_waits/innodb_locks 信息,其中每个结构都包含了39个chunk,用于实际的存储每行数据,第一个chunk可以存储1024行的数据,当一个chunk不够的时候,就会扩充chunk,并且每次空充都会增加一部分也就是old+1/2old,这样实际上可以保存非常多的行,下面是代码中的一段解释,
The rows of a table cache are stored in a set of chunks. When a new
row is added a new chunk is allocated if necessary. Assuming that the
first one is 1024 rows (TABLE_CACHE_INITIAL_ROWSNUM) and each
subsequent is N/2 where N is the number of rows we have allocated till
now, then 39th chunk would accommodate 1677416425 rows and all chunks would accommodate 3354832851 rows.
而storage主要用于存储一些特别的信息比如innodb_trx中的query,innodb_locks中的lock_data等。在5.7中访问innodb_trx/innodb_lock_waits/innodb_locks中的任何一个视图都会构建整个cache,当然在构建的时候需要对缓存上RW的X锁,并且会对lock_sys和trx_sys上锁,然后扫描trx_sys读写链表和trx_sys的只读链表进行填充缓存,每次重新填充cache缓存的时候会完全清理上一次填充的全部信息,当然为了避免代价过高,如果如果两次填充cache缓存小于0.1S将不会填充,直接访问缓存即可。当构建完成后会释放各种锁。 然后进入读取截断,读取截断需要对缓存上RW的S锁,对不同的视图访问不同的结构,也就是cache中的3个i_s_table_cache_t类型的结构中的一个。 当我们写的语句需要大量循环不断的访问innodb_trx/innodb_lock_waits/innodb_locks 且当前事务和锁堵塞信息很多的时候,这部分代价也是可观的。这里谈到的缓存上的RW就是前面看到的,
X-lock (wait_ex) on RW-latch at 0x3ca6f98 created in file trx0i_s.cc line 1417
到了8.0过后访问innodb_trx依旧如此,并且也会填充整个cache,但是当我们访问视图p_s.data_lock_waits/p_s.data_locks的时候已经不依赖这个cache,但从实际经验来看8.0也应该尽量避免大量平凡访问这部分视图。
简单测试和总结
为了证明这个缓存的存在,我们可以debug更改一下两次收集的时间让其小于0.1s,然后比如show processlist来观察如下,
我们这里发现id 4的session在show processlist并不存在,这里通过gdb修改强制不够构建cache,使用的是老的cache,而这个id 4的session实际已经退出了。
因此对于提到的问题,我们应该立即停掉大量访问innodb_trx/innodb_lock_waits/innodb_locks语句的应用,保证cache能够正常更新,然后访问innodb_trx正常了。当然故障案例中为什么cache不能正常更新,且没有完全堵塞,而是能够访问到旧的cache的数据这点有点费解。
代码部分
启动的时候重新初始化
trx_i_s_cache_init(trx_i_s_cache)
->cache->rw_lock = static_cast<rw_lock_t*>(ut_malloc_nokey(sizeof(*cache->rw_lock)))
首先分配cache上的读写锁
->rw_lock_create(trx_i_s_cache_lock_key, cache->rw_lock,SYNC_TRX_I_S_RWLOCK);
创建锁
->cache->last_read = 0
->mutex_create(LATCH_ID_CACHE_LAST_READ, &cache->last_read_mutex)
建立保护last_read的mutex
->table_cache_init(&cache->innodb_trx, sizeof(i_s_trx_row_t))
i_s_trx_row_t为innodb_trx行相对的一个结构体,传入后为row_size
->table_cache->rows_used = 0;table_cache->rows_allocd = 0;table_cache->row_size = row_size;
进行初始化
->for (i = 0; i < MEM_CHUNKS_IN_TABLE_CACHE; i++)
MEM_CHUNKS_IN_TABLE_CACHE 为39,初始化39个chunk为NULL
->table_cache->chunks[i].base = NULL
->table_cache_init(&cache->innodb_locks, sizeof(i_s_locks_row_t));
i_s_locks_row_t为innodb_locks行对应额结构体,初始化39个chunk为NULL
->table_cache_init(&cache->innodb_lock_waits,sizeof(i_s_lock_waits_row_t));
i_s_lock_waits_row_t为innodb_lock_waits行对应额结构体,初始化39个chunk为NULL
->cache->locks_hash = hash_create(LOCKS_HASH_CELLS_NUM)
建立包含10000个cell的hash查找表
->cache->storage = ha_storage_create(CACHE_STORAGE_INITIAL_SIZE,CACHE_STORAGE_HASH_CELLS)
CACHE_STORAGE_INITIAL_SIZE 1024/CACHE_STORAGE_HASH_CELLS 2048
->initial_heap_bytes = HA_STORAGE_DEFAULT_HEAP_BYTES
->initial_hash_cells = HA_STORAGE_DEFAULT_HASH_CELLS
->heap = mem_heap_create(sizeof(ha_storage_t)+ initial_heap_bytes)
这里应该是ha_storage_t结构所在地址然后增加了1024字节内存
->storage = (ha_storage_t*) mem_heap_alloc(heap, sizeof(ha_storage_t));
从headp中分配内存用于存储ha_storage_t的内存
->storage->hash = hash_create(initial_hash_cells)
分配2048的cell的查找表
->return(storage)
返回这个storage的地址
->cache->mem_allocd = 0
分配的内存为0
->cache->is_truncated = false
是否截断
访问视图加载
trx_i_s_common_fill_table
->cache = trx_i_s_cache
获取全局trx_i_s_cache
->table_name = tables->schema_table_name
需要查询的是哪一个视图 一共包含3个 innodb_trx innodb_locks innodb_lock_waits
->trx_i_s_cache_start_write(cache)
获取trx_i_s_cache上的rw_lock X
->trx_i_s_possibly_fetch_data_into_cache
->if (!can_cache_be_updated(cache))
是否可以更改cache
->if (now - cache->last_read > CACHE_MIN_IDLE_TIME_US)
是否每次填充超过了0.1s
->return(TRUE)
如果是就返回true
->return(FALSE)
否则不需要更新,也就是两次获取少于0.1不更新
->lock_mutex_enter
->mutex_enter(&lock_sys->mutex)
加锁lock_sys锁
->trx_sys_mutex_enter
->mutex_enter(&trx_sys->mutex)
加锁trx_sys锁
->fetch_data_into_cache(cache)
->trx_i_s_cache_clear(cache)
->cache->innodb_trx.rows_used = 0;cache->innodb_locks.rows_used = 0;cache->innodb_lock_waits.rows_used = 0;
清空3个表记录的行数
->hash_table_clear(cache->locks_hash)
清理lock_hash hash查找表的内存
->memset(table->array, 0x0,table->n_cells * sizeof(*table->array))
清理locks_hash hash查找表中的内存
->ha_storage_empty(&cache->storage)
清理storage中的内存
->temp_storage.heap = (*storage)->heap;temp_storage.hash = (*storage)->hash;
建立temp_storage指向内存
->hash_table_clear(temp_storage.hash)
清理storage的hash查找表
->mem_heap_empty(temp_storage.heap)
清空hash和堆内存
->*storage = (ha_storage_t*)mem_heap_alloc(temp_storage.heap,sizeof(ha_storage_t))
重新storage初始化内存
->(*storage)->heap = temp_storage.heap;(*storage)->hash = temp_storage.hash;
恢复指向
->fetch_data_into_cache_low(cache, true, &trx_sys->rw_trx_list)
首先获取读写事务的信息
->rw_trx_list = trx_list == &trx_sys->rw_trx_list
是否是读写链表,因为这个函数读写链表和读链表都要调用
->for (trx = UT_LIST_GET_FIRST(*trx_list);trx != NULL;trx =(rw_trx_list? UT_LIST_GET_NEXT(trx_list, trx): UT_LIST_GET_NEXT(mysql_trx_list, trx)))
循环这个读写链表中的事务
->i_s_trx_row_t* trx_row;i_s_locks_row_t* requested_lock_row;
定义innodb_trx和innodb_locks的行指针
->if (!trx_is_started(trx)|| (!rw_trx_list && trx->id != 0 && !trx->read_only)
A:是否事务已经开启 如果处于TRX_STATE_NOT_STARTED 不会打印
B:如果是只读链表 并且 trx->id不为0 并且 不是read only事务 不打印,这应该是只读链表打印的判断
->ut_ad(trx->in_rw_trx_list == rw_trx_list);
本处断言 打印的读写链表 和 只读链表
->if (!add_trx_relevant_locks_to_cache(cache, trx,&requested_lock_row))
首先填充innodb_locks需要的信息
->if (trx->lock.que_state == TRX_QUE_LOCK_WAIT)
如果事务处于TRX_QUE_LOCK_WAIT状态下
->wait_lock_heap_no = wait_lock_get_heap_no(trx->lock.wait_lock)
获取等待的锁的heap no
->*requested_lock_row = add_lock_to_cache(cache, trx->lock.wait_lock, wait_lock_heap_no)
->dst_row = (i_s_locks_row_t*) table_cache_create_empty_row(&cache->innodb_locks, cache)
建立一个新的i_s_locks_row_t结构 innodb_locks
->if (!fill_locks_row(dst_row, lock, heap_no, cache))
其他填充
->if (!fill_lock_data(&row->lock_data, lock, heap_no, cache))
row->lock_data填充
->ha_storage_put_memlim
->HASH_INSERT
插入到locks_hash中
->for (curr_lock = lock_queue_iterator_get_prev(&iter);curr_lock != NULL;curr_lock = lock_queue_iterator_get_prev(&iter))
迭代获取blocking lock的信息
->add_lock_wait_to_cache
->dst_row = (i_s_lock_waits_row_t*)table_cache_create_empty_row(&cache->innodb_lock_waits,cache)
从chunk中分配建立一个新的i_s_lock_waits_row_t结构 innodb_lock_waits
->fill_lock_waits_row(dst_row, requested_lock_row, blocking_lock_row)
填充
....
->else
*requested_lock_row = NULL;
->trx_row = reinterpret_cast<i_s_trx_row_t*>( table_cache_create_empty_row(&cache->innodb_trx, cache))
从chunk中分配一个i_s_trx_row_t innodb_trx
->if (!fill_trx_row(trx_row, trx, requested_lock_row, cache))
...
->stmt_len = innobase_get_stmt_safe(trx->mysql_thd, query, sizeof(query))
trx中的length不为0,其他填充
->if (stmt_len > 0)
->ha_storage_put_memlim
分配data内存
->fetch_data_into_cache_low(cache, false, &trx_sys->mysql_trx_list)
获取read-only事务信息,如上只是不填充innodb_locks中的信息
但是读取只读事务
->cache->is_truncated = FALSE;
//到这里内存信息已经加载
->trx_i_s_cache_start_read(cache);
加S 锁
->if (innobase_strcasecmp(table_name, "innodb_trx") == 0)
如果需要的视图是innodb_trx,读取cache获取
->fill_innodb_trx_from_cache(cache, thd, tables->table)
->else if (innobase_strcasecmp(table_name, "innodb_locks") == 0)
如果需要的视图是innodb_locks,读取cache获取
->fill_innodb_locks_from_cache(cache, thd, tables->table)
->else if (innobase_strcasecmp(table_name, "innodb_lock_waits") == 0)
如果需要的视图是innodb_lock_waits,读取cache获取
->fill_innodb_lock_waits_from_cache
/** This represents one table's cache. */
struct i_s_table_cache_t {
ulint rows_used; /*!< number of used rows */
ulint rows_allocd; /*!< number of allocated rows */
ulint row_size; /*!< size of a single row */
i_s_mem_chunk_t chunks[MEM_CHUNKS_IN_TABLE_CACHE]; /*!< array of
memory chunks that stores the
rows */
};
/** This structure describes the intermediate buffer */
struct trx_i_s_cache_t {
rw_lock_t* rw_lock; /*!< read-write lock protecting
the rest of this structure */
uintmax_t last_read; /*!< last time the cache was read;
measured in microseconds since
epoch */
ib_mutex_t last_read_mutex;/*!< mutex protecting the
last_read member - it is updated
inside a shared lock of the
rw_lock member */
i_s_table_cache_t innodb_trx; /*!< innodb_trx table */
i_s_table_cache_t innodb_locks; /*!< innodb_locks table */
i_s_table_cache_t innodb_lock_waits;/*!< innodb_lock_waits table */
/** the hash table size is LOCKS_HASH_CELLS_NUM * sizeof(void*) bytes */
#define LOCKS_HASH_CELLS_NUM 10000
hash_table_t* locks_hash; /*!< hash table used to eliminate
duplicate entries in the
innodb_locks table */
/** Initial size of the cache storage */
#define CACHE_STORAGE_INITIAL_SIZE 1024
/** Number of hash cells in the cache storage */
#define CACHE_STORAGE_HASH_CELLS 2048
ha_storage_t* storage; /*!< storage for external volatile
data that may become unavailable
when we release
lock_sys->mutex or trx_sys->mutex */
ulint mem_allocd; /*!< the amount of memory
allocated with mem_alloc*() */
ibool is_truncated; /*!< this is TRUE if the memory
limit was hit and thus the data
in the cache is truncated */
};
/** Memory for each table in the intermediate buffer is allocated in
separate chunks. These chunks are considered to be concatenated to
represent one flat array of rows. */
struct i_s_mem_chunk_t {
ulint offset; /*!< offset, in number of rows */
ulint rows_allocd; /*!< the size of this chunk, in number
of rows */
void* base; /*!< start of the chunk */
};
39个chunk,初始化的内存为1024行,每个chunk增加会增加
#define TABLE_CACHE_INITIAL_ROWSNUM 1024
也就是如下的说明
The rows of a table cache are stored in a set of chunks. When a new
row is added a new chunk is allocated if necessary. Assuming that the
first one is 1024 rows (TABLE_CACHE_INITIAL_ROWSNUM) and each
subsequent is N/2 where N is the number of rows we have allocated till
now, then 39th chunk would accommodate 1677416425 rows and all chunks
would accommodate 3354832851 rows.
/** This is the intermediate buffer where data needed to fill the
INFORMATION SCHEMA tables is fetched and later retrieved by the C++
code in handler/i_s.cc. */
static trx_i_s_cache_t trx_i_s_cache_static;
/** This is the intermediate buffer where data needed to fill the
INFORMATION SCHEMA tables is fetched and later retrieved by the C++
code in handler/i_s.cc. */
trx_i_s_cache_t* trx_i_s_cache = &trx_i_s_cache_static;
/** Hash storage for strings */
struct ha_storage_t {
mem_heap_t* heap; /*!< memory heap from which memory is
allocated */
hash_table_t* hash; /*!< hash table used to avoid
duplicates */
};