关于视图i_s.innodb_trx的缓存

34 阅读11分钟

问题

最近遇到问题,在5.7.43中查询i_s.innodb_trx的时候出现了输出数据和当前信息不符的情况,且检查show engine 出现大量的如下RW锁信息,

--Thread 140086215583488 has waited at trx0i_s.cc line 1495 for 0  seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x3ca6f98 created in file trx0i_s.cc line 1417
a writer (thread id 140086215583488) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file trx0i_s.cc line 1465
Last time write locked in file /var/lib/pb2/sb_1-11862956-1687353311.91/mysql-5.7.43/storage/innobase/trx/trx0i_s.cc line 1495

检查数据库发现大量访问innodb_trx/innodb_locks/innodb_lock_waits的语句。

i_s.innodb_trx的缓存

当我们访问i_s.innodb_trx的时候,实际上是首先构建一个叫做trx_i_s_cache的cache缓存,这个缓存是全局的一个缓存,大概如下

trx_i_s_cache trx_i_s_cache_t
  -- innodb_trx i_s_table_cache_t         innodb_trx的内存
     -- i_s_mem_chunk_t内存chunk39个      是存储cache行的主要空间,innodb_trx的每行都存在里面
        i_s_mem_chunk_t	chunks[0]          
        -- offset读取的offset
        -- rows_allocd本chunk分配的行
        -- void*	base 内存所在位置   
        i_s_mem_chunk_t	chunks[1]
        ...
        i_s_mem_chunk_t	chunks[MEM_CHUNKS_IN_TABLE_CACHE-1]
     -- rows_used使用中的行
     -- rows_allocd分配的行
     -- row_size 行的大小       
  -- innodb_locks i_s_table_cache_t       如上,innodb_locks的内存   
  -- innodb_lock_waits i_s_table_cache_t  如上,innodb_lock_waits的内存
  -- locks_hash hash_table_t*             10000个cell的hash查找表,应用于查找重复的冲突行
  -- storage ha_storage_t*  <------------------------------|  内存
     -- heap mem_heap_t*   ha_storage_t地址地址所在内存 ----|  主要用于存储innodb_trx中的query,innodb_locks中的lock_data等
        -- ha_storage_t    --->ha_storage_node_t --->ha_storage_node_t --->ha_storage_node_t
        -- 1024字节内存
     -- hash hash_table_t* 2048的cell的查找表                 主要用于查找ha_storage_node_t结构          
        cell[0]    --->*ha_storage_node_t
        ...
        cell[2047] --->*ha_storage_node_t
  -- mem_allocd                           分配的内存大小
  -- is_truncated                         是否被截断了

这里的我们发现这个缓存包含了3个i_s_table_cache_t类型的结构,分别用于填充innodb_locks/innodb_lock_waits/innodb_locks 信息,其中每个结构都包含了39个chunk,用于实际的存储每行数据,第一个chunk可以存储1024行的数据,当一个chunk不够的时候,就会扩充chunk,并且每次空充都会增加一部分也就是old+1/2old,这样实际上可以保存非常多的行,下面是代码中的一段解释,

The rows of a table cache are stored in a set of chunks. When a new
row is added a new chunk is allocated if necessary. Assuming that the
first one is 1024 rows (TABLE_CACHE_INITIAL_ROWSNUM) and each
subsequent is N/2 where N is the number of rows we have allocated till
now, then 39th chunk would accommodate 1677416425 rows and all chunks would accommodate 3354832851 rows.

而storage主要用于存储一些特别的信息比如innodb_trx中的query,innodb_locks中的lock_data等。在5.7中访问innodb_trx/innodb_lock_waits/innodb_locks中的任何一个视图都会构建整个cache,当然在构建的时候需要对缓存上RW的X锁,并且会对lock_sys和trx_sys上锁,然后扫描trx_sys读写链表和trx_sys的只读链表进行填充缓存,每次重新填充cache缓存的时候会完全清理上一次填充的全部信息,当然为了避免代价过高,如果如果两次填充cache缓存小于0.1S将不会填充,直接访问缓存即可。当构建完成后会释放各种锁。 然后进入读取截断,读取截断需要对缓存上RW的S锁,对不同的视图访问不同的结构,也就是cache中的3个i_s_table_cache_t类型的结构中的一个。 当我们写的语句需要大量循环不断的访问innodb_trx/innodb_lock_waits/innodb_locks 且当前事务和锁堵塞信息很多的时候,这部分代价也是可观的。这里谈到的缓存上的RW就是前面看到的,

X-lock (wait_ex) on RW-latch at 0x3ca6f98 created in file trx0i_s.cc line 1417

到了8.0过后访问innodb_trx依旧如此,并且也会填充整个cache,但是当我们访问视图p_s.data_lock_waits/p_s.data_locks的时候已经不依赖这个cache,但从实际经验来看8.0也应该尽量避免大量平凡访问这部分视图。

简单测试和总结

为了证明这个缓存的存在,我们可以debug更改一下两次收集的时间让其小于0.1s,然后比如show processlist来观察如下, 40ea2453afacdb28b1da5a836b3ae041.png转存失败,建议直接上传图片文件

我们这里发现id 4的session在show processlist并不存在,这里通过gdb修改强制不够构建cache,使用的是老的cache,而这个id 4的session实际已经退出了。

因此对于提到的问题,我们应该立即停掉大量访问innodb_trx/innodb_lock_waits/innodb_locks语句的应用,保证cache能够正常更新,然后访问innodb_trx正常了。当然故障案例中为什么cache不能正常更新,且没有完全堵塞,而是能够访问到旧的cache的数据这点有点费解。

代码部分

启动的时候重新初始化

trx_i_s_cache_init(trx_i_s_cache)
  ->cache->rw_lock = static_cast<rw_lock_t*>(ut_malloc_nokey(sizeof(*cache->rw_lock)))
    首先分配cache上的读写锁
  ->rw_lock_create(trx_i_s_cache_lock_key, cache->rw_lock,SYNC_TRX_I_S_RWLOCK);
    创建锁
  ->cache->last_read = 0
  ->mutex_create(LATCH_ID_CACHE_LAST_READ, &cache->last_read_mutex)
    建立保护last_read的mutex
  ->table_cache_init(&cache->innodb_trx, sizeof(i_s_trx_row_t))
    i_s_trx_row_t为innodb_trx行相对的一个结构体,传入后为row_size
    ->table_cache->rows_used = 0;table_cache->rows_allocd = 0;table_cache->row_size = row_size;
      进行初始化
    ->for (i = 0; i < MEM_CHUNKS_IN_TABLE_CACHE; i++)
      MEM_CHUNKS_IN_TABLE_CACHE 为39,初始化39个chunk为NULL
      ->table_cache->chunks[i].base = NULL
  ->table_cache_init(&cache->innodb_locks, sizeof(i_s_locks_row_t));
    i_s_locks_row_t为innodb_locks行对应额结构体,初始化39个chunk为NULL
  ->table_cache_init(&cache->innodb_lock_waits,sizeof(i_s_lock_waits_row_t)); 
    i_s_lock_waits_row_t为innodb_lock_waits行对应额结构体,初始化39个chunk为NULL
  ->cache->locks_hash = hash_create(LOCKS_HASH_CELLS_NUM)
    建立包含10000个cell的hash查找表
  ->cache->storage = ha_storage_create(CACHE_STORAGE_INITIAL_SIZE,CACHE_STORAGE_HASH_CELLS)  
    CACHE_STORAGE_INITIAL_SIZE	1024/CACHE_STORAGE_HASH_CELLS	2048
    ->initial_heap_bytes = HA_STORAGE_DEFAULT_HEAP_BYTES
    ->initial_hash_cells = HA_STORAGE_DEFAULT_HASH_CELLS
    ->heap = mem_heap_create(sizeof(ha_storage_t)+ initial_heap_bytes)
      这里应该是ha_storage_t结构所在地址然后增加了1024字节内存
    ->storage = (ha_storage_t*) mem_heap_alloc(heap, sizeof(ha_storage_t));
      从headp中分配内存用于存储ha_storage_t的内存
    ->storage->hash = hash_create(initial_hash_cells)
      分配2048的cell的查找表
    ->return(storage)
      返回这个storage的地址
  ->cache->mem_allocd = 0 
    分配的内存为0
  ->cache->is_truncated = false
    是否截断

访问视图加载
trx_i_s_common_fill_table
  ->cache = trx_i_s_cache 
    获取全局trx_i_s_cache
  ->table_name = tables->schema_table_name
    需要查询的是哪一个视图 一共包含3个  innodb_trx  innodb_locks innodb_lock_waits
  ->trx_i_s_cache_start_write(cache)
    获取trx_i_s_cache上的rw_lock X
  ->trx_i_s_possibly_fetch_data_into_cache
    ->if (!can_cache_be_updated(cache))
      是否可以更改cache
      ->if (now - cache->last_read > CACHE_MIN_IDLE_TIME_US)
        是否每次填充超过了0.1s
        ->return(TRUE)
          如果是就返回true
      ->return(FALSE)
          否则不需要更新,也就是两次获取少于0.1不更新
    ->lock_mutex_enter
      ->mutex_enter(&lock_sys->mutex)
        加锁lock_sys锁
    ->trx_sys_mutex_enter
      ->mutex_enter(&trx_sys->mutex)
        加锁trx_sys锁
    ->fetch_data_into_cache(cache)
      ->trx_i_s_cache_clear(cache)
        ->cache->innodb_trx.rows_used = 0;cache->innodb_locks.rows_used = 0;cache->innodb_lock_waits.rows_used = 0;
         清空3个表记录的行数
        ->hash_table_clear(cache->locks_hash)
          清理lock_hash hash查找表的内存
          ->memset(table->array, 0x0,table->n_cells * sizeof(*table->array))
            清理locks_hash hash查找表中的内存
        ->ha_storage_empty(&cache->storage)
          清理storage中的内存
          ->temp_storage.heap = (*storage)->heap;temp_storage.hash = (*storage)->hash;
           建立temp_storage指向内存
          ->hash_table_clear(temp_storage.hash)
           清理storage的hash查找表
          ->mem_heap_empty(temp_storage.heap)
           清空hash和堆内存
          ->*storage = (ha_storage_t*)mem_heap_alloc(temp_storage.heap,sizeof(ha_storage_t))
           重新storage初始化内存
          ->(*storage)->heap = temp_storage.heap;(*storage)->hash = temp_storage.hash;
           恢复指向
      ->fetch_data_into_cache_low(cache, true, &trx_sys->rw_trx_list)
        首先获取读写事务的信息
        ->rw_trx_list = trx_list == &trx_sys->rw_trx_list
          是否是读写链表,因为这个函数读写链表和读链表都要调用
        ->for (trx = UT_LIST_GET_FIRST(*trx_list);trx != NULL;trx =(rw_trx_list? UT_LIST_GET_NEXT(trx_list, trx): UT_LIST_GET_NEXT(mysql_trx_list, trx)))
          循环这个读写链表中的事务
          ->i_s_trx_row_t*		trx_row;i_s_locks_row_t*	requested_lock_row;
            定义innodb_trx和innodb_locks的行指针
          ->if (!trx_is_started(trx)|| (!rw_trx_list && trx->id != 0 && !trx->read_only)
            A:是否事务已经开启 如果处于TRX_STATE_NOT_STARTED 不会打印
            B:如果是只读链表 并且 trx->id不为0 并且 不是read only事务 不打印,这应该是只读链表打印的判断 
          ->ut_ad(trx->in_rw_trx_list == rw_trx_list); 
            本处断言 打印的读写链表 和 只读链表
          ->if (!add_trx_relevant_locks_to_cache(cache, trx,&requested_lock_row))   
            首先填充innodb_locks需要的信息         
            ->if (trx->lock.que_state == TRX_QUE_LOCK_WAIT)
              如果事务处于TRX_QUE_LOCK_WAIT状态下
              ->wait_lock_heap_no = wait_lock_get_heap_no(trx->lock.wait_lock)
                获取等待的锁的heap no
              ->*requested_lock_row = add_lock_to_cache(cache, trx->lock.wait_lock, wait_lock_heap_no)
                ->dst_row = (i_s_locks_row_t*) table_cache_create_empty_row(&cache->innodb_locks, cache)
                  建立一个新的i_s_locks_row_t结构  innodb_locks
                ->if (!fill_locks_row(dst_row, lock, heap_no, cache))
                   其他填充
                  ->if (!fill_lock_data(&row->lock_data, lock, heap_no, cache)) 
                    row->lock_data填充
                   ->ha_storage_put_memlim
                ->HASH_INSERT
                  插入到locks_hash中    
              ->for (curr_lock = lock_queue_iterator_get_prev(&iter);curr_lock != NULL;curr_lock = lock_queue_iterator_get_prev(&iter))
                迭代获取blocking lock的信息
                ->add_lock_wait_to_cache
                  ->dst_row = (i_s_lock_waits_row_t*)table_cache_create_empty_row(&cache->innodb_lock_waits,cache)
                    从chunk中分配建立一个新的i_s_lock_waits_row_t结构  innodb_lock_waits
                  ->fill_lock_waits_row(dst_row, requested_lock_row, blocking_lock_row)
                    填充
                ....
            ->else 
              *requested_lock_row = NULL;
              
          ->trx_row = reinterpret_cast<i_s_trx_row_t*>( table_cache_create_empty_row(&cache->innodb_trx, cache))       
            从chunk中分配一个i_s_trx_row_t  innodb_trx
          ->if (!fill_trx_row(trx_row, trx, requested_lock_row, cache)) 
            ...
            ->stmt_len = innobase_get_stmt_safe(trx->mysql_thd, query, sizeof(query))
              trx中的length不为0,其他填充
              ->if (stmt_len > 0)
                ->ha_storage_put_memlim
                  分配data内存
      ->fetch_data_into_cache_low(cache, false, &trx_sys->mysql_trx_list)
        获取read-only事务信息,如上只是不填充innodb_locks中的信息
        但是读取只读事务
      ->cache->is_truncated = FALSE;
  //到这里内存信息已经加载
  ->trx_i_s_cache_start_read(cache); 
    加S 锁  
  ->if (innobase_strcasecmp(table_name, "innodb_trx") == 0)
    如果需要的视图是innodb_trx,读取cache获取
    ->fill_innodb_trx_from_cache(cache, thd, tables->table) 
  ->else if (innobase_strcasecmp(table_name, "innodb_locks") == 0)
    如果需要的视图是innodb_locks,读取cache获取
    ->fill_innodb_locks_from_cache(cache, thd, tables->table)      
  ->else if (innobase_strcasecmp(table_name, "innodb_lock_waits") == 0)
    如果需要的视图是innodb_lock_waits,读取cache获取
    ->fill_innodb_lock_waits_from_cache   

/** This represents one table's cache. */
struct i_s_table_cache_t {
	ulint		rows_used;	/*!< number of used rows */
	ulint		rows_allocd;	/*!< number of allocated rows */
	ulint		row_size;	/*!< size of a single row */
	i_s_mem_chunk_t	chunks[MEM_CHUNKS_IN_TABLE_CACHE]; /*!< array of
					memory chunks that stores the
					rows */
};

/** This structure describes the intermediate buffer */
struct trx_i_s_cache_t { 
	rw_lock_t*	rw_lock;	/*!< read-write lock protecting
					the rest of this structure */
	uintmax_t	last_read;	/*!< last time the cache was read;
					measured in microseconds since
					epoch */
	ib_mutex_t		last_read_mutex;/*!< mutex protecting the
					last_read member - it is updated
					inside a shared lock of the
					rw_lock member */
	i_s_table_cache_t innodb_trx;	/*!< innodb_trx table */
	i_s_table_cache_t innodb_locks;	/*!< innodb_locks table */
	i_s_table_cache_t innodb_lock_waits;/*!< innodb_lock_waits table */
/** the hash table size is LOCKS_HASH_CELLS_NUM * sizeof(void*) bytes */
#define LOCKS_HASH_CELLS_NUM		10000
	hash_table_t*	locks_hash;	/*!< hash table used to eliminate
					duplicate entries in the
					innodb_locks table */
/** Initial size of the cache storage */
#define CACHE_STORAGE_INITIAL_SIZE	1024
/** Number of hash cells in the cache storage */
#define CACHE_STORAGE_HASH_CELLS	2048
	ha_storage_t*	storage;	/*!< storage for external volatile
					data that may become unavailable
					when we release
					lock_sys->mutex or trx_sys->mutex */
	ulint		mem_allocd;	/*!< the amount of memory
					allocated with mem_alloc*() */
	ibool		is_truncated;	/*!< this is TRUE if the memory
					limit was hit and thus the data
					in the cache is truncated */
};


/** Memory for each table in the intermediate buffer is allocated in
separate chunks. These chunks are considered to be concatenated to
represent one flat array of rows. */
struct i_s_mem_chunk_t {
	ulint	offset;		/*!< offset, in number of rows */
	ulint	rows_allocd;	/*!< the size of this chunk, in number
				of rows */
	void*	base;		/*!< start of the chunk */
};
39个chunk,初始化的内存为1024行,每个chunk增加会增加
#define TABLE_CACHE_INITIAL_ROWSNUM	1024

也就是如下的说明
The rows of a table cache are stored in a set of chunks. When a new
row is added a new chunk is allocated if necessary. Assuming that the
first one is 1024 rows (TABLE_CACHE_INITIAL_ROWSNUM) and each
subsequent is N/2 where N is the number of rows we have allocated till
now, then 39th chunk would accommodate 1677416425 rows and all chunks
would accommodate 3354832851 rows.

            
            
            


/** This is the intermediate buffer where data needed to fill the
INFORMATION SCHEMA tables is fetched and later retrieved by the C++
code in handler/i_s.cc. */
static trx_i_s_cache_t	trx_i_s_cache_static;
/** This is the intermediate buffer where data needed to fill the
INFORMATION SCHEMA tables is fetched and later retrieved by the C++
code in handler/i_s.cc. */
trx_i_s_cache_t*	trx_i_s_cache = &trx_i_s_cache_static;



/** Hash storage for strings */
struct ha_storage_t {
	mem_heap_t*	heap;	/*!< memory heap from which memory is
				allocated */
	hash_table_t*	hash;	/*!< hash table used to avoid
				duplicates */
};