prometheus tsdb head_chunk


head_chunk

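Once a series has collected 120 samples, or its head chunk spans more than two hours, the in-memory chunk built for that series is written to disk. From then on the chunk is read back from disk through mmap.

The value 120 comes from the engineering practice described in the Gorilla paper, where it gives a good compression ratio.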

  // Based on Gorilla white papers this offers near-optimal compression ratio
  // so anything bigger that this has diminishing returns and increases
  // the time range within which we have to decompress all samples.
  const samplesPerChunk = 120

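In practice, once a series' chunk reaches 1/4 of the target sample count (30 of 120 samples), an end time next_at is predicted for it. When the current timestamp t reaches next_at, the chunk is cut and written to the mmapped file.

(This prediction tries to spread the samples evenly across chunks.)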

// If we reach 25% of a chunk's desired sample count, predict an end time
// for this chunk that will try to make samples equally distributed within
// the remaining chunks in the current chunk range.
// At latest it must happen at the timestamp set when the chunk was cut.

chunkdiskmapper

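The ChunkDiskMapper has a curFile: a 128 MiB file that is mmapped.

On top of it sits a buffered writer, 4 MiB in size, that wraps curFile.

A chunk written to the ChunkDiskMapper first lands in this buffer. The buffer is flushed to curFile once it has filled up to 4 MiB.

Flushing the buffer does not mean the data can no longer be lost. It only means the data has reached the kernel's page cache.

For curFile to be fully durable, curFile itself has to be synced.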
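The difference between buffering, flushing, and syncing can be observed with the standard library alone. The following is a minimal sketch, not the ChunkDiskMapper code: `sizesAroundFlush` and `writeBufferSize` are names made up for this example, and a temporary file stands in for curFile.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// writeBufferSize mirrors the 4 MiB buffer described in the text.
const writeBufferSize = 4 * 1024 * 1024

// sizesAroundFlush writes payload through a buffered writer and reports the
// file size seen by the OS before and after Flush. payload must be smaller
// than the buffer for the "before" size to stay at zero.
func sizesAroundFlush(payload []byte) (int64, int64, error) {
	f, err := os.CreateTemp("", "head-chunk-*")
	if err != nil {
		return 0, 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	w := bufio.NewWriterSize(f, writeBufferSize)
	if _, err := w.Write(payload); err != nil {
		return 0, 0, err
	}
	st, err := f.Stat() // Data is still in the user-space buffer.
	if err != nil {
		return 0, 0, err
	}
	before := st.Size()

	if err := w.Flush(); err != nil { // Hands the bytes to the kernel.
		return 0, 0, err
	}
	st, err = f.Stat()
	if err != nil {
		return 0, 0, err
	}
	after := st.Size()

	// Only Sync asks the kernel to persist the data to stable storage.
	if err := f.Sync(); err != nil {
		return 0, 0, err
	}
	return before, after, nil
}

func main() {
	before, after, err := sizesAroundFlush([]byte("chunk"))
	fmt.Println(before, after, err)
}
```

Before the flush the file is still empty from the kernel's point of view; after the flush the bytes are visible, but they are only guaranteed to survive a crash once Sync has returned.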

// ChunkDiskMapper is for writing the Head block chunks to the disk
// and access chunks via mmapped file.
type ChunkDiskMapper struct {
   curFileNumBytes atomic.Int64 // Bytes written in current open file.

   /// Writer.
   dir             *os.File
   writeBufferSize int

   curFile         *os.File // File being written to.
   curFileSequence int      // Index of current open file being appended to.
   curFileMaxt     int64    // Used for the size retention.

   byteBuf      [MaxHeadChunkMetaSize]byte // Buffer used to write the header of the chunk.
   chkWriter    *bufio.Writer              // Writer for the current open file.
   crc32        hash.Hash
   writePathMtx sync.Mutex

   /// Reader.
   // The int key in the map is the file number on the disk.
   mmappedChunkFiles map[int]*mmappedChunkFile // Contains the m-mapped files for each chunk file mapped with its index.
   closers           map[int]io.Closer         // Closers for resources behind the byte slices.
   readPathMtx       sync.RWMutex              // Mutex used to protect the above 2 maps.
   pool              chunkenc.Pool             // This is used when fetching a chunk from the disk to allocate a chunk.

   // Writer and Reader.
   // We flush chunks to disk in batches. Hence, we store them in this buffer
   // from which chunks are served till they are flushed and are ready for m-mapping.
   chunkBuffer *chunkBuffer

   // Whether the maxt field is set for all mmapped chunk files tracked within the mmappedChunkFiles map.
   // This is done after iterating through all the chunks in those files using the IterateAllChunks method.
   fileMaxtSet bool

   closed bool
}

cut

cut syncs the current curFile to disk.

It then creates a new curFile, 128 MiB in size, and mmaps it.

writechunk

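The ChunkDiskMapper uses WriteChunk to write a series' encoded chunk data into the mmapped file.

It returns a ChunkDiskMapperRef: the upper 32 bits identify the chunk file on disk, and the lower 32 bits are the byte offset in that file at which the chunk starts.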
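The packing of a file number and an offset into one 64-bit reference can be sketched as follows. The type `diskRef` and its helpers are hypothetical names for this illustration; only the upper/lower 32-bit split is taken from the description above.

```go
package main

import "fmt"

// diskRef is a hypothetical stand-in for ChunkDiskMapperRef.
type diskRef uint64

// newDiskRef stores the file sequence in the upper 32 bits and the
// offset within that file in the lower 32 bits.
func newDiskRef(fileSeq, offset uint32) diskRef {
	return diskRef(uint64(fileSeq)<<32 | uint64(offset))
}

// unpack splits the reference back into file sequence and offset.
func (r diskRef) unpack() (fileSeq, offset uint32) {
	return uint32(r >> 32), uint32(r)
}

func main() {
	ref := newDiskRef(3, 4096)
	seq, off := ref.unpack()
	fmt.Println(seq, off) // 3 4096
}
```

A side effect of this layout is that references sort by file first and by position in the file second.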

// WriteChunk writes the chunk to the disk.
// The returned chunk ref is the reference from where the chunk encoding starts for the chunk.
func (cdm *ChunkDiskMapper) WriteChunk(seriesRef HeadSeriesRef, mint, maxt int64, chk chunkenc.Chunk) (chkRef ChunkDiskMapperRef, err error) {

During WriteChunk, the mapper checks whether the write would push curFile past 128 MiB. If so, it cuts,

which means a new curFile is created and mmapped.
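The size check can be pictured with a toy model. This sketch only tracks counters and touches no files; `toyMapper` and `write` are hypothetical names, and cutting on the very first write (when no file exists yet) is an assumption of the sketch.

```go
package main

import "fmt"

// maxFileSize mirrors the 128 MiB limit described in the text.
const maxFileSize = 128 * 1024 * 1024

// toyMapper tracks only which file is current and how full it is.
type toyMapper struct {
	curFileSeq   int
	curFileBytes int64
}

// write accounts for n bytes and returns the file sequence they land in.
// It "cuts" (moves on to a new file) when no file exists yet or when the
// write would exceed maxFileSize.
func (m *toyMapper) write(n int64) int {
	if m.curFileSeq == 0 || m.curFileBytes+n > maxFileSize {
		m.curFileSeq++
		m.curFileBytes = 0
	}
	m.curFileBytes += n
	return m.curFileSeq
}

func main() {
	m := &toyMapper{}
	fmt.Println(m.write(100 << 20)) // 1
	fmt.Println(m.write(20 << 20))  // 1 (120 MiB used)
	fmt.Println(m.write(20 << 20))  // 2 (would exceed 128 MiB, so cut)
}
```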

Each chunk is stored in the file with the following layout.

| series ref <8 bytes> | mint <8 bytes, uint64> | maxt <8 bytes, uint64> | encoding <1 byte> | len <uvarint> | data <bytes> | CRC32 <4 bytes> |
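To make the layout concrete, here is a minimal sketch that serializes one record in this shape and verifies it again. The function names are hypothetical. Big-endian integers, the Castagnoli CRC32 polynomial, and a checksum that covers every byte before it are assumptions of this sketch, so check them against the Prometheus source before relying on them.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// Assumption: Castagnoli polynomial.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// encodeRecord lays out one chunk record:
// series ref | mint | maxt | encoding | len (uvarint) | data | CRC32.
func encodeRecord(seriesRef uint64, mint, maxt int64, enc byte, data []byte) []byte {
	buf := make([]byte, 0, 8+8+8+1+binary.MaxVarintLen32+len(data)+4)
	buf = binary.BigEndian.AppendUint64(buf, seriesRef)
	buf = binary.BigEndian.AppendUint64(buf, uint64(mint))
	buf = binary.BigEndian.AppendUint64(buf, uint64(maxt))
	buf = append(buf, enc)
	buf = binary.AppendUvarint(buf, uint64(len(data)))
	buf = append(buf, data...)
	// Assumption: the checksum covers header and data.
	sum := crc32.Checksum(buf, castagnoli)
	return binary.BigEndian.AppendUint32(buf, sum)
}

// checkRecord recomputes the checksum and compares it with the trailer.
func checkRecord(rec []byte) bool {
	if len(rec) < 4 {
		return false
	}
	body, trailer := rec[:len(rec)-4], rec[len(rec)-4:]
	return crc32.Checksum(body, castagnoli) == binary.BigEndian.Uint32(trailer)
}

func main() {
	rec := encodeRecord(7, 1000, 2000, 1, []byte{0xaa, 0xbb, 0xcc})
	fmt.Println(len(rec), checkRecord(rec)) // 33 true
}
```

The record is 8+8+8+1+1+3+4 = 33 bytes here because a length of 3 fits into a single uvarint byte.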

chunk

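The ChunkDiskMapper's Chunk function resolves a ChunkDiskMapperRef to the corresponding chunkenc chunk:

  • First, it looks the ChunkDiskMapperRef up in chunkBuffer, which holds chunks that have not been flushed yet.

  • Otherwise, it uses the upper 32 bits (the segment index) to find the mmappedChunkFile in mmappedChunkFiles (a []byte plus a maxt).

    It then uses the chunk start taken from the lower 32 bits to locate the data inside that []byte, verifies the CRC, and returns a fresh copy of the data.

    The copy is needed because memory backed by an mmapped file is not under the Go runtime's control: the file can be closed and unmapped while the caller still holds the chunk. (Chunks still in chunkBuffer live in ordinary heap memory, so they can be returned without a copy.)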

    // Make a copy of the chunk data to prevent a panic occurring because the returned
    // chunk data slice references an mmap-ed file which could be closed after the
    // function returns but while the chunk is still in use.
    chkDataCopy := make([]byte, len(chkData))
    copy(chkDataCopy, chkData)
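The lookup-then-copy step can be sketched with plain byte slices standing in for the mmapped files. `readChunk` and its parameters are hypothetical, and CRC verification is left out to keep the example short.

```go
package main

import "fmt"

// readChunk finds the file by sequence number, slices out the requested
// range, and returns a copy so the result stays valid even if the backing
// slice (the mmapped region in the real code) goes away or changes.
func readChunk(files map[int][]byte, fileSeq, start, length int) ([]byte, error) {
	data, ok := files[fileSeq]
	if !ok {
		return nil, fmt.Errorf("file %d is not mapped", fileSeq)
	}
	if start < 0 || length < 0 || start+length > len(data) {
		return nil, fmt.Errorf("range [%d, %d) is out of bounds for file %d", start, start+length, fileSeq)
	}
	out := make([]byte, length)
	copy(out, data[start:start+length])
	return out, nil
}

func main() {
	files := map[int][]byte{1: []byte("abcdef")}
	chk, err := readChunk(files, 1, 2, 3)
	files[1][2] = 'X' // Later changes to the backing bytes do not affect chk.
	fmt.Println(string(chk), err)
}
```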
    

chunkbuffer

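The chunkBuffer holds the chunks that are still sitting in the write buffer.

It lets callers fetch such a chunk directly by its ChunkDiskMapperRef.

(ChunkDiskMapperRef % inBufferShards selects the map to look in.)

Splitting the buffer into shards, each with its own lock, reduces lock contention.

The chunkBuffer is cleared whenever the write buffer is flushed.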

const inBufferShards = 128 // 128 is a randomly chosen number.

// chunkBuffer is a thread safe lookup table for chunks by their ref.
type chunkBuffer struct {
  inBufferChunks     [inBufferShards]map[ChunkDiskMapperRef]chunkenc.Chunk
  inBufferChunksMtxs [inBufferShards]sync.RWMutex
}
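A sharded lookup table of this shape can be sketched as follows. This is an illustration rather than the Prometheus implementation: `shardedBuffer` and its method names are hypothetical, and plain byte slices stand in for chunkenc.Chunk.

```go
package main

import (
	"fmt"
	"sync"
)

const shards = 128

// shardedBuffer maps refs to chunk bytes. Each shard has its own lock, so
// operations on refs in different shards do not block each other.
type shardedBuffer struct {
	chunks [shards]map[uint64][]byte
	mtxs   [shards]sync.RWMutex
}

func newShardedBuffer() *shardedBuffer {
	b := &shardedBuffer{}
	for i := range b.chunks {
		b.chunks[i] = make(map[uint64][]byte)
	}
	return b
}

func (b *shardedBuffer) put(ref uint64, chk []byte) {
	i := ref % shards
	b.mtxs[i].Lock()
	b.chunks[i][ref] = chk
	b.mtxs[i].Unlock()
}

func (b *shardedBuffer) get(ref uint64) ([]byte, bool) {
	i := ref % shards
	b.mtxs[i].RLock()
	chk, ok := b.chunks[i][ref]
	b.mtxs[i].RUnlock()
	return chk, ok
}

// clear drops all entries, as would happen after the write buffer is flushed.
func (b *shardedBuffer) clear() {
	for i := range b.chunks {
		b.mtxs[i].Lock()
		b.chunks[i] = make(map[uint64][]byte)
		b.mtxs[i].Unlock()
	}
}

func main() {
	b := newShardedBuffer()
	b.put(130, []byte("chunk")) // 130 % 128 = shard 2
	chk, ok := b.get(130)
	fmt.Println(string(chk), ok)
	b.clear()
	_, ok = b.get(130)
	fmt.Println(ok)
}
```

Readers take only the read lock of a single shard, so concurrent lookups of different refs rarely wait on each other.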

Truncate

Truncate iterates over the mmappedChunkFiles and, for every file whose maxt is less than mint, deletes it with os.Remove.

// Truncate deletes the head chunk files which are strictly below the mint.
// mint should be in milliseconds.
func (cdm *ChunkDiskMapper) Truncate(mint int64) error {
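The selection of files to delete can be sketched without touching the disk. `filesToTruncate` is a hypothetical helper. Two details are assumptions of this sketch rather than statements from the text above: the file currently being written is never deleted, and the scan stops at the first file that has to be kept, so the surviving file numbers stay contiguous.

```go
package main

import (
	"fmt"
	"sort"
)

// filesToTruncate returns, in ascending order, the file sequence numbers
// whose maxt is strictly below mint. maxts maps file sequence to that
// file's maxt; curSeq is the file currently being written.
func filesToTruncate(maxts map[int]int64, curSeq int, mint int64) []int {
	seqs := make([]int, 0, len(maxts))
	for seq := range maxts {
		seqs = append(seqs, seq)
	}
	sort.Ints(seqs)

	var remove []int
	for _, seq := range seqs {
		// Assumption: stop at the first file that must be kept.
		if seq == curSeq || maxts[seq] >= mint {
			break
		}
		remove = append(remove, seq)
	}
	return remove
}

func main() {
	maxts := map[int]int64{1: 100, 2: 200, 3: 50, 4: 400}
	fmt.Println(filesToTruncate(maxts, 4, 150)) // [1]
	fmt.Println(filesToTruncate(maxts, 4, 500)) // [1 2 3]
}
```

In the first call file 3 is kept even though its maxt is below mint, because file 2 before it must be kept and the scan stops there.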