Prometheus TSDB: compact and retention

compact

Why compact

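As we saw in Part 4, any deletion of data is stored as a tombstone in a separate file, while the data itself stays on disk. So once tombstones cover more than a certain percentage of the series, that data has to be removed from disk.
With a low enough churn rate, most of the index data in adjacent blocks (adjacent in write time) is identical. Compacting (merging) adjacent blocks therefore deduplicates most of the index and saves disk space.
When a query hits more than one block, the results from the individual blocks have to be merged, which adds overhead. Merging adjacent blocks avoids that overhead.
If blocks overlap (in write time), querying them requires deduplicating samples across blocks, which is far more expensive than simply concatenating chunks from different blocks. Merging overlapping blocks removes the need for deduplication.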

When the DB is opened, it creates the compactc channel with make(chan struct{}, 1).

Trigger conditions:

A timer that fires every minute.
Every dbAppender commit, which checks whether a compaction is due.
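
The one-slot channel means that triggers from the timer and from commits coalesce: if a request is already pending, a new one is simply dropped. Below is a minimal sketch of that pattern; the helper name trigger is hypothetical and not taken from the Prometheus source.

```go
package main

import "fmt"

// trigger performs a non-blocking send on a one-slot channel.
// It reports whether a new compaction request was queued.
// (Hypothetical helper, for illustration only.)
func trigger(compactc chan struct{}) bool {
	select {
	case compactc <- struct{}{}:
		return true
	default:
		// A request is already pending, so this one is dropped.
		return false
	}
}

func main() {
	compactc := make(chan struct{}, 1)

	fmt.Println(trigger(compactc)) // true: the slot was free
	fmt.Println(trigger(compactc)) // false: a request is already pending
	<-compactc                     // the run loop picks up the request
	fmt.Println(trigger(compactc)) // true: the slot is free again
}
```

Neither the timer nor a commit ever blocks on the channel, and the compaction loop sees at most one pending request regardless of how many triggers fired.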

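Inside Compact

Check whether the head is compactable; if so, write a block from the current mmapped head chunks. (The head covers up to about three hours of data.)
Compact the existing blocks.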

// compactable returns whether the head has a compactable range.
// The head has a compactable range when the head time range is 1.5 times the chunk range.
// The 0.5 acts as a buffer of the appendable window.
func (h *Head) compactable() bool {
  return h.MaxTime()-h.MinTime() > h.chunkRange.Load()/2*3
}
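
To see where the three-hour figure comes from, here is a small sketch that repeats the same arithmetic on plain millisecond values. It assumes a chunk range of 2h, which is not stated in the snippet above; the free function is a stand-in for the method on Head.

```go
package main

import (
	"fmt"
	"time"
)

// compactable repeats the check above on plain int64 millisecond timestamps.
func compactable(minTime, maxTime, chunkRange int64) bool {
	return maxTime-minTime > chunkRange/2*3
}

func main() {
	chunkRange := (2 * time.Hour).Milliseconds() // assumed chunk range of 2h
	threeHours := (3 * time.Hour).Milliseconds()

	fmt.Println(chunkRange / 2 * 3 == threeHours)        // true: the threshold is 3h
	fmt.Println(compactable(0, threeHours, chunkRange))   // false: exactly 3h is not enough
	fmt.Println(compactable(0, threeHours+1, chunkRange)) // true: the head spans more than 3h
}
```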

compactblocks

plan: produce the dirs of the source blocks to compact
compact

Plan

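Plan reads the metadata (mint, maxt) of every block.

It selects the blocks to compact as follows:

Overlapping blocks, if there are any.
Otherwise the newest block is dropped first, because it was just created from the WAL.

The compactor has a list of ranges, [2h 6h 18h 54h 162h 486h]. splitByRange groups the blocks into buckets for each range in turn, starting from 6h.

The code that drops the newest block: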
```
// No overlapping blocks, do compaction the usual way.
  // We do not include a recently created block with max(minTime), so the block which was just created from WAL.
  // This gives users a window of a full block size to piece-wise backup new data without having to care about data overlap.
  dms = dms[:len(dms)-1]
```
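For each bucket: if the most recent bucket (in practice this is often the first bucket) does not span the whole range, it is not compacted.

In effect, for the 6h range, the planner wants three 2h blocks before merging them.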

mint := p[0].meta.MinTime
maxt := p[len(p)-1].meta.MaxTime
// Pick the range of blocks if it spans the full range (potentially with gaps)
// or is before the most recent block.
// This ensures we don't compact blocks prematurely when another one of the same
// size still fits in the range.
if (maxt-mint == iv || maxt <= highTime) && len(p) > 1 {
   return p
}
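
The condition is easier to follow with concrete numbers. The sketch below isolates it in a hypothetical helper, selectable, with a simplified blockMeta type. It assumes highTime is the MinTime of the newest block still under consideration, which the snippet above does not show.

```go
package main

import "fmt"

// blockMeta is a simplified stand-in that keeps only the time range.
type blockMeta struct {
	MinTime, MaxTime int64
}

// selectable reports whether bucket p, taken from a range of size iv,
// would be picked by the condition above. (Hypothetical helper.)
func selectable(p []blockMeta, iv, highTime int64) bool {
	if len(p) < 2 {
		return false // a single block is never merged with itself
	}
	mint := p[0].MinTime
	maxt := p[len(p)-1].MaxTime
	return maxt-mint == iv || maxt <= highTime
}

func main() {
	two := []blockMeta{{0, 20}, {20, 40}}
	three := []blockMeta{{0, 20}, {20, 40}, {40, 60}}

	fmt.Println(selectable(three, 60, 40)) // true: the bucket spans the full range of 60
	fmt.Println(selectable(two, 60, 60))   // true: the bucket ends before the newest block
	fmt.Println(selectable(two, 60, 20))   // false: another block may still fit in the range
}
```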

// splitByRange splits the directories by the time range. The range sequence starts at 0.
//
// For example, if we have blocks [0-10, 10-20, 50-60, 90-100] and the split range tr is 30
// it returns [0-10, 10-20], [50-60], [90-100].
func splitByRange(ds []dirMeta, tr int64) [][]dirMeta {
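
A self-contained sketch of the grouping described in that comment follows. It uses a simplified blockMeta type in place of dirMeta and assumes the input is sorted by MinTime; it is an illustration of the idea, not a copy of the upstream function.

```go
package main

import "fmt"

// blockMeta is a simplified stand-in for dirMeta: only the time range is kept.
type blockMeta struct {
	MinTime, MaxTime int64
}

// splitByRange groups time-sorted blocks into aligned windows of size tr.
func splitByRange(ds []blockMeta, tr int64) [][]blockMeta {
	var groups [][]blockMeta
	for i := 0; i < len(ds); {
		m := ds[i]
		// Start of the aligned window that contains the block's MinTime.
		// Go division truncates toward zero, so negative times need flooring.
		var t0 int64
		if m.MinTime >= 0 {
			t0 = tr * (m.MinTime / tr)
		} else {
			t0 = tr * ((m.MinTime - tr + 1) / tr)
		}
		// A block that crosses the window boundary joins no group at this range.
		if m.MaxTime > t0+tr {
			i++
			continue
		}
		// Collect every following block that still ends inside [t0, t0+tr].
		var group []blockMeta
		for ; i < len(ds) && ds[i].MaxTime <= t0+tr; i++ {
			group = append(group, ds[i])
		}
		groups = append(groups, group)
	}
	return groups
}

func main() {
	blocks := []blockMeta{{0, 10}, {10, 20}, {50, 60}, {90, 100}}
	// Same input as the doc comment above, split range 30.
	fmt.Println(splitByRange(blocks, 30)) // [[{0 10} {10 20}] [{50 60}] [{90 100}]]
}
```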

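Example:

In the first case below, block 4 is dropped as the newest block. Splitting the rest by a range of 60 gives two buckets.

The first bucket holds 1 and 2; the second holds 3. Bucket 1, 2 ends before the newest remaining block, so it is selected.

In the second case, block 3 is dropped and a single bucket holding 1 and 2 is produced. (Since 1 and 2 form the newest bucket and do not span the full range, they are not compacted.)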

`Block for the next parent range appeared, and we have a gap with size 20 between second and third block.
We will not get this missed gap anymore and we should compact just these two.`: {
   metas: []dirMeta{
      metaRange("1", 0, 20, nil),
      metaRange("2", 20, 40, nil),
      metaRange("3", 60, 80, nil),
      metaRange("4", 80, 100, nil),
   },
   expected: []string{"1", "2"},
},
`Block for the next parent range appeared with gap with size 20. Nothing will happen in the first one
    anymore but we ignore fresh one still, so no compaction`: {
      metas: []dirMeta{
        metaRange("1", 0, 20, nil),
        metaRange("2", 20, 40, nil),
        metaRange("3", 60, 80, nil),
      },
      expected: nil,
    },

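A single block is also selected when it is large enough (a time range of at least 54h, which is c.ranges[len(c.ranges)/2] for the ranges above) and its tombstone count is more than 5% of its series count.

Or when the block is shorter than 54h but its tombstone count is greater than or equal to its series count, in which case the block is treated as entirely deleted.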

// Compact any blocks with big enough time range that have >5% tombstones.
  for i := len(dms) - 1; i >= 0; i-- {
    meta := dms[i].meta
    if meta.MaxTime-meta.MinTime < c.ranges[len(c.ranges)/2] {
      // If the block is entirely deleted, then we don't care about the block being big enough.
      // TODO: This is assuming single tombstone is for distinct series, which might be no true.
      if meta.Stats.NumTombstones > 0 && meta.Stats.NumTombstones >= meta.Stats.NumSeries {
        return []string{dms[i].dir}, nil
      }
      break
    }
    if float64(meta.Stats.NumTombstones)/float64(meta.Stats.NumSeries+1) > 0.05 {
      return []string{dms[i].dir}, nil
    }
  }
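
The two tombstone rules can be tried out on made-up numbers. The sketch below copies the loop onto simplified types; the helper name tombstoneCandidate and the sample values are hypothetical, and times are given in hours for readability.

```go
package main

import "fmt"

// blockMeta keeps only the fields the tombstone check reads.
type blockMeta struct {
	MinTime, MaxTime         int64
	NumSeries, NumTombstones uint64
}

// tombstoneCandidate walks the blocks from newest to oldest and returns the
// index of the first block worth rewriting, or -1. (Hypothetical helper.)
func tombstoneCandidate(blocks []blockMeta, bigRange int64) int {
	for i := len(blocks) - 1; i >= 0; i-- {
		m := blocks[i]
		if m.MaxTime-m.MinTime < bigRange {
			// Small block: only picked when it looks entirely deleted.
			if m.NumTombstones > 0 && m.NumTombstones >= m.NumSeries {
				return i
			}
			break // older blocks are not examined after a small live block
		}
		// Large block: picked when tombstones exceed 5% of series (+1 avoids 0/0).
		if float64(m.NumTombstones)/float64(m.NumSeries+1) > 0.05 {
			return i
		}
	}
	return -1
}

func main() {
	const big = 54 // assumed threshold in hours, matching the ranges above

	fmt.Println(tombstoneCandidate([]blockMeta{{0, 54, 100, 6}}, big)) // 0: 6/101 > 0.05
	fmt.Println(tombstoneCandidate([]blockMeta{{0, 54, 100, 5}}, big)) // -1: 5/101 < 0.05
	fmt.Println(tombstoneCandidate([]blockMeta{{0, 2, 10, 10}}, big))  // 0: small but fully deleted
	fmt.Println(tombstoneCandidate([]blockMeta{{0, 2, 10, 3}}, big))   // -1: small and mostly live
}
```

Note the break: once the loop meets a small block that is not fully deleted, it stops looking at older blocks.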

retention

The TSDB deletes some blocks.

First the blocks are sorted by timestamp, newest first.

// Sort the blocks by time - newest to oldest (largest to smallest timestamp).
  // This ensures that the retentions will remove the oldest  blocks.
  sort.Slice(blocks, func(i, j int) bool {
    return blocks[i].Meta().MaxTime > blocks[j].Meta().MaxTime
  })

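Old blocks that have already been compacted into a new block.
Blocks that are too old: a block is deleted when the gap between its data and the data of the newest block is 15d or more.
Blocks beyond the size limit: the blocks are walked from first to last (newest to oldest) while their sizes are summed; once the total exceeds the maximum size, that block and every older one are deleted.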
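
The time-based and size-based rules can be sketched as two passes over the sorted blocks. The types and function names below are hypothetical and simplified: the sketch counts block sizes only and uses plain numbers for timestamps and bytes.

```go
package main

import (
	"fmt"
	"sort"
)

// block is a simplified stand-in for a persisted block.
type block struct {
	ID      string
	MaxTime int64 // newest sample timestamp in the block
	Size    int64 // bytes on disk
}

// ids collects the IDs of the given blocks.
func ids(bs []block) []string {
	out := make([]string, 0, len(bs))
	for _, b := range bs {
		out = append(out, b.ID)
	}
	return out
}

// beyondTimeRetention expects blocks sorted newest first. It returns the IDs of
// the first block that lags the newest one by at least retention, and all older ones.
func beyondTimeRetention(blocks []block, retention int64) []string {
	for i, b := range blocks {
		if i > 0 && blocks[0].MaxTime-b.MaxTime >= retention {
			return ids(blocks[i:])
		}
	}
	return nil
}

// beyondSizeRetention expects blocks sorted newest first. It sums sizes and returns
// the IDs of the block that pushes the total over maxBytes, and all older ones.
func beyondSizeRetention(blocks []block, maxBytes int64) []string {
	var total int64
	for i, b := range blocks {
		total += b.Size
		if total > maxBytes {
			return ids(blocks[i:])
		}
	}
	return nil
}

func main() {
	blocks := []block{{"C", 50, 10}, {"A", 100, 10}, {"B", 80, 10}}
	sort.Slice(blocks, func(i, j int) bool { return blocks[i].MaxTime > blocks[j].MaxTime })

	fmt.Println(beyondTimeRetention(blocks, 40)) // [C]: 100-50 >= 40
	fmt.Println(beyondSizeRetention(blocks, 15)) // [B C]: 10+10 > 15 at B
}
```

Sorting newest first is what makes both passes safe: whatever is cut off is always the oldest tail of the list.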

Removing the directory atomically.

RemoveAll is not atomic, so a block is removed by renaming its directory first and then removing the renamed directory.

// Replace atomically to avoid partial block when process would crash during deletion.
tmpToDelete := filepath.Join(db.dir, fmt.Sprintf("%s%s", ulid, tmpForDeletionBlockDirSuffix))
if err := fileutil.Replace(toDelete, tmpToDelete); err != nil {
   return errors.Wrapf(err, "replace of obsolete block for deletion %s", ulid)
}
if err := os.RemoveAll(tmpToDelete); err != nil {
   return errors.Wrapf(err, "delete obsolete block %s", ulid)
}