Paimon Source Code Walkthrough -- Compaction-5. CompactManager


Preface

This article focuses on the mechanics of CompactManager.

I. The CompactManager Interface

CompactManager is an interface; its inheritance hierarchy (reproduced as text below) has two concrete implementations covered in this article: MergeTreeCompactManager and NoopCompactManager.

II. The MergeTreeCompactManager Class

1. Code Walkthrough

CompactManager (interface)
    ↑
CompactFutureManager (abstract class)
    ↑
MergeTreeCompactManager (concrete class)

(0) Core Fields

private final ExecutorService executor;           // thread pool that runs compaction tasks
private final Levels levels;                      // LSM-tree level manager
private final CompactStrategy strategy;           // compaction strategy (decides which files to pick)
private final Comparator<InternalRow> keyComparator;  // key comparator
private final long compactionFileSize;            // target size of compacted files
private final int numSortedRunStopTrigger;       // sorted-run count threshold that forces a wait
private final CompactRewriter rewriter;          // compaction rewriter (performs the actual file merge)
private final CompactionMetrics.Reporter metricsReporter;  // metrics reporter
private final DeletionVectorsMaintainer dvMaintainer;      // deletion-vector maintainer

(1) shouldWaitForLatestCompaction() / shouldWaitForPreparingCheckpoint()

// Decide whether the writer must wait for the running compaction to finish
@Override
public boolean shouldWaitForLatestCompaction() {
    // Force a wait once the number of sorted runs exceeds numSortedRunStopTrigger,
    // preventing level-0 files from piling up
    return levels.numberOfSortedRuns() > numSortedRunStopTrigger;
}
// Decide whether the checkpoint-preparation phase must wait for compaction
@Override
public boolean shouldWaitForPreparingCheckpoint() {
    // cast to long to avoid Numeric overflow
    // Force the checkpoint to wait once the number of sorted runs exceeds
    // numSortedRunStopTrigger + 1
    return levels.numberOfSortedRuns() > (long) numSortedRunStopTrigger + 1;
}
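The `(long)` cast in the comparison above guards against int overflow. A minimal sketch of the edge case it avoids (the values are hypothetical, chosen only to make the overflow visible):

```java
public class OverflowSketch {
    public static void main(String[] args) {
        // If the trigger were at the int limit, the int addition trigger + 1
        // would wrap around to Integer.MIN_VALUE before the comparison.
        int numSortedRunStopTrigger = Integer.MAX_VALUE;
        long numberOfSortedRuns = 10;

        // int arithmetic overflows, so the threshold becomes negative
        boolean buggy = numberOfSortedRuns > numSortedRunStopTrigger + 1;
        // widening to long first keeps the threshold at 2^31
        boolean fixed = numberOfSortedRuns > (long) numSortedRunStopTrigger + 1;

        System.out.println(buggy); // true  -- wait triggered spuriously
        System.out.println(fixed); // false -- correct behavior
    }
}
```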

(2) addNewFile()

/**
 * Adds a DataFileMeta to level 0 so it can be compacted later.
 * Called by MergeTreeWriter.flushWriteBuffer().
 * @param file the newly flushed data file
 */
@Override
public void addNewFile(DataFileMeta file) {
    levels.addLevel0File(file);
    MetricUtils.safeCall(this::reportLevel0FileCount, LOG);
}

(3) allFiles()

// Returns the files of every level
@Override
public List<DataFileMeta> allFiles() {
    return levels.allFiles();
}

(4) levels()

@VisibleForTesting
public Levels levels() {
    return levels; // the Levels object holds every file of every level, plus related metadata
}

(5) triggerCompaction() -- core

Core logic:

  1. Fetch the sorted runs: call levels.levelSortedRuns() to collect the files of every level
    • Level 0: each file is its own SortedRun (key ranges may overlap)
    • Level 1+: each level is one SortedRun (sorted, non-overlapping)
  2. Pick the CompactUnit:
    • Full compaction: CompactStrategy.pickFullCompaction() selects all files
    • Normal compaction: strategy.pick() selects files according to the strategy
  3. Decide dropDelete:
    • If the output level is >= the highest non-empty level, DELETE markers can be dropped safely
    • They can also be dropped when a deletion-vector maintainer is present
  4. Submit the compaction via submitCompaction(); the actual work is done by MergeTreeCompactTask, covered in "Paimon Source Code Walkthrough -- Compaction-1. MergeTreeCompactTask"
/**
 * Core entry point that triggers a compaction.
 * @param fullCompaction whether a full compaction is required
 */
@Override
public void triggerCompaction(boolean fullCompaction) {
    // optionalUnit holds the files selected for compaction, if any
    Optional<CompactUnit> optionalUnit;
    // Collect the sorted runs of every level into a list
    List<LevelSortedRun> runs = levels.levelSortedRuns();
    // 1. Compute the CompactUnit to compact
    // CASE 1: full compaction
    if (fullCompaction) {
        Preconditions.checkState(
                taskFuture == null,
                "A compaction task is still running while the user "
                        + "forces a new compaction. This is unexpected.");
        if (LOG.isDebugEnabled()) {
            LOG.debug(
                    "Trigger forced full compaction. Picking from the following runs\n{}",
                    runs);
        }
        // Full compaction selects the files of every level
        optionalUnit = CompactStrategy.pickFullCompaction(levels.numberOfLevels(), runs);
    }
    // CASE 2: normal compaction
    else {
        if (taskFuture != null) {
            return;
        }
        if (LOG.isDebugEnabled()) {
            LOG.debug("Trigger normal compaction. Picking from the following runs\n{}", runs);
        }
        // Normal compaction: let the CompactStrategy compute the CompactUnit
        optionalUnit =
                strategy.pick(levels.numberOfLevels(), runs) // let strategy.pick() select the files
                        .filter(unit -> unit.files().size() > 0) // the unit must contain files
                        .filter(
                                unit ->
                                        unit.files().size() > 1 // either more than one file
                                                || unit.files().get(0).level() // or a single file that must move to another level
                                                        != unit.outputLevel());
    }

    // 2. Submit the computed CompactUnit via submitCompaction(); the actual merge is performed by MergeTreeCompactTask
    optionalUnit.ifPresent(
            unit -> {
                /*
                 * As long as there is no older data, We can drop the deletion.
                 * If the output level is 0, there may be older data not involved in compaction.
                 * If the output level is bigger than 0, as long as there is no older data in
                 * the current levels, the output is the oldest, so we can drop the deletion.
                 * See CompactStrategy.pick.
                 */
                boolean dropDelete =
                        unit.outputLevel() != 0
                                && (unit.outputLevel() >= levels.nonEmptyHighestLevel()
                                        || dvMaintainer != null);

                if (LOG.isDebugEnabled()) {
                    LOG.debug(
                            "Submit compaction with files (name, level, size): "
                                    + levels.levelSortedRuns().stream()
                                            .flatMap(lsr -> lsr.run().files().stream())
                                            .map(
                                                    file ->
                                                            String.format(
                                                                    "(%s, %d, %d)",
                                                                    file.fileName(),
                                                                    file.level(),
                                                                    file.fileSize()))
                                            .collect(Collectors.joining(", ")));
                }
                submitCompaction(unit, dropDelete);
            });
}
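The dropDelete rule embedded in the lambda above can be isolated into a small predicate. This is a sketch with hypothetical parameter names, not Paimon's API:

```java
public class DropDeleteSketch {
    // DELETE markers may be discarded only when no older data could sit below
    // the compaction output: never at level 0, and otherwise when the output
    // reaches the highest non-empty level, or deletion vectors are maintained.
    static boolean dropDelete(int outputLevel, int nonEmptyHighestLevel, boolean hasDvMaintainer) {
        return outputLevel != 0
                && (outputLevel >= nonEmptyHighestLevel || hasDvMaintainer);
    }

    public static void main(String[] args) {
        System.out.println(dropDelete(0, 0, true));   // false: level-0 output may hide older data
        System.out.println(dropDelete(3, 3, false));  // true: the output is the oldest level
        System.out.println(dropDelete(2, 5, false));  // false: levels 3..5 may hold older rows
        System.out.println(dropDelete(2, 5, true));   // true: deletion vectors cover the rest
    }
}
```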

(6) submitCompaction() -- core

Core purpose: build a MergeTreeCompactTask and hand it to the thread pool for execution.

private void submitCompaction(CompactUnit unit, boolean dropDelete) {
    // Build the supplier of compaction deletion files
    Supplier<CompactDeletionFile> compactDfSupplier = () -> null;
    if (dvMaintainer != null) {
        compactDfSupplier =
                lazyGenDeletionFile
                        ? () -> CompactDeletionFile.lazyGeneration(dvMaintainer)
                        : () -> CompactDeletionFile.generateFiles(dvMaintainer);
    }
    // Build the MergeTreeCompactTask
    MergeTreeCompactTask task =
            new MergeTreeCompactTask(
                    keyComparator,
                    compactionFileSize,
                    rewriter,
                    unit,
                    dropDelete,
                    levels.maxLevel(),
                    metricsReporter,
                    compactDfSupplier);
    if (LOG.isDebugEnabled()) {
        LOG.debug(
                "Pick these files (name, level, size) for compaction: {}",
                unit.files().stream()
                        .map(
                                file ->
                                        String.format(
                                                "(%s, %d, %d)",
                                                file.fileName(), file.level(), file.fileSize()))
                        .collect(Collectors.joining(", ")));
    }
    // Submit the MergeTreeCompactTask to the thread pool
    taskFuture = executor.submit(task);
    if (metricsReporter != null) {
        metricsReporter.increaseCompactionsQueuedCount();
    }
}
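Note that submitCompaction() keeps only the Future returned by the pool; completion is observed later in getCompactionResult(). The same hand-off, reduced to the JDK types involved, with a trivial Callable standing in for MergeTreeCompactTask:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SubmitSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // Placeholder for MergeTreeCompactTask, which implements Callable
        Callable<String> task = () -> "compact-result";

        // submit() returns immediately; the task runs asynchronously
        Future<String> taskFuture = executor.submit(task);

        // Blocking retrieval, as in the blocking path of getCompactionResult()
        System.out.println(taskFuture.get()); // prints "compact-result"
        executor.shutdown();
    }
}
```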

(7) getCompactionResult()

/**
 * Called once the current compaction finishes: updates the file layout in
 * levels by removing the old files and adding the new ones.
 * @param blocking whether to block until the running task completes
 * @return the compaction result, if any
 * @throws ExecutionException
 * @throws InterruptedException
 */
@Override
public Optional<CompactResult> getCompactionResult(boolean blocking)
        throws ExecutionException, InterruptedException {
    // Fetch the result via the parent class's innerGetCompactionResult()
    Optional<CompactResult> result = innerGetCompactionResult(blocking);
    // Apply the result to levels: remove the old files, add the new ones
    result.ifPresent(
            r -> {
                if (LOG.isDebugEnabled()) {
                    LOG.debug(
                            "Update levels in compact manager with these changes:\nBefore:\n{}\nAfter:\n{}",
                            r.before(),
                            r.after());
                }
                // Update the file layout in levels
                levels.update(r.before(), r.after());
                MetricUtils.safeCall(this::reportLevel0FileCount, LOG);
                if (LOG.isDebugEnabled()) {
                    LOG.debug(
                            "Levels in compact manager updated. Current runs are\n{}",
                            levels.levelSortedRuns());
                }
            });
    return result;
}
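Under the hood, getCompactionResult(blocking) is Future polling: block on get(), or return empty while the task is still running. A reduced sketch with plain JDK types (getResult is an illustrative name, not Paimon's method):

```java
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ResultSketch {
    static <T> Optional<T> getResult(Future<T> future, boolean blocking)
            throws ExecutionException, InterruptedException {
        if (future == null) {
            return Optional.empty(); // no task was ever submitted
        }
        if (blocking || future.isDone()) {
            return Optional.of(future.get()); // wait (or grab the finished result)
        }
        return Optional.empty(); // non-blocking and still running
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Integer> f = pool.submit(() -> 42);
        System.out.println(getResult(f, true)); // prints "Optional[42]"
        pool.shutdown();
    }
}
```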

2. Flow Summary

sequenceDiagram
    participant Writer as MergeTreeWriter
    participant Manager as MergeTreeCompactManager
    participant Levels as Levels
    participant Executor as ExecutorService
    participant Task as MergeTreeCompactTask

    Writer->>Manager: addNewFile(file)
    Manager->>Levels: addLevel0File(file)
    
    Writer->>Manager: triggerCompaction(fullCompaction)
    Manager->>Levels: levelSortedRuns()
    Levels-->>Manager: List<LevelSortedRun>
    
    alt fullCompaction=true
        Manager->>Manager: pickFullCompaction() - all files
    else fullCompaction=false
        Manager->>Manager: strategy.pick() - a subset of files
    end
    
    Manager->>Manager: submitCompaction(unit, dropDelete)
    Manager->>Task: new MergeTreeCompactTask(...)
    Manager->>Executor: submit(task)
    Executor-->>Manager: taskFuture
    
    Note over Executor,Task: compaction runs asynchronously
    Task->>Task: doCompact()
    Task->>Task: upgrade() / rewrite()
    
    Writer->>Manager: getCompactionResult(blocking)
    Manager->>Manager: taskFuture.get()
    Manager->>Levels: update(before, after)
    Manager-->>Writer: CompactResult

III. The NoopCompactManager Class

This class is created in the following situations:

  1. A primary-key table with write-only = true
  2. An append table with write-only = true
  3. An append table with bucket = -1
/** A {@link CompactManager} which never compacts. */
public class NoopCompactManager implements CompactManager {

    public NoopCompactManager() {}
    // Never needs to wait for compaction
    @Override
    public boolean shouldWaitForLatestCompaction() {
        return false;
    }
    
    // No waiting before checkpoints either
    @Override
    public boolean shouldWaitForPreparingCheckpoint() {
        return false;
    }
    
    // No-op: files are not tracked
    @Override
    public void addNewFile(DataFileMeta file) {}
    
    // Returns an empty list
    @Override
    public List<DataFileMeta> allFiles() {
        return Collections.emptyList();
    }
    
    // Only validates the argument: no compact operation is allowed here
    @Override
    public void triggerCompaction(boolean fullCompaction) {
        Preconditions.checkArgument(
                !fullCompaction,
                "NoopCompactManager does not support user triggered compaction.\n"
                        + "If you really need a guaranteed compaction, please set "
                        + CoreOptions.WRITE_ONLY.key()
                        + " property of this table to false.");
    }
    
    // Returns an empty result
    @Override
    public Optional<CompactResult> getCompactionResult(boolean blocking)
            throws ExecutionException, InterruptedException {
        return Optional.empty();
    }

    @Override
    public void cancelCompaction() {}

    @Override
    public boolean isCompacting() {
        return false;
    }

    @Override
    public void close() throws IOException {}
}