Preface
This article focuses on the mechanics of CompactManager in Apache Paimon.
I. The CompactManager Interface
CompactManager is an interface; its inheritance hierarchy is shown below, followed by a minimal sketch of the contract itself.
CompactManager (interface)
        ↑
CompactFutureManager (abstract class)
        ↑
MergeTreeCompactManager (implementation class)
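The interface is not reproduced in full here; the sketch below is reconstructed from the overridden methods of MergeTreeCompactManager and NoopCompactManager shown later in this article, so treat it as an outline of the contract rather than the exact Paimon source (that it extends Closeable is inferred from the overridden close()).

import java.io.Closeable;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ExecutionException;

// Outline of the CompactManager contract (reconstructed; Paimon types such as
// DataFileMeta and CompactResult are referenced as in the listings below).
public interface CompactManager extends Closeable {

    // Should the writer block until the running compaction finishes? (backpressure)
    boolean shouldWaitForLatestCompaction();

    // Should checkpoint preparation block until the running compaction finishes?
    boolean shouldWaitForPreparingCheckpoint();

    // Register a freshly flushed level-0 file so it can be compacted later.
    void addNewFile(DataFileMeta file);

    // All files currently tracked across all levels.
    List<DataFileMeta> allFiles();

    // Pick files and submit a compaction task (full or normal).
    void triggerCompaction(boolean fullCompaction);

    // Fetch the result of the submitted task, optionally blocking for it.
    Optional<CompactResult> getCompactionResult(boolean blocking)
            throws ExecutionException, InterruptedException;

    // Cancel the running compaction task, if any.
    void cancelCompaction();

    // Is a compaction task currently running?
    boolean isCompacting();
}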
II. The MergeTreeCompactManager Class
1. Code walkthrough
(0) Core fields
private final ExecutorService executor; // thread pool that executes compaction tasks
private final Levels levels; // LSM tree level manager
private final CompactStrategy strategy; // compaction strategy (decides which files to pick)
private final Comparator<InternalRow> keyComparator; // key comparator
private final long compactionFileSize; // target file size for compaction
private final int numSortedRunStopTrigger; // sorted-run count threshold that forces writers to wait
private final CompactRewriter rewriter; // compaction rewriter (performs the actual file merging)
private final CompactionMetrics.Reporter metricsReporter; // metrics reporter
private final DeletionVectorsMaintainer dvMaintainer; // deletion-vectors maintainer
private final boolean lazyGenDeletionFile; // whether the compaction deletion file is generated lazily (used in submitCompaction())
(1) shouldWaitForLatestCompaction() and shouldWaitForPreparingCheckpoint()
// Decides whether the writer must wait for the latest compaction task to finish
@Override
public boolean shouldWaitForLatestCompaction() {
    // Force a wait when the number of sorted runs exceeds numSortedRunStopTrigger,
    // which prevents level-0 files from piling up
    return levels.numberOfSortedRuns() > numSortedRunStopTrigger;
}
// Decides whether to wait for the compaction task while preparing a checkpoint
@Override
public boolean shouldWaitForPreparingCheckpoint() {
    // cast to long to avoid Numeric overflow
    // Force the checkpoint to wait when the number of sorted runs exceeds numSortedRunStopTrigger + 1
    return levels.numberOfSortedRuns() > (long) numSortedRunStopTrigger + 1;
}
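For context, here is a minimal sketch of how a writer-side caller might react to these two checks. The helper methods below are hypothetical; the real call sites live in MergeTreeWriter.

// Hypothetical writer-side helpers showing how the two backpressure checks could be consumed.
void beforeWritingMoreData(CompactManager compactManager) throws Exception {
    // Too many sorted runs: block on the running compaction to avoid an L0 pile-up.
    if (compactManager.shouldWaitForLatestCompaction()) {
        compactManager.getCompactionResult(true); // blocking = true
    }
}

void beforePreparingCheckpoint(CompactManager compactManager) throws Exception {
    // Slightly higher threshold (numSortedRunStopTrigger + 1) before a checkpoint is forced to wait.
    if (compactManager.shouldWaitForPreparingCheckpoint()) {
        compactManager.getCompactionResult(true);
    }
}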
(2) addNewFile()
/**
 * Adds a DataFileMeta to the level-0 files so it can be compacted later.
 * Called by MergeTreeWriter.flushWriteBuffer().
 *
 * @param file the newly flushed level-0 data file
 */
@Override
public void addNewFile(DataFileMeta file) {
    levels.addLevel0File(file);
    MetricUtils.safeCall(this::reportLevel0FileCount, LOG);
}
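A hedged sketch of the call path mentioned in the javadoc: after flushing the write buffer into a new level-0 file, the writer registers that file and then asks for a compaction check. The method below is a simplification, not the actual MergeTreeWriter.flushWriteBuffer().

// Simplified sketch of the flush path (not the real MergeTreeWriter code).
void flushWriteBuffer(CompactManager compactManager, DataFileMeta newLevel0File) {
    // 1. Register the freshly written L0 file so the compact manager can see it.
    compactManager.addNewFile(newLevel0File);
    // 2. Let the manager decide whether a normal compaction should be submitted.
    compactManager.triggerCompaction(false);
}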
(3) allFiles()
// Returns the files of every level
@Override
public List<DataFileMeta> allFiles() {
    return levels.allFiles();
}
(4) levels()
@VisibleForTesting
public Levels levels() {
    return levels; // the Levels object holds all files of every level plus related metadata
}
(5) triggerCompaction() -- core
Core logic:
- Get the sorted runs: levels.levelSortedRuns() returns the sorted runs of every level.
  - Level 0: each file is its own SortedRun (key ranges may overlap).
  - Level 1+: each level forms a single SortedRun (sorted, non-overlapping). For example, three L0 files plus non-empty levels 3 and 5 yield five runs in total.
- Pick the CompactUnit:
  - Full compaction: CompactStrategy.pickFullCompaction() selects all files.
  - Normal compaction: strategy.pick() selects files according to the configured strategy.
- Decide dropDelete (whether DELETE records can be discarded; a worked example follows the code listing below):
  - the output level must not be 0 and must be >= the highest non-empty level,
  - or a deletion-vectors maintainer is present (again with a non-zero output level).
- Trigger the compaction: submitCompaction() submits the work, which is ultimately executed by MergeTreeCompactTask; for details see "Paimon源码解读 -- Compaction-1.MergeTreeCompactTask".
/**
 * Core code that triggers a compaction.
 *
 * @param fullCompaction whether a full compaction is required
 */
@Override
public void triggerCompaction(boolean fullCompaction) {
    // optionalUnit holds the files picked for this round of compaction
    Optional<CompactUnit> optionalUnit;
    // Collect the sorted runs of every level into a list
    List<LevelSortedRun> runs = levels.levelSortedRuns();
    // 1. Compute the CompactUnit to compact
    // CASE 1: full compaction
if (fullCompaction) {
Preconditions.checkState(
taskFuture == null,
"A compaction task is still running while the user "
+ "forces a new compaction. This is unexpected.");
if (LOG.isDebugEnabled()) {
LOG.debug(
"Trigger forced full compaction. Picking from the following runs\n{}",
runs);
}
        // A full compaction picks the files of every level and compacts them completely
optionalUnit = CompactStrategy.pickFullCompaction(levels.numberOfLevels(), runs);
}
    // CASE 2: normal compaction
else {
if (taskFuture != null) {
return;
}
if (LOG.isDebugEnabled()) {
LOG.debug("Trigger normal compaction. Picking from the following runs\n{}", runs);
}
        // Normal compaction: use the CompactStrategy to compute the CompactUnit
        optionalUnit =
                strategy.pick(levels.numberOfLevels(), runs) // let the strategy select candidate files
                        .filter(unit -> unit.files().size() > 0) // the unit must contain at least one file
                        .filter(
                                unit ->
                                        unit.files().size() > 1 // either more than one file
                                                || unit.files().get(0).level() // or a single file that has to move to another level
                                                        != unit.outputLevel());
}
    // 2. Submit the computed CompactUnit via submitCompaction(); the actual compaction is ultimately executed by MergeTreeCompactTask
optionalUnit.ifPresent(
unit -> {
/*
* As long as there is no older data, We can drop the deletion.
* If the output level is 0, there may be older data not involved in compaction.
* If the output level is bigger than 0, as long as there is no older data in
* the current levels, the output is the oldest, so we can drop the deletion.
* See CompactStrategy.pick.
*/
boolean dropDelete =
unit.outputLevel() != 0
&& (unit.outputLevel() >= levels.nonEmptyHighestLevel()
|| dvMaintainer != null);
if (LOG.isDebugEnabled()) {
LOG.debug(
"Submit compaction with files (name, level, size): "
+ levels.levelSortedRuns().stream()
.flatMap(lsr -> lsr.run().files().stream())
.map(
file ->
String.format(
"(%s, %d, %d)",
file.fileName(),
file.level(),
file.fileSize()))
.collect(Collectors.joining(", ")));
}
submitCompaction(unit, dropDelete);
});
}
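To make the dropDelete condition concrete, here is a small worked example with assumed level numbers (the values are illustrative only):

// Worked example of the dropDelete condition above; all values are made up for illustration.
public class DropDeleteExample {
    public static void main(String[] args) {
        int outputLevel = 5;          // assumed output level of the picked CompactUnit
        int nonEmptyHighestLevel = 5; // assumed highest non-empty level in Levels
        boolean dvEnabled = false;    // deletion vectors disabled (dvMaintainer == null)

        boolean dropDelete =
                outputLevel != 0
                        && (outputLevel >= nonEmptyHighestLevel || dvEnabled);

        // Prints "dropDelete = true": the output becomes the oldest data, so DELETE records can be dropped.
        // With outputLevel = 0, older data outside the compaction could still exist, so it would be false.
        System.out.println("dropDelete = " + dropDelete);
    }
}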
(6) submitCompaction() -- core
Core purpose: create the compaction task (MergeTreeCompactTask) and hand it to the thread pool for execution.
private void submitCompaction(CompactUnit unit, boolean dropDelete) {
    // Build the supplier of the compaction deletion file (defaults to null when deletion vectors are disabled)
Supplier<CompactDeletionFile> compactDfSupplier = () -> null;
if (dvMaintainer != null) {
compactDfSupplier =
lazyGenDeletionFile
? () -> CompactDeletionFile.lazyGeneration(dvMaintainer)
: () -> CompactDeletionFile.generateFiles(dvMaintainer);
}
    // Create the compaction task (MergeTreeCompactTask)
MergeTreeCompactTask task =
new MergeTreeCompactTask(
keyComparator,
compactionFileSize,
rewriter,
unit,
dropDelete,
levels.maxLevel(),
metricsReporter,
compactDfSupplier);
if (LOG.isDebugEnabled()) {
LOG.debug(
"Pick these files (name, level, size) for compaction: {}",
unit.files().stream()
.map(
file ->
String.format(
"(%s, %d, %d)",
file.fileName(), file.level(), file.fileSize()))
.collect(Collectors.joining(", ")));
}
    // Submit the MergeTreeCompactTask to the thread pool for execution
taskFuture = executor.submit(task);
if (metricsReporter != null) {
metricsReporter.increaseCompactionsQueuedCount();
}
}
(7) getCompactionResult()
/**
 * Called when the current compaction task completes; updates the file structure in levels,
 * removing the old files and adding the newly produced ones.
 *
 * @param blocking whether to block until the running compaction task finishes
 * @return the compaction result, if any
 * @throws ExecutionException
 * @throws InterruptedException
 */
@Override
public Optional<CompactResult> getCompactionResult(boolean blocking)
throws ExecutionException, InterruptedException {
    // Fetch the compaction result via the parent class's innerGetCompactionResult()
    Optional<CompactResult> result = innerGetCompactionResult(blocking);
    // Use the result to update the file structure in levels: remove the old files and add the new ones
result.ifPresent(
r -> {
if (LOG.isDebugEnabled()) {
LOG.debug(
"Update levels in compact manager with these changes:\nBefore:\n{}\nAfter:\n{}",
r.before(),
r.after());
}
                // Update the file structure in levels
levels.update(r.before(), r.after());
MetricUtils.safeCall(this::reportLevel0FileCount, LOG);
if (LOG.isDebugEnabled()) {
LOG.debug(
"Levels in compact manager updated. Current runs are\n{}",
levels.levelSortedRuns());
}
});
return result;
}
2. Flow summary
sequenceDiagram
participant Writer as MergeTreeWriter
participant Manager as MergeTreeCompactManager
participant Levels as Levels
participant Executor as ExecutorService
participant Task as MergeTreeCompactTask
Writer->>Manager: addNewFile(file)
Manager->>Levels: addLevel0File(file)
Writer->>Manager: triggerCompaction(fullCompaction)
Manager->>Levels: levelSortedRuns()
Levels-->>Manager: List<LevelSortedRun>
alt fullCompaction=true
Manager->>Manager: pickFullCompaction() - all files
else fullCompaction=false
Manager->>Manager: strategy.pick() - subset of files
end
Manager->>Manager: submitCompaction(unit, dropDelete)
Manager->>Task: new MergeTreeCompactTask(...)
Manager->>Executor: submit(task)
Executor-->>Manager: taskFuture
Note over Executor,Task: compaction executes asynchronously
Task->>Task: doCompact()
Task->>Task: upgrade() / rewrite()
Writer->>Manager: getCompactionResult(blocking)
Manager->>Manager: taskFuture.get()
Manager->>Levels: update(before, after)
Manager-->>Writer: CompactResult
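The sequence above can be condensed into the following hedged sketch of one write/compact cycle; how the manager and the new L0 file are obtained is left out, and the snippet assumes java.util.Optional plus the Paimon types used earlier.

// Hedged end-to-end sketch of the lifecycle in the diagram above (not actual Paimon writer code).
void oneWriteCompactCycle(MergeTreeCompactManager manager, DataFileMeta newL0File) throws Exception {
    // The writer flushes its buffer and registers the new L0 file.
    manager.addNewFile(newL0File);

    // The writer requests a normal compaction; the manager may or may not submit a task.
    manager.triggerCompaction(false);

    // Later (e.g. while preparing a commit/checkpoint) the writer collects the result.
    // blocking = true waits for the task; the manager then updates Levels (before -> after).
    Optional<CompactResult> result = manager.getCompactionResult(true);
    result.ifPresent(r -> {
        // r.before(): files removed by the compaction; r.after(): files newly produced.
    });
}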
III. The NoopCompactManager Class
This class is used in the following cases (a hedged selection sketch follows the list):
- a primary-key table with write-only = true
- an append table with write-only = true
- an append table with bucket = -1
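A hedged sketch of the selection implied by this list; chooseCompactManager() and its parameters are hypothetical, and the real decision is spread across Paimon's writer/operation builders rather than living in a single method.

// Hypothetical helper illustrating when a NoopCompactManager is used instead of a real one.
CompactManager chooseCompactManager(boolean writeOnly, boolean unawareBucketAppendTable,
                                    CompactManager realManager) {
    if (writeOnly || unawareBucketAppendTable) {
        // write-only = true (or an append table with bucket = -1): the writer never compacts;
        // compaction is left to a dedicated compaction job.
        return new NoopCompactManager();
    }
    return realManager;
}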
/** A {@link CompactManager} which never compacts. */
public class NoopCompactManager implements CompactManager {
public NoopCompactManager() {}
    // Never needs to wait for a compaction
@Override
public boolean shouldWaitForLatestCompaction() {
return false;
}
    // No need to wait before a checkpoint either
@Override
public boolean shouldWaitForPreparingCheckpoint() {
return false;
}
    // No-op: files are not tracked
@Override
public void addNewFile(DataFileMeta file) {}
    // Returns an empty list
@Override
public List<DataFileMeta> allFiles() {
return Collections.emptyList();
}
    // Only validates the argument: user-triggered (full) compaction is not allowed here
@Override
public void triggerCompaction(boolean fullCompaction) {
Preconditions.checkArgument(
!fullCompaction,
"NoopCompactManager does not support user triggered compaction.\n"
+ "If you really need a guaranteed compaction, please set "
+ CoreOptions.WRITE_ONLY.key()
+ " property of this table to false.");
}
    // Returns an empty result
@Override
public Optional<CompactResult> getCompactionResult(boolean blocking)
throws ExecutionException, InterruptedException {
return Optional.empty();
}
@Override
public void cancelCompaction() {}
@Override
public boolean isCompacting() {
return false;
}
@Override
public void close() throws IOException {}
}