Paimon Source Code Walkthrough -- Compaction-4: KeyValueFileStoreWrite

Preface

This article focuses on the KeyValueFileStoreWrite class used in compaction; it is involved in most of the compact flow.

I. The AbstractFileStoreWrite Abstract Parent Class

The inheritance chain of this class is shown below. (image: class inheritance diagram)

1. Code Walkthrough

(1) write()

@Override
public void write(BinaryRow partition, int bucket, T data) throws Exception {
    // 1. Get or create the WriterContainer for this partition-bucket via getWriterWrapper()
    WriterContainer<T> container = getWriterWrapper(partition, bucket);
    // 2. Delegate the actual write to the concrete RecordWriter
    container.writer.write(data);
    if (container.indexMaintainer != null) {
        // 3. If there is an index maintainer, notify it of the newly written record
        container.indexMaintainer.notifyNewRecord(data);
    }
}

(2) compact()

@Override
public void compact(BinaryRow partition, int bucket, boolean fullCompaction) throws Exception {
    // 1. Get the WriterContainer for this partition-bucket via getWriterWrapper()
    // 2. Call compact() on its writer (a MergeTreeWriter), passing whether a full
    //    compaction is required
    getWriterWrapper(partition, bucket).writer.compact(fullCompaction);
}
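To make the calling pattern concrete, here is a minimal usage sketch that drives the three entry points of this class. The wrapper method, its parameters, and the call order are illustrative assumptions, not actual Paimon operator code; only the three API calls themselves come from AbstractFileStoreWrite.

// Hedged usage sketch; the writeAndCommit method itself is hypothetical.
static List<CommitMessage> writeAndCommit(
        FileStoreWrite<KeyValue> write,
        BinaryRow partition,
        int bucket,
        List<KeyValue> records,
        long commitIdentifier) throws Exception {
    for (KeyValue kv : records) {
        write.write(partition, bucket, kv); // buffered into the bucket's RecordWriter
    }
    // fullCompaction = false: let the compaction strategy pick files itself
    write.compact(partition, bucket, false);
    // waitCompaction = false: do not block on pending compaction
    return write.prepareCommit(false, commitIdentifier);
}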

(3) getWriterWrapper()

protected WriterContainer<T> getWriterWrapper(BinaryRow partition, int bucket) {
    // 1. Look up the bucket map for this partition
    Map<Integer, WriterContainer<T>> buckets = writers.get(partition);
    if (buckets == null) {
        buckets = new HashMap<>();
        writers.put(partition.copy(), buckets);
    }
    // 2. Create (or reuse) the WriterContainer for this bucket
    //    (essentially a wrapper around the RecordWriter)
    return buckets.computeIfAbsent(
            bucket, k -> createWriterContainer(partition.copy(), bucket, ignorePreviousFiles));
}
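The writers field accessed above is a two-level map keyed first by partition, then by bucket. A minimal sketch of its shape (the HashMap initialization is an assumption inferred from the usage above):

// partition (BinaryRow, defensively copied on insert) -> bucket id -> WriterContainer
protected final Map<BinaryRow, Map<Integer, WriterContainer<T>>> writers = new HashMap<>();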

(4) prepareCommit()

Key logic

  1. First-commit detection: if the lastModifiedCommitIdentifier of every writer is still Long.MIN_VALUE (its initial value, meaning no writer has ever committed), this is the first commit
  2. Two-level Iterator traversal:
    • outer loop over partitions
    • inner loop over all buckets of each partition
  3. Incremental commit: call writer.prepareCommit(waitCompaction) to fetch the incremental data
  4. Index synchronization: collect index files from the IndexMaintainer and from compaction deletion files
  5. Smart cleanup:
    • only writers whose commit content is empty are cleaned
    • the clean checker's condition must also hold
    • removal goes through Iterator.remove(), which is safe during traversal
  6. Cascading cleanup: if all buckets of a partition have been cleaned, the whole partition entry is removed
@Override
public List<CommitMessage> prepareCommit(boolean waitCompaction, long commitIdentifier)
        throws Exception {
    // 1. Build the writer clean checker
    Function<WriterContainer<T>, Boolean> writerCleanChecker;
    // First commit: no writer needs to be cleaned
    if (writers.values().stream()
                    .map(Map::values)
                    .flatMap(Collection::stream)
                    .mapToLong(w -> w.lastModifiedCommitIdentifier)
                    .max()
                    .orElse(Long.MIN_VALUE)
            == Long.MIN_VALUE) {
        // If this is the first commit, no writer should be cleaned.
        writerCleanChecker = writerContainer -> false;
    }
    // Subsequent commits: writers may need to be cleaned
    else {
        writerCleanChecker = createWriterCleanChecker();
    }

    List<CommitMessage> result = new ArrayList<>();

    Iterator<Map.Entry<BinaryRow, Map<Integer, WriterContainer<T>>>> partIter =
            writers.entrySet().iterator();
    // 2. Iterate over partitions
    while (partIter.hasNext()) {
        Map.Entry<BinaryRow, Map<Integer, WriterContainer<T>>> partEntry = partIter.next();
        BinaryRow partition = partEntry.getKey();
        Iterator<Map.Entry<Integer, WriterContainer<T>>> bucketIter =
                partEntry.getValue().entrySet().iterator();
        // ... and over the buckets of each partition
        while (bucketIter.hasNext()) {
            Map.Entry<Integer, WriterContainer<T>> entry = bucketIter.next();
            int bucket = entry.getKey();
            WriterContainer<T> writerContainer = entry.getValue();
            // 2.1 Call prepareCommit() on the RecordWriter implementation, which may trigger compaction
            CommitIncrement increment = writerContainer.writer.prepareCommit(waitCompaction);
            // 2.2 Collect new index files
            List<IndexFileMeta> newIndexFiles = new ArrayList<>();
            if (writerContainer.indexMaintainer != null) {
                newIndexFiles.addAll(writerContainer.indexMaintainer.prepareCommit());
            }
            // 2.3 Handle deletion files produced by compaction
            CompactDeletionFile compactDeletionFile = increment.compactDeletionFile();
            if (compactDeletionFile != null) {
                compactDeletionFile.getOrCompute().ifPresent(newIndexFiles::add);
            }
            // 2.4 Build the CommitMessage
            CommitMessageImpl committable =
                    new CommitMessageImpl(
                            partition,
                            bucket,
                            increment.newFilesIncrement(),
                            increment.compactIncrement(),
                            new IndexIncrement(newIndexFiles));
            result.add(committable);
            // 2.5 If this commit carries no data, the writer may be cleaned up
            if (committable.isEmpty()) {
                if (writerCleanChecker.apply(writerContainer)) {
                    // Clear writer if no update, and if its latest modification has committed.
                    //
                    // We need a mechanism to clear writers, otherwise there will be more and
                    // more such as yesterday's partition that no longer needs to be written.
                    if (LOG.isDebugEnabled()) {
                        LOG.debug(
                                "Closing writer for partition {}, bucket {}. "
                                        + "Writer's last modified identifier is {}, "
                                        + "while current commit identifier is {}.",
                                partition,
                                bucket,
                                writerContainer.lastModifiedCommitIdentifier,
                                commitIdentifier);
                    }
                    writerContainer.writer.close();
                    bucketIter.remove();
                }
            } else {
                // 2.6 Otherwise record this commit identifier as the writer's last modification
                writerContainer.lastModifiedCommitIdentifier = commitIdentifier;
            }
        }
        // Remove partitions whose bucket map is now empty
        if (partEntry.getValue().isEmpty()) {
            partIter.remove();
        }
    }

    return result;
}

II. The KeyValueFileStoreWrite Class

1. Code Walkthrough

(0) Core Fields

// Reader/writer factories: build readers and writers on demand for a given partition and bucket
private final KeyValueFileReaderFactory.Builder readerFactoryBuilder;
private final KeyValueFileWriterFactory.Builder writerFactoryBuilder;
// Three key comparator suppliers
private final Supplier<Comparator<InternalRow>> keyComparatorSupplier; // primary key comparator
private final Supplier<FieldsComparator> udsComparatorSupplier; // comparator for user-defined sequence fields, e.g. sequence-group / sequence-field
private final Supplier<RecordEqualiser> logDedupEqualSupplier; // record equaliser for changelog deduplication
// Merge function factory: creates the MergeFunction implementation matching the configured merge-engine
private final MergeFunctionFactory<KeyValue> mfFactory;
private final CoreOptions options; // CoreOptions configuration
private final FileIO fileIO; // file IO
private final RowType keyType; // RowType of the key
private final RowType valueType; // RowType of the value
private final RowType partitionType; // RowType of the partition
private final String commitUser;
@Nullable private final RecordLevelExpire recordLevelExpire; // record-level expiration mechanism
@Nullable private Cache<String, LookupFile> lookupFileCache; // lookup file cache
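Expanding on the mfFactory comment: the configured merge-engine decides which MergeFunction implementation gets created. A hedged sketch of that mapping (the switch and the returned names are illustrative; Paimon wires this up inside its MergeFunctionFactory implementations, and the class names are taken from the Paimon codebase):

// Illustrative mapping only, not Paimon's actual factory code.
static String mergeFunctionFor(CoreOptions.MergeEngine engine) {
    switch (engine) {
        case DEDUPLICATE:    return "DeduplicateMergeFunction";   // keep the latest row per key
        case PARTIAL_UPDATE: return "PartialUpdateMergeFunction"; // merge non-null fields into one row
        case AGGREGATE:      return "AggregateMergeFunction";     // aggregate fields per key
        case FIRST_ROW:      return "FirstRowMergeFunction";      // keep the first row per key
        default: throw new IllegalArgumentException("Unknown merge engine: " + engine);
    }
}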

(1) bufferSpillable() -- whether the write buffer may spill to disk

@VisibleForTesting
public boolean bufferSpillable() {
    // Bound to the 'write-buffer-spillable' option; if unset, defaults to
    // (fileIO.isObjectStore() || !isStreamingMode): e.g. writing to an object store
    // such as S3, or running in batch mode, makes the buffer spillable by default
    return options.writeBufferSpillable(fileIO.isObjectStore(), isStreamingMode);
}

(2) createWriter() -- the core method

The core flow for creating the writer:

  1. Build the KeyValueFileWriterFactory -- the writer factory.
  2. Initialize the LSM tree level structure from the primary key comparator, the restored files, and the num-levels option, producing a Levels object.
  3. Create the universal compaction strategy UniversalCompaction; its pick() later decides which files get compacted (this is the core).
     Note: the universal strategy is driven by the four options below; for details see Paimon Source Code Walkthrough -- Compaction-CompactStrategy.
     • bound to 'compaction.max-size-amplification-percent', default 200
     • bound to 'compaction.size-ratio', default 1
     • bound to 'num-sorted-run.compaction-trigger', default 5
     • bound to 'compaction.optimization-interval', no default value
  4. Choose the compaction strategy: if needLookup() = true, use the ForceUpLevel0Compaction strategy, which forces L0 compaction; otherwise use UniversalCompaction.
     needLookup() returns true if any of the following holds (see the sketch after this list):
     1. merge-engine = first-row
     2. changelog-producer = lookup
     3. deletion-vectors.enabled = true
     4. force-lookup = true
  5. Call createCompactManager() to create the matching CompactManager implementation.
  6. Create the MergeTreeWriter from all the parameters built above.
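A hedged restatement of the needLookup() decision in code, for quick reference; this is illustrative, not the literal CoreOptions implementation:

// Restates the four rules above; illustrative only.
static boolean needLookup(
        CoreOptions.MergeEngine mergeEngine,
        CoreOptions.ChangelogProducer changelogProducer,
        boolean deletionVectorsEnabled,
        boolean forceLookup) {
    return mergeEngine == CoreOptions.MergeEngine.FIRST_ROW              // 1. merge-engine = first-row
            || changelogProducer == CoreOptions.ChangelogProducer.LOOKUP // 2. changelog-producer = lookup
            || deletionVectorsEnabled                                    // 3. deletion-vectors.enabled = true
            || forceLookup;                                              // 4. force-lookup = true
}

With that in mind, the full createWriter() source: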
// Create the writer
@Override
protected MergeTreeWriter createWriter(
        @Nullable Long snapshotId,
        BinaryRow partition,
        int bucket,
        List<DataFileMeta> restoreFiles,
        long restoredMaxSeqNumber,
        @Nullable CommitIncrement restoreIncrement,
        ExecutorService compactExecutor,
        @Nullable DeletionVectorsMaintainer dvMaintainer) {
    if (LOG.isDebugEnabled()) {
        LOG.debug(
                "Creating merge tree writer for partition {} bucket {} from restored files {}",
                partition,
                bucket,
                restoreFiles);
    }
    // Step 1: build the KeyValueFileWriterFactory -- the writer factory
    KeyValueFileWriterFactory writerFactory =
            writerFactoryBuilder.build(partition, bucket, options);
    // Step 2: initialize the LSM tree level structure from the primary key comparator,
    // the restored files and the num-levels option, producing a Levels object
    Comparator<InternalRow> keyComparator = keyComparatorSupplier.get();
    Levels levels = new Levels(keyComparator, restoreFiles, options.numLevels());
    // Step 3: create the UniversalCompaction strategy; its pick() later decides
    // which files get compacted (the core)
    UniversalCompaction universalCompaction =
            new UniversalCompaction(
                    options.maxSizeAmplificationPercent(),      // 'compaction.max-size-amplification-percent', default 200
                    options.sortedRunSizeRatio(),               // 'compaction.size-ratio', default 1
                    options.numSortedRunCompactionTrigger(),    // 'num-sorted-run.compaction-trigger', default 5
                    options.optimizedCompactionInterval());     // 'compaction.optimization-interval', no default value

    /* Step 4: if needLookup() is true, use the ForceUpLevel0Compaction strategy;
       otherwise use UniversalCompaction directly.
       needLookup() returns true if any of the following holds:
        1. merge-engine = first-row
        2. changelog-producer = lookup
        3. deletion-vectors.enabled = true
        4. force-lookup = true
    */
    CompactStrategy compactStrategy =
            options.needLookup()
                    ? new ForceUpLevel0Compaction(universalCompaction)
                    : universalCompaction;
    // Step 5: call createCompactManager() to create the matching CompactManager implementation
    CompactManager compactManager =
            createCompactManager(
                    partition, bucket, compactStrategy, compactExecutor, levels, dvMaintainer);
    // Step 6: create the MergeTreeWriter from all the parameters built above
    return new MergeTreeWriter(
            bufferSpillable(),
            options.writeBufferSpillDiskSize(),
            options.localSortMaxNumFileHandles(),
            options.spillCompressOptions(),
            ioManager,
            compactManager,
            restoredMaxSeqNumber,
            keyComparator,
            mfFactory.create(),
            writerFactory,
            options.commitForceCompact(),
            options.changelogProducer(),
            restoreIncrement,
            UserDefinedSeqComparator.create(valueType, options));
}

(3) createCompactManager() -- creating the compaction manager

CASE-1: write-only = true: use NoopCompactManager, which performs no compaction at all.
CASE-2: write-only = false: use MergeTreeCompactManager to manage compaction tasks. For details on these two CompactManager implementations, see Paimon Source Code Walkthrough -- Compaction-5.CompactManager.

private CompactManager createCompactManager(
        BinaryRow partition,
        int bucket,
        CompactStrategy compactStrategy,
        ExecutorService compactExecutor,
        Levels levels,
        @Nullable DeletionVectorsMaintainer dvMaintainer) {
    // CASE-1: write-only = true: use NoopCompactManager, which performs no compaction
    if (options.writeOnly()) {
        return new NoopCompactManager();
    }
    // CASE-2: write-only = false: use MergeTreeCompactManager to manage compaction tasks
    else {
        Comparator<InternalRow> keyComparator = keyComparatorSupplier.get();
        @Nullable FieldsComparator userDefinedSeqComparator = udsComparatorSupplier.get();
        // Build the CompactRewriter (the core rewriter)
        CompactRewriter rewriter =
                createRewriter(
                        partition,
                        bucket,
                        keyComparator,
                        userDefinedSeqComparator,
                        levels,
                        dvMaintainer);
        // Build and return the MergeTreeCompactManager
        return new MergeTreeCompactManager(
                compactExecutor,
                levels,
                compactStrategy,
                keyComparator,
                options.compactionFileSize(true), // 70% of target-file-size: the compaction target file size, used as the boundary between large and small files
                options.numSortedRunStopTrigger(),
                rewriter,
                compactionMetrics == null
                        ? null
                        : compactionMetrics.createReporter(partition, bucket),
                dvMaintainer,
                options.prepareCommitWaitCompaction());
    }
}
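A quick worked example of the compactionFileSize comment above: assuming the default target-file-size of 128 MB for primary key tables, the compaction target file size works out to 128 MB × 70% ≈ 89.6 MB; files at or above this boundary count as large files, everything below as small files.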

(4) createRewriter() -- the core entry point for merge-rewriting files

The flow:

  1. Build the reader and writer factories.
  2. Build MergeSorter, the core merge-sort helper (this is the core), plus the other merge-related parameters.
  3. Create the appropriate MergeTreeCompactRewriter variant by case:
    • CASE-1: changelog-producer = full-compaction: build and return a FullChangelogMergeTreeCompactRewriter
    • CASE-2: needLookup is true (the rules were introduced under createWriter() above): pick the internal factories accordingly, then build and return a LookupMergeTreeCompactRewriter
    • CASE-3: default: use the plain MergeTreeCompactRewriter
// Core code: the entry point for merge-rewriting files
private MergeTreeCompactRewriter createRewriter(
        BinaryRow partition,
        int bucket,
        Comparator<InternalRow> keyComparator,
        @Nullable FieldsComparator userDefinedSeqComparator,
        Levels levels,
        @Nullable DeletionVectorsMaintainer dvMaintainer) {
    DeletionVector.Factory dvFactory = DeletionVector.factory(dvMaintainer);
    FileReaderFactory<KeyValue> readerFactory =
            readerFactoryBuilder.build(partition, bucket, dvFactory);
    if (recordLevelExpire != null) {
        readerFactory = recordLevelExpire.wrap(readerFactory);
    }
    // Step 1: build the reader and writer factories (the reader factory was built just above)
    KeyValueFileWriterFactory writerFactory =
            writerFactoryBuilder.build(partition, bucket, options);
    // Step 2: build MergeSorter, the core merge-sort helper, plus the other merge-related parameters
    MergeSorter mergeSorter = new MergeSorter(options, keyType, valueType, ioManager);
    int maxLevel = options.numLevels() - 1;
    MergeEngine mergeEngine = options.mergeEngine();
    ChangelogProducer changelogProducer = options.changelogProducer();
    LookupStrategy lookupStrategy = options.lookupStrategy();
    // CASE-1: changelog-producer = full-compaction: build and return a FullChangelogMergeTreeCompactRewriter
    if (changelogProducer.equals(FULL_COMPACTION)) {
        return new FullChangelogMergeTreeCompactRewriter(
                maxLevel,
                mergeEngine,
                readerFactory,
                writerFactory,
                keyComparator,
                userDefinedSeqComparator,
                mfFactory,
                mergeSorter,
                logDedupEqualSupplier.get());
    }
    // CASE-2: needLookup is true: pick the internal factories, then build and return
    // a LookupMergeTreeCompactRewriter
    /* needLookup() returns true if any of the following holds:
        1. merge-engine = first-row
        2. changelog-producer = lookup
        3. deletion-vectors.enabled = true
        4. force-lookup = true
     */
    else if (lookupStrategy.needLookup) {
        LookupLevels.ValueProcessor<?> processor;
        LookupMergeTreeCompactRewriter.MergeFunctionWrapperFactory<?> wrapperFactory;
        FileReaderFactory<KeyValue> lookupReaderFactory = readerFactory;
        // SUB-CASE-1: merge-engine = first-row must not have deletion-vectors.enabled = true
        if (lookupStrategy.isFirstRow) {
            if (options.deletionVectorsEnabled()) {
                throw new UnsupportedOperationException(
                        "First row merge engine does not need deletion vectors because there is no deletion of old data in this merge engine.");
            }
            // Build a simplified reader (values are not read; only key existence matters)
            lookupReaderFactory =
                    readerFactoryBuilder
                            .copyWithoutProjection()
                            .withReadValueType(RowType.of())
                            .build(partition, bucket, dvFactory);
            processor = new ContainsValueProcessor();
            wrapperFactory = new FirstRowMergeFunctionWrapperFactory(); // adapts the first-row merge logic
        }
        // SUB-CASE-2: all other scenarios that need lookup
        else {
            // Build the processor for key-value positions/contents, depending on whether deletion vectors are used
            processor =
                    lookupStrategy.deletionVector
                            ? new PositionedKeyValueProcessor(
                                    valueType,
                                    lookupStrategy.produceChangelog
                                            || mergeEngine != DEDUPLICATE
                                            || !options.sequenceField().isEmpty())
                            : new KeyValueProcessor(valueType);
            // Build the wrapper factory (adapts deduplication, sequence comparison and other merge logic)
            wrapperFactory =
                    new LookupMergeFunctionWrapperFactory<>(
                            logDedupEqualSupplier.get(),
                            lookupStrategy,
                            UserDefinedSeqComparator.create(valueType, options));
        }
        // Return the lookup-specific rewriter
        return new LookupMergeTreeCompactRewriter(
                maxLevel,
                mergeEngine,
                createLookupLevels(partition, bucket, levels, processor, lookupReaderFactory),
                readerFactory,
                writerFactory,
                keyComparator,
                userDefinedSeqComparator,
                mfFactory,
                mergeSorter,
                wrapperFactory,
                lookupStrategy.produceChangelog,
                dvMaintainer,
                options);
    }
    // CASE-3: default: use the plain MergeTreeCompactRewriter
    else {
        return new MergeTreeCompactRewriter(
                readerFactory,
                writerFactory,
                keyComparator,
                userDefinedSeqComparator,
                mfFactory,
                mergeSorter);
    }
}

(5) createLookupLevels()

// Create the lookup levels
private <T> LookupLevels<T> createLookupLevels(
        BinaryRow partition,
        int bucket,
        Levels levels,
        LookupLevels.ValueProcessor<T> valueProcessor,
        FileReaderFactory<KeyValue> readerFactory) {
    // 1. Check that a temp disk directory is available
    if (ioManager == null) {
        throw new RuntimeException(
                "Can not use lookup, there is no temp disk directory to use.");
    }
    // 2. Create the LookupStoreFactory
    LookupStoreFactory lookupStoreFactory =
            LookupStoreFactory.create(
                    options,
                    cacheManager,
                    new RowCompactedSerializer(keyType).createSliceComparator());
    Options options = this.options.toConfiguration();
    // 3. Create the lookup file cache
    if (lookupFileCache == null) {
        lookupFileCache =
                LookupFile.createCache(
                        options.get(CoreOptions.LOOKUP_CACHE_FILE_RETENTION), // 'lookup.cache-file-retention', default 1 h
                        options.get(CoreOptions.LOOKUP_CACHE_MAX_DISK_SIZE)); // 'lookup.cache-max-disk-size', default Long.MAX_VALUE
    }
    // 4. Build and return the LookupLevels
    return new LookupLevels<>(
            levels,
            keyComparatorSupplier.get(),
            keyType,
            valueProcessor,
            readerFactory::createRecordReader,
            file ->
                    ioManager
                            .createChannel(
                                    localFilePrefix(partitionType, partition, bucket, file))
                            .getPathFile(),
            lookupStoreFactory,
            bfGenerator(options),
            lookupFileCache);
}
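The two cache options read in step 3 can be tuned per table. A minimal sketch of overriding them through a plain options map, using the option keys referenced in the code above (the values are illustrative):

// Illustrative values; the keys are the option names used above.
Map<String, String> tableOptions = new HashMap<>();
tableOptions.put("lookup.cache-file-retention", "2 h");  // default: 1 h
tableOptions.put("lookup.cache-max-disk-size", "10 GB"); // default: Long.MAX_VALUE bytes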

III. Summary

1. KeyValueFileStoreWrite Summary

KeyValueFileStoreWrite is the central coordinator of the write and compaction flow for Paimon primary key tables, with these core responsibilities:

  1. Writer lifecycle management: create, reuse and dispose of the MergeTreeWriter for each partition-bucket
  2. Compact manager factory: choose NoopCompactManager or MergeTreeCompactManager according to the write-only option
  3. Compact rewriter factory: choose the best-fitting CompactRewriter according to changelog-producer, lookup and related options
  4. Memory pool coordination: inherits the shared memory pool and preemption mechanism from MemoryFileStoreWrite

2. The Three Rewriters Compared

flowchart LR
    A[createRewriter] --> B{changelog-producer = full-compaction?}
    B -->|yes| C[FullChangelogMergeTreeCompactRewriter]
    B -->|no| D{needLookup = true?}
    D -->|yes| E[LookupMergeTreeCompactRewriter]
    D -->|no| F[MergeTreeCompactRewriter]

1. MergeTreeCompactRewriter (the standard rewriter)

  • Applicable scenario: standard primary key tables with no special optimization needs
  • Core flow: read the SortedRuns → merge-sort → apply the MergeFunction → write new files (see the sketch after this list)
  • Characteristics:
    • simple and efficient
    • fits most workloads
    • core logic implemented in rewriteCompaction()
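Below is a minimal, self-contained sketch of the "merge-sort then apply MergeFunction" idea. It uses plain collections and a BinaryOperator as a stand-in for Paimon's MergeFunction; the real rewriteCompaction() streams from SortedRun readers instead of materialized lists.

import java.util.*;
import java.util.function.BinaryOperator;

public class MergeByKeySketch {
    // Collapses runs of equal keys in a key-sorted input, the way compaction
    // feeds records with the same key into a merge function. Illustrative only.
    static <K, V> List<V> mergeSortedByKey(
            List<Map.Entry<K, V>> sortedInput,
            Comparator<K> keyComparator,
            BinaryOperator<V> mergeFunction) {
        List<V> result = new ArrayList<>();
        K currentKey = null;
        V merged = null;
        for (Map.Entry<K, V> e : sortedInput) {
            if (currentKey != null && keyComparator.compare(currentKey, e.getKey()) == 0) {
                merged = mergeFunction.apply(merged, e.getValue()); // same key: merge values
            } else {
                if (merged != null) {
                    result.add(merged); // emit the previous key's merged value
                }
                currentKey = e.getKey();
                merged = e.getValue();
            }
        }
        if (merged != null) {
            result.add(merged); // emit the final key
        }
        return result;
    }
}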

2. FullChangelogMergeTreeCompactRewriter (the full-compaction changelog rewriter)

  • Applicable scenario: changelog-producer = full-compaction
  • Core characteristics:
    • generates changelog only at the top level (maxLevel)
    • wraps the merge function in FullChangelogMergeFunctionWrapper
    • the decision logic lives in rewriteChangelog()
  • Upgrade strategy:
    protected UpgradeStrategy upgradeStrategy(int outputLevel, DataFileMeta file) {
        return outputLevel == maxLevel ?
            CHANGELOG_NO_REWRITE :   // top level: produce changelog, no rewrite
            NO_CHANGELOG_NO_REWRITE; // other levels: no changelog, no rewrite
    }
    

3. LookupMergeTreeCompactRewriter (the lookup-optimized rewriter)

  • Applicable scenario:
    • needLookup() = true, i.e. merge-engine = first-row, changelog-producer = lookup, deletion-vectors.enabled = true, or force-lookup = true
    • queries need to be accelerated via lookup
  • Core mechanism:
    • builds LookupLevels for fast key lookup
    • supports two merge-function wrapper factories:
      • FirstRowMergeFunctionWrapperFactory: keeps only the first row
      • LookupMergeFunctionWrapperFactory: standard lookup merge
    • uses createLookupLevels() to build the lookup levels