Iceberg snapshot expiration and cleanup


The expiration call

        table.expireSnapshots()
                .expireOlderThan(tsToExpire)
                .commit();

This creates a RemoveSnapshots instance, sets the expiration condition on it, and then goes straight into the commit method of RemoveSnapshots:

  @Override
  public void commit() {
    Tasks.foreach(ops)
        .retry(base.propertyAsInt(COMMIT_NUM_RETRIES, COMMIT_NUM_RETRIES_DEFAULT))
        .exponentialBackoff(
            base.propertyAsInt(COMMIT_MIN_RETRY_WAIT_MS, COMMIT_MIN_RETRY_WAIT_MS_DEFAULT),
            base.propertyAsInt(COMMIT_MAX_RETRY_WAIT_MS, COMMIT_MAX_RETRY_WAIT_MS_DEFAULT),
            base.propertyAsInt(COMMIT_TOTAL_RETRY_TIME_MS, COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT),
            2.0 /* exponential */)
        .onlyRetryOn(CommitFailedException.class)
        .run(item -> {
          TableMetadata updated = internalApply();
          ops.commit(base, updated);
        });
    LOG.info("Committed snapshot changes");

    if (cleanExpiredFiles) {
      cleanExpiredSnapshots();
    } else {
      LOG.info("Cleaning up manifest and data files disabled, leaving them in place");
    }
  }
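
The retry behaviour configured in commit() can be pictured as a plain loop. The following is only a rough sketch under simplifying assumptions: `RetrySketch`, `CommitConflict`, `backoffMs` and `runWithRetry` are hypothetical names, `CommitConflict` stands in for CommitFailedException, and the total retry time limit is left out. It is not how Iceberg's Tasks utility is implemented.

```java
import java.util.function.Supplier;

final class RetrySketch {
  /** Hypothetical stand-in for CommitFailedException. */
  static final class CommitConflict extends RuntimeException {
    CommitConflict(String message) {
      super(message);
    }
  }

  /** Wait before retry number `attempt` (0-based): min * scale^attempt, capped at max. */
  static long backoffMs(int attempt, long minWaitMs, long maxWaitMs, double scale) {
    double wait = minWaitMs * Math.pow(scale, attempt);
    return (long) Math.min(wait, (double) maxWaitMs);
  }

  /** Runs the action; retries only on CommitConflict, at most numRetries extra times. */
  static <T> T runWithRetry(Supplier<T> action, int numRetries, long minWaitMs, long maxWaitMs) {
    int attempt = 0;
    while (true) {
      try {
        return action.get();
      } catch (CommitConflict e) {
        if (attempt >= numRetries) {
          throw e; // retries exhausted, surface the conflict
        }
        try {
          Thread.sleep(backoffMs(attempt, minWaitMs, maxWaitMs, 2.0));
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
        attempt++;
      }
    }
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // fail twice with a conflict, then succeed on the third attempt
    String result = runWithRetry(() -> {
      if (++calls[0] < 3) {
        throw new CommitConflict("concurrent update");
      }
      return "committed";
    }, 4, 1, 8);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

In the real code each retry calls internalApply() again, so the metadata is refreshed and the set of snapshots to remove is recomputed against the latest table state before every commit attempt.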

internalApply() removes the expired snapshots and builds the new metadata:

  private TableMetadata internalApply() {
    this.base = ops.refresh();

    Set<Long> idsToRetain = Sets.newHashSet();
    List<Long> ancestorIds = SnapshotUtil.ancestorIds(base.currentSnapshot(), base::snapshot);
    if (minNumSnapshots >= ancestorIds.size()) {
      idsToRetain.addAll(ancestorIds);
    } else {
      idsToRetain.addAll(ancestorIds.subList(0, minNumSnapshots));
    }

    TableMetadata updateMeta = base.removeSnapshotsIf(snapshot ->
        idsToRemove.contains(snapshot.snapshotId()) ||
        (snapshot.timestampMillis() < expireOlderThan && !idsToRetain.contains(snapshot.snapshotId())));
    List<Snapshot> updateSnapshots = updateMeta.snapshots();
    List<Snapshot> baseSnapshots = base.snapshots();
    return updateSnapshots.size() != baseSnapshots.size() ? updateMeta : base;
  }
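
To make the retention rule in internalApply concrete, here is a small stand-alone sketch. `RetentionSketch`, `Snap` and `idsToExpire` are hypothetical names used only for illustration; the sketch models snapshots as (id, timestamp) pairs and assumes the ancestor list is ordered newest first, starting with the current snapshot.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class RetentionSketch {
  /** Hypothetical, minimal stand-in for an Iceberg Snapshot. */
  static final class Snap {
    final long id;
    final long timestampMillis;

    Snap(long id, long timestampMillis) {
      this.id = id;
      this.timestampMillis = timestampMillis;
    }
  }

  /** Returns the IDs that would be expired under the rule used by internalApply. */
  static Set<Long> idsToExpire(List<Snap> snapshots, List<Long> ancestorIds, long currentId,
                               long expireOlderThan, int minNumSnapshots, Set<Long> idsToRemove) {
    // protect the newest minNumSnapshots ancestors of the current snapshot
    int keep = Math.min(minNumSnapshots, ancestorIds.size());
    Set<Long> idsToRetain = new HashSet<>(ancestorIds.subList(0, keep));

    Set<Long> expired = new TreeSet<>();
    for (Snap s : snapshots) {
      boolean matches = idsToRemove.contains(s.id)
          || (s.timestampMillis < expireOlderThan && !idsToRetain.contains(s.id));
      // removeSnapshotsIf never drops the current snapshot
      if (matches && s.id != currentId) {
        expired.add(s.id);
      }
    }
    return expired;
  }

  public static void main(String[] args) {
    List<Snap> snapshots = Arrays.asList(
        new Snap(1, 100), new Snap(2, 200), new Snap(3, 300), new Snap(4, 400));
    List<Long> ancestors = Arrays.asList(4L, 3L, 2L, 1L);
    Set<Long> none = Collections.emptySet();
    System.out.println(idsToExpire(snapshots, ancestors, 4L, 350L, 2, none));
  }
}
```

With a cutoff of 350 and two retained ancestors, snapshots 1 and 2 expire, while snapshot 3 survives even though it is older than the cutoff.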

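It calls base.removeSnapshotsIf to remove the expired snapshots. The argument is a lambda that decides whether a snapshot has expired: a snapshot matches if its ID was explicitly listed in idsToRemove, or if it is older than expireOlderThan and is not one of the newest minNumSnapshots ancestors of the current snapshot.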

  public TableMetadata removeSnapshotsIf(Predicate<Snapshot> removeIf) {
    List<Snapshot> filtered = Lists.newArrayListWithExpectedSize(snapshots.size());
    for (Snapshot snapshot : snapshots) {
      // keep the current snapshot and any snapshots that do not match the removeIf condition
      if (snapshot.snapshotId() == currentSnapshotId || !removeIf.test(snapshot)) {
        filtered.add(snapshot);
      }
    }

    // update the snapshot log
    Set<Long> validIds = Sets.newHashSet(Iterables.transform(filtered, Snapshot::snapshotId));
    List<HistoryEntry> newSnapshotLog = Lists.newArrayList();
    for (HistoryEntry logEntry : snapshotLog) {
      if (validIds.contains(logEntry.snapshotId())) {
        // copy the log entries that are still valid
        newSnapshotLog.add(logEntry);
      } else {
        // any invalid entry causes the history before it to be removed. otherwise, there could be
        // history gaps that cause time-travel queries to produce incorrect results. for example,
        // if history is [(t1, s1), (t2, s2), (t3, s3)] and s2 is removed, the history cannot be
        // [(t1, s1), (t3, s3)] because it appears that s3 was current during the time between t2
        // and t3 when in fact s2 was the current snapshot.
        newSnapshotLog.clear();
      }
    }

    return new TableMetadata(null, formatVersion, uuid, location,
        lastSequenceNumber, System.currentTimeMillis(), lastColumnId, currentSchemaId, schemas, defaultSpecId, specs,
        lastAssignedPartitionId, defaultSortOrderId, sortOrders, properties, currentSnapshotId, filtered,
        ImmutableList.copyOf(newSnapshotLog), addPreviousFile(file, lastUpdatedMillis));
  }

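removeSnapshotsIf first iterates over the snapshots and adds the current snapshot, plus every snapshot that does not match the expiration condition, to filtered. It then prunes snapshotLog: whenever a log entry points at a removed snapshot, that entry and everything before it are dropped. Finally it builds a new TableMetadata.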
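
The log pruning is easy to misread, so here is a tiny sketch of just that loop. `SnapshotLogSketch` and `truncateLog` are hypothetical names, and the log is reduced to a list of snapshot IDs in chronological order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SnapshotLogSketch {
  /** Keeps only the part of the log that follows the last removed snapshot. */
  static List<Long> truncateLog(List<Long> logSnapshotIds, Set<Long> validIds) {
    List<Long> newLog = new ArrayList<>();
    for (Long id : logSnapshotIds) {
      if (validIds.contains(id)) {
        // still valid: copy the entry
        newLog.add(id);
      } else {
        // a removed snapshot invalidates all history collected so far
        newLog.clear();
      }
    }
    return newLog;
  }

  public static void main(String[] args) {
    List<Long> log = Arrays.asList(1L, 2L, 3L);
    Set<Long> valid = new HashSet<>(Arrays.asList(1L, 3L)); // snapshot 2 was removed
    System.out.println(truncateLog(log, valid));
  }
}
```

In this example snapshot 1 is still a valid snapshot, but its log entry is dropped anyway, because keeping it would leave a gap where snapshot 2 used to be current.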

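Back in internalApply, the new TableMetadata is returned (or base itself, if no snapshot was actually removed).

It is then committed, that is, a new TableMetadata file is written.

Back in commit, if cleanExpiredFiles is true, cleanExpiredSnapshots is called to delete the expired files: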

  private void cleanExpiredSnapshots() {
    // clean up the expired snapshots:
    // 1. Get a list of the snapshots that were removed
    // 2. Delete any data files that were deleted by those snapshots and are not in the table
    // 3. Delete any manifests that are no longer used by current snapshots
    // 4. Delete the manifest lists

    TableMetadata current = ops.refresh();

    Set<Long> validIds = Sets.newHashSet();
    for (Snapshot snapshot : current.snapshots()) {
      validIds.add(snapshot.snapshotId());
    }

    Set<Long> expiredIds = Sets.newHashSet();
    for (Snapshot snapshot : base.snapshots()) {
      long snapshotId = snapshot.snapshotId();
      if (!validIds.contains(snapshotId)) {
        // the snapshot was expired
        LOG.info("Expired snapshot: {}", snapshot);
        expiredIds.add(snapshotId);
      }
    }

    if (expiredIds.isEmpty()) {
      // if no snapshots were expired, skip cleanup
      return;
    }

    LOG.info("Committed snapshot changes; cleaning up expired manifests and data files.");

    removeExpiredFiles(current.snapshots(), validIds, expiredIds);
  }

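cleanExpiredSnapshots collects the snapshot IDs in the current metadata, then iterates over the snapshot IDs in the previous metadata; any ID that is missing from the current set is treated as expired. Here current is the refreshed TableMetadata, from which the expired snapshots have already been removed, whereas base.snapshots() comes from the previous TableMetadata and still contains them. It then calls removeExpiredFiles to delete the expired files.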
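
The ID comparison amounts to a set difference between the old and the new metadata. A minimal sketch, with `ExpiredIdsSketch` and `expiredIds` as hypothetical names and snapshots reduced to their IDs:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

final class ExpiredIdsSketch {
  /** IDs present in the previous metadata (base) but absent from the refreshed metadata. */
  static Set<Long> expiredIds(Collection<Long> baseIds, Collection<Long> currentIds) {
    Set<Long> validIds = new HashSet<>(currentIds);
    Set<Long> expired = new TreeSet<>();
    for (Long id : baseIds) {
      if (!validIds.contains(id)) {
        expired.add(id);
      }
    }
    return expired;
  }

  public static void main(String[] args) {
    // base still lists snapshots 1..4; the committed metadata keeps only 3 and 4
    System.out.println(expiredIds(Arrays.asList(1L, 2L, 3L, 4L), Arrays.asList(3L, 4L)));
  }
}
```

If the result is empty, nothing was expired and the cleanup is skipped entirely.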

  private void removeExpiredFiles(List<Snapshot> snapshots, Set<Long> validIds, Set<Long> expiredIds) {
    // Reads and deletes are done using Tasks.foreach(...).suppressFailureWhenFinished to complete
    // as much of the delete work as possible and avoid orphaned data or manifest files.

    // this is the set of ancestors of the current table state. when removing snapshots, this must
    // only remove files that were deleted in an ancestor of the current table state to avoid
    // physically deleting files that were logically deleted in a commit that was rolled back.
    Set<Long> ancestorIds = Sets.newHashSet(SnapshotUtil.ancestorIds(base.currentSnapshot(), base::snapshot));

    Set<Long> pickedAncestorSnapshotIds = Sets.newHashSet();
    for (long snapshotId : ancestorIds) {
      String sourceSnapshotId = base.snapshot(snapshotId).summary().get(SnapshotSummary.SOURCE_SNAPSHOT_ID_PROP);
      if (sourceSnapshotId != null) {
        // protect any snapshot that was cherry-picked into the current table state
        pickedAncestorSnapshotIds.add(Long.parseLong(sourceSnapshotId));
      }
    }

    // find manifests to clean up that are still referenced by a valid snapshot, but written by an expired snapshot
    Set<String> validManifests = Sets.newHashSet();
    Set<ManifestFile> manifestsToScan = Sets.newHashSet();
    Tasks.foreach(snapshots).retry(3).suppressFailureWhenFinished()
        .onFailure((snapshot, exc) ->
            LOG.warn("Failed on snapshot {} while reading manifest list: {}", snapshot.snapshotId(),
                snapshot.manifestListLocation(), exc))
        .run(
            snapshot -> {
              try (CloseableIterable<ManifestFile> manifests = readManifestFiles(snapshot)) {
                for (ManifestFile manifest : manifests) {
                  validManifests.add(manifest.path());

                  long snapshotId = manifest.snapshotId();
                  // whether the manifest was created by a valid snapshot (true) or an expired snapshot (false)
                  boolean fromValidSnapshots = validIds.contains(snapshotId);
                  // whether the snapshot that created the manifest was an ancestor of the table state
                  boolean isFromAncestor = ancestorIds.contains(snapshotId);
                  // whether the changes in this snapshot have been picked into the current table state
                  boolean isPicked = pickedAncestorSnapshotIds.contains(snapshotId);
                  // if the snapshot that wrote this manifest is no longer valid (has expired),
                  // then delete its deleted files. note that this is only for expired snapshots that are in the
                  // current table state
                  if (!fromValidSnapshots && (isFromAncestor || isPicked) && manifest.hasDeletedFiles()) {
                    manifestsToScan.add(manifest.copy());
                  }
                }

              } catch (IOException e) {
                throw new RuntimeIOException(e,
                    "Failed to close manifest list: %s", snapshot.manifestListLocation());
              }
            });

    // find manifests to clean up that were only referenced by snapshots that have expired
    Set<String> manifestListsToDelete = Sets.newHashSet();
    Set<String> manifestsToDelete = Sets.newHashSet();
    Set<ManifestFile> manifestsToRevert = Sets.newHashSet();
    Tasks.foreach(base.snapshots()).retry(3).suppressFailureWhenFinished()
        .onFailure((snapshot, exc) ->
            LOG.warn("Failed on snapshot {} while reading manifest list: {}", snapshot.snapshotId(),
                snapshot.manifestListLocation(), exc))
        .run(
            snapshot -> {
              long snapshotId = snapshot.snapshotId();
              if (!validIds.contains(snapshotId)) {
                // determine whether the changes in this snapshot are in the current table state
                if (pickedAncestorSnapshotIds.contains(snapshotId)) {
                  // this snapshot was cherry-picked into the current table state, so skip cleaning it up.
                  // its changes will expire when the picked snapshot expires.
                  // A -- C -- D (source=B)
                  //  `- B <-- this commit
                  return;
                }

                long sourceSnapshotId = PropertyUtil.propertyAsLong(
                    snapshot.summary(), SnapshotSummary.SOURCE_SNAPSHOT_ID_PROP, -1);
                if (ancestorIds.contains(sourceSnapshotId)) {
                  // this commit was cherry-picked from a commit that is in the current table state. do not clean up its
                  // changes because it would revert data file additions that are in the current table.
                  // A -- B -- C
                  //  `- D (source=B) <-- this commit
                  return;
                }

                if (pickedAncestorSnapshotIds.contains(sourceSnapshotId)) {
                  // this commit was cherry-picked from a commit that is in the current table state. do not clean up its
                  // changes because it would revert data file additions that are in the current table.
                  // A -- C -- E (source=B)
                  //  `- B `- D (source=B) <-- this commit
                  return;
                }

                // find any manifests that are no longer needed
                try (CloseableIterable<ManifestFile> manifests = readManifestFiles(snapshot)) {
                  for (ManifestFile manifest : manifests) {
                    if (!validManifests.contains(manifest.path())) {
                      manifestsToDelete.add(manifest.path());

                      boolean isFromAncestor = ancestorIds.contains(manifest.snapshotId());
                      boolean isFromExpiringSnapshot = expiredIds.contains(manifest.snapshotId());

                      if (isFromAncestor && manifest.hasDeletedFiles()) {
                        // Only delete data files that were deleted in by an expired snapshot if that
                        // snapshot is an ancestor of the current table state. Otherwise, a snapshot that
                        // deleted files and was rolled back will delete files that could be in the current
                        // table state.
                        manifestsToScan.add(manifest.copy());
                      }

                      if (!isFromAncestor && isFromExpiringSnapshot && manifest.hasAddedFiles()) {
                        // Because the manifest was written by a snapshot that is not an ancestor of the
                        // current table state, the files added in this manifest can be removed. The extra
                        // check whether the manifest was written by a known snapshot that was expired in
                        // this commit ensures that the full ancestor list between when the snapshot was
                        // written and this expiration is known and there is no missing history. If history
                        // were missing, then the snapshot could be an ancestor of the table state but the
                        // ancestor ID set would not contain it and this would be unsafe.
                        manifestsToRevert.add(manifest.copy());
                      }
                    }
                  }
                } catch (IOException e) {
                  throw new RuntimeIOException(e,
                      "Failed to close manifest list: %s", snapshot.manifestListLocation());
                }

                // add the manifest list to the delete set, if present
                if (snapshot.manifestListLocation() != null) {
                  manifestListsToDelete.add(snapshot.manifestListLocation());
                }
              }
            });
    deleteDataFiles(manifestsToScan, manifestsToRevert, validIds);
    deleteMetadataFiles(manifestsToDelete, manifestListsToDelete);
  }
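
The per-manifest decision in the second pass (over the snapshots of base) can be condensed into one function. This is a simplified sketch: `ManifestActionSketch`, `Action` and `classify` are hypothetical names, the manifest is reduced to a handful of boolean facts, and the cherry-pick early returns as well as the first pass over the valid snapshots are deliberately left out.

```java
import java.util.EnumSet;

final class ManifestActionSketch {
  enum Action { DELETE_MANIFEST, SCAN_FOR_DELETED_FILES, REVERT_ADDED_FILES }

  /** Decides what to do with one manifest that belongs to an expired snapshot. */
  static EnumSet<Action> classify(boolean referencedByValidSnapshot,
                                  boolean writtenByAncestor,
                                  boolean writtenByExpiringSnapshot,
                                  boolean hasDeletedFiles,
                                  boolean hasAddedFiles) {
    EnumSet<Action> actions = EnumSet.noneOf(Action.class);
    if (referencedByValidSnapshot) {
      // a retained snapshot still reads this manifest, so leave it alone
      return actions;
    }
    actions.add(Action.DELETE_MANIFEST);
    if (writtenByAncestor && hasDeletedFiles) {
      // deletes made in the table's own history can now be carried out physically
      actions.add(Action.SCAN_FOR_DELETED_FILES);
    }
    if (!writtenByAncestor && writtenByExpiringSnapshot && hasAddedFiles) {
      // files added by a commit that never became part of the table state
      actions.add(Action.REVERT_ADDED_FILES);
    }
    return actions;
  }

  public static void main(String[] args) {
    // manifest from an expired ancestor that recorded deletes
    System.out.println(classify(false, true, true, true, false));
    // manifest from an expired, rolled-back commit that added files
    System.out.println(classify(false, false, true, false, true));
  }
}
```

The asymmetry is the important part: deleted files are only removed for ancestors of the current state, and added files are only removed for non-ancestors.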

deleteDataFiles and deleteMetadataFiles then delete the data files and the metadata files, respectively:

  private void deleteDataFiles(Set<ManifestFile> manifestsToScan, Set<ManifestFile> manifestsToRevert,
                               Set<Long> validIds) {
    Set<String> filesToDelete = findFilesToDelete(manifestsToScan, manifestsToRevert, validIds);
    for (String file : filesToDelete) {
      LOG.debug("Expired snapshot: file to delete: {}", file);
    }
    Tasks.foreach(filesToDelete)
        .executeWith(deleteExecutorService)
        .retry(3).stopRetryOn(NotFoundException.class).suppressFailureWhenFinished()
        .onFailure((file, exc) -> LOG.warn("Delete failed for data file: {}", file, exc))
        .run(file -> deleteFunc.accept(file));
  }

It first calls findFilesToDelete to work out which files need to be deleted:

  private Set<String> findFilesToDelete(Set<ManifestFile> manifestsToScan, Set<ManifestFile> manifestsToRevert,
                                        Set<Long> validIds) {
    Set<String> filesToDelete = ConcurrentHashMap.newKeySet();
    Tasks.foreach(manifestsToScan)
        .retry(3).suppressFailureWhenFinished()
        .executeWith(ThreadPools.getWorkerPool())
        .onFailure((item, exc) -> LOG.warn("Failed to get deleted files: this may cause orphaned data files", exc))
        .run(manifest -> {
          // the manifest has deletes, scan it to find files to delete
          try (ManifestReader<?> reader = ManifestFiles.open(manifest, ops.io(), ops.current().specsById())) {
            for (ManifestEntry<?> entry : reader.entries()) {
              // if the snapshot ID of the DELETE entry is no longer valid, the data can be deleted
              if (entry.status() == ManifestEntry.Status.DELETED &&
                  !validIds.contains(entry.snapshotId())) {
                // use toString to ensure the path will not change (Utf8 is reused)
                filesToDelete.add(entry.file().path().toString());
              }
            }
          } catch (IOException e) {
            throw new RuntimeIOException(e, "Failed to read manifest file: %s", manifest);
          }
        });

    Tasks.foreach(manifestsToRevert)
        .retry(3).suppressFailureWhenFinished()
        .executeWith(ThreadPools.getWorkerPool())
        .onFailure((item, exc) -> LOG.warn("Failed to get added files: this may cause orphaned data files", exc))
        .run(manifest -> {
          // the manifest was written by a non-ancestor snapshot, scan it to find the files it added
          try (ManifestReader<?> reader = ManifestFiles.open(manifest, ops.io(), ops.current().specsById())) {
            for (ManifestEntry<?> entry : reader.entries()) {
              // delete any ADDED file from manifests that were reverted
              if (entry.status() == ManifestEntry.Status.ADDED) {
                // use toString to ensure the path will not change (Utf8 is reused)
                filesToDelete.add(entry.file().path().toString());
              }
            }
          } catch (IOException e) {
            throw new RuntimeIOException(e, "Failed to read manifest file: %s", manifest);
          }
        });

    return filesToDelete;
  }
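
Stripped of the task scheduling and I/O, findFilesToDelete applies two filters. The sketch below models a manifest as a plain list of entries; `FilesToDeleteSketch`, `Status` and `Entry` are hypothetical stand-ins for illustration, not Iceberg's ManifestEntry API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class FilesToDeleteSketch {
  enum Status { EXISTING, ADDED, DELETED }

  /** Hypothetical, minimal stand-in for a manifest entry. */
  static final class Entry {
    final Status status;
    final long snapshotId;
    final String path;

    Entry(Status status, long snapshotId, String path) {
      this.status = status;
      this.snapshotId = snapshotId;
      this.path = path;
    }
  }

  static Set<String> findFilesToDelete(List<List<Entry>> manifestsToScan,
                                       List<List<Entry>> manifestsToRevert,
                                       Set<Long> validIds) {
    Set<String> filesToDelete = new TreeSet<>();
    for (List<Entry> manifest : manifestsToScan) {
      for (Entry entry : manifest) {
        // a DELETED entry whose snapshot has expired: no retained snapshot can read the file
        if (entry.status == Status.DELETED && !validIds.contains(entry.snapshotId)) {
          filesToDelete.add(entry.path);
        }
      }
    }
    for (List<Entry> manifest : manifestsToRevert) {
      for (Entry entry : manifest) {
        // files added by a commit that never became part of the table state
        if (entry.status == Status.ADDED) {
          filesToDelete.add(entry.path);
        }
      }
    }
    return filesToDelete;
  }

  public static void main(String[] args) {
    List<Entry> scanned = Arrays.asList(
        new Entry(Status.DELETED, 2, "a.parquet"),   // deleted by expired snapshot 2
        new Entry(Status.DELETED, 5, "b.parquet"),   // deleted by snapshot 5, still valid
        new Entry(Status.EXISTING, 1, "c.parquet"));
    List<Entry> reverted = Arrays.asList(new Entry(Status.ADDED, 3, "d.parquet"));
    Set<Long> validIds = Collections.singleton(5L);
    System.out.println(findFilesToDelete(
        Arrays.asList(scanned), Arrays.asList(reverted), validIds));
  }
}
```

Here b.parquet is kept because the snapshot that deleted it is still valid, so an older retained snapshot might still need the file for time travel.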

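The logic here is fairly involved. Thinking it over again, it does not have to be this complicated. The simplest approach: scan all snapshots to find every manifest file and data file; then scan the snapshots to retain to find every manifest file and data file that reads still need; the set difference is the set of files to delete.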
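
As a sketch of that simpler idea, assume the set of files reachable from each snapshot (manifests and data files alike) has already been collected into a map. `NaiveCleanupSketch`, `filesToDelete` and the map layout are hypothetical and only illustrate the set difference; this is not what RemoveSnapshots does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class NaiveCleanupSketch {
  /** Files reachable from some snapshot, minus files reachable from a retained snapshot. */
  static Set<String> filesToDelete(Map<Long, Set<String>> filesBySnapshot, Set<Long> retainedIds) {
    Set<String> all = new TreeSet<>();
    Set<String> needed = new HashSet<>();
    for (Map.Entry<Long, Set<String>> e : filesBySnapshot.entrySet()) {
      all.addAll(e.getValue());
      if (retainedIds.contains(e.getKey())) {
        needed.addAll(e.getValue());
      }
    }
    all.removeAll(needed);
    return all;
  }

  public static void main(String[] args) {
    Map<Long, Set<String>> files = new HashMap<>();
    files.put(1L, new HashSet<>(Arrays.asList("a", "b")));
    files.put(2L, new HashSet<>(Arrays.asList("b", "c")));
    files.put(3L, new HashSet<>(Arrays.asList("c", "d")));
    Set<Long> retained = new HashSet<>(Arrays.asList(3L));
    System.out.println(filesToDelete(files, retained));
  }
}
```

The trade-off is that this version has to read every manifest of every snapshot, expired or not, whereas the code above only opens the manifests it has reason to suspect.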

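A snapshot references multiple manifest files, and all of them are required: if the snapshot is retained, every manifest file it references must be retained as well. A manifest file, in turn, may reference multiple data files, and a data file it marks as deleted can be physically removed once the snapshot that deleted it has expired.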

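So expiration in Iceberg works a little differently from other operations: it first writes the new metadata file, then reads the metadata back and deletes files according to the snapshots that the metadata still retains.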