TableLoader tableLoader = TableLoader.fromHadoopTable("file:///tmp/iceberg2");
tableLoader.open();
Table table = tableLoader.loadTable();
RewriteDataFilesActionResult result = Actions.forTable(table)
    .rewriteDataFiles()
    .execute();
This is the entry code for small-file compaction with Flink. Next, the Actions class:
public class Actions {

  public static final Configuration CONFIG = new Configuration()
      // disable classloader check as Avro may cache class/object in the serializers.
      .set(CoreOptions.CHECK_LEAKED_CLASSLOADER, false);

  private StreamExecutionEnvironment env;
  private Table table;

  private Actions(StreamExecutionEnvironment env, Table table) {
    this.env = env;
    this.table = table;
  }

  public static Actions forTable(StreamExecutionEnvironment env, Table table) {
    return new Actions(env, table);
  }

  public static Actions forTable(Table table) {
    return new Actions(StreamExecutionEnvironment.getExecutionEnvironment(CONFIG), table);
  }

  public RewriteDataFilesAction rewriteDataFiles() {
    return new RewriteDataFilesAction(env, table);
  }
}
This Actions class mirrors the corresponding structure in Spark: the constructor is private, so the static forTable factory must be used to bind the table and instantiate Actions. rewriteDataFiles() then returns a RewriteDataFilesAction object, which implements the small-file compaction. Calling execute() starts the compaction; execute() is defined on RewriteDataFilesAction's parent class BaseRewriteDataFilesAction and contains the main compaction logic.
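Before looking at execute(), it helps to see the tuning knobs BaseRewriteDataFilesAction exposes. A hedged sketch (method names per the Iceberg version discussed here; the target size, lookback, and filter values are purely illustrative):

RewriteDataFilesActionResult result = Actions.forTable(table)
    .rewriteDataFiles()
    .targetSizeInBytes(512 * 1024 * 1024L)           // desired output file size
    .splitLookback(10)                               // bin-packing lookback when combining splits
    .splitOpenFileCost(4 * 1024 * 1024L)             // minimum cost charged per opened file
    .filter(Expressions.equal("day", "2021-11-18"))  // restrict the rewrite to matching files
    .execute();

execute() itself contains the planning and commit flow: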
public RewriteDataFilesActionResult execute() {
  CloseableIterable<FileScanTask> fileScanTasks = null;
  try {
    fileScanTasks = table.newScan()
        .caseSensitive(caseSensitive)
        .ignoreResiduals()
        .filter(filter)
        .planFiles();
  } finally {
    try {
      if (fileScanTasks != null) {
        fileScanTasks.close();
      }
    } catch (IOException ioe) {
      LOG.warn("Failed to close task iterable", ioe);
    }
  }

  Map<StructLikeWrapper, Collection<FileScanTask>> groupedTasks = groupTasksByPartition(fileScanTasks.iterator());
  Map<StructLikeWrapper, Collection<FileScanTask>> filteredGroupedTasks = groupedTasks.entrySet().stream()
      .filter(kv -> kv.getValue().size() > 1)
      .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

  // Nothing to rewrite if there's only one DataFile in each partition.
  if (filteredGroupedTasks.isEmpty()) {
    return RewriteDataFilesActionResult.empty();
  }

  // Split and combine tasks under each partition
  List<CombinedScanTask> combinedScanTasks = filteredGroupedTasks.values().stream()
      .map(scanTasks -> {
        CloseableIterable<FileScanTask> splitTasks = TableScanUtil.splitFiles(
            CloseableIterable.withNoopClose(scanTasks), targetSizeInBytes);
        return TableScanUtil.planTasks(splitTasks, targetSizeInBytes, splitLookback, splitOpenFileCost);
      })
      .flatMap(Streams::stream)
      .filter(task -> task.files().size() > 1 || isPartialFileScan(task))
      .collect(Collectors.toList());

  if (combinedScanTasks.isEmpty()) {
    return RewriteDataFilesActionResult.empty();
  }

  List<DataFile> addedDataFiles = rewriteDataForTasks(combinedScanTasks);
  List<DataFile> currentDataFiles = combinedScanTasks.stream()
      .flatMap(tasks -> tasks.files().stream().map(FileScanTask::file))
      .collect(Collectors.toList());
  replaceDataFiles(currentDataFiles, addedDataFiles);

  return new RewriteDataFilesActionResult(currentDataFiles, addedDataFiles);
}
The planFiles() method was analyzed in the earlier article on Iceberg reads; it returns an iterator of FileScanTasks, where each FileScanTask holds one data file plus the delete files that must be applied to it. The tasks are then grouped by partition and turned into the combinedScanTasks list: splitFiles splits files against the target file size, and planTasks bin-packs the splits into rewrite tasks. The resulting tasks are filtered; for example, task.files().size() > 1 only keeps tasks that actually combine more than one file. The isPartialFileScan(task) branch additionally keeps single-file tasks that cover only part of a file, which matters for format v2 tables where delete files are in play.
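For reference, a sketch of groupTasksByPartition roughly as it appears in the Iceberg source of this era (treat details such as StructLikeWrapper.forType as version-dependent approximations):

// Approximate shape of BaseRewriteDataFilesAction.groupTasksByPartition:
// bucket every FileScanTask by its file's partition value, wrapped in
// StructLikeWrapper so that partition structs compare by value.
private Map<StructLikeWrapper, Collection<FileScanTask>> groupTasksByPartition(
    CloseableIterator<FileScanTask> tasksIter) {
  ListMultimap<StructLikeWrapper, FileScanTask> tasksGroupedByPartition =
      Multimaps.newListMultimap(Maps.newHashMap(), Lists::newArrayList);
  try (CloseableIterator<FileScanTask> iterator = tasksIter) {
    iterator.forEachRemaining(task -> {
      StructLikeWrapper structLike = StructLikeWrapper
          .forType(spec.partitionType())
          .set(task.file().partition());
      tasksGroupedByPartition.put(structLike, task);
    });
  } catch (IOException e) {
    LOG.warn("Failed to close task iterator", e);
  }
  return tasksGroupedByPartition.asMap();
}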
In the example program above, the table contains 5 data files, so 5 FileScanTasks are planned.
The generated combinedScanTasks list holds two tasks, each containing two FileScanTasks, so 4 files get compacted; the remaining single file is filtered out by the files-per-task > 1 condition.
Finally, the replaceDataFiles method rewrites the metadata:
public void replaceDataFiles(Iterable<DataFile> deletedDataFiles, Iterable<DataFile> addedDataFiles) {
  try {
    RewriteFiles rewriteFiles = table.newRewrite();
    rewriteFiles.rewriteFiles(Sets.newHashSet(deletedDataFiles), Sets.newHashSet(addedDataFiles));
    commit(rewriteFiles);
  } catch (Exception e) {
    Tasks.foreach(Iterables.transform(addedDataFiles, f -> f.path().toString()))
        .noRetry()
        .suppressFailureWhenFinished()
        .onFailure((location, exc) -> LOG.warn("Failed to delete: {}", location, exc))
        .run(fileIO::deleteFile);
    throw e;
  }
}
RewriteFiles rewriteFiles = table.newRewrite() creates a RewriteFiles object from the table; it carries the table name and the TableOperations, which includes the table's current version information. Note that this is the table state captured when the compaction started, not the version the table may have advanced to by the time rewriteFiles commits.
Note that newRewrite() returns a BaseRewriteFiles object. BaseRewriteFiles extends MergingSnapshotProducer, which in turn extends SnapshotProducer, so constructing it initializes SnapshotProducer state. One important piece of that state is commitUUID:
private final String commitUUID = UUID.randomUUID().toString();
It identifies this commit, and the BaseRewriteFiles object represents the snapshot update produced by this rewrite operation.
The RewriteFiles object records four kinds of files: 1. data files to be deleted; 2. delete files to be deleted; 3. data files to be added; 4. delete files to be added. RewriteFiles extends SnapshotUpdate, making it one kind of snapshot update operation. Its concrete class is BaseRewriteFiles, which extends the abstract MergingSnapshotProducer, which extends the abstract SnapshotProducer; SnapshotProducer implements commit(), which commits the snapshot update.
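As a minimal sketch of that interface: the two-argument overload below is the one replaceDataFiles uses; a four-argument overload that also swaps delete files exists in v2-capable releases (exact signatures vary by version):

// replace the compacted-away data files with the newly written ones,
// producing a single rewrite snapshot when commit() is called
table.newRewrite()
    .rewriteFiles(Sets.newHashSet(currentDataFiles), Sets.newHashSet(addedDataFiles))
    .commit();

SnapshotProducer.commit(), shown below, is what that commit() call ends up running: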
@Override
public void commit() {
  // this is always set to the latest commit attempt's snapshot id.
  AtomicLong newSnapshotId = new AtomicLong(-1L);
  try {
    Tasks.foreach(ops)
        .retry(base.propertyAsInt(COMMIT_NUM_RETRIES, COMMIT_NUM_RETRIES_DEFAULT))
        .exponentialBackoff(
            base.propertyAsInt(COMMIT_MIN_RETRY_WAIT_MS, COMMIT_MIN_RETRY_WAIT_MS_DEFAULT),
            base.propertyAsInt(COMMIT_MAX_RETRY_WAIT_MS, COMMIT_MAX_RETRY_WAIT_MS_DEFAULT),
            base.propertyAsInt(COMMIT_TOTAL_RETRY_TIME_MS, COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT),
            2.0 /* exponential */)
        .onlyRetryOn(CommitFailedException.class)
        .run(taskOps -> {
          Snapshot newSnapshot = apply();
          newSnapshotId.set(newSnapshot.snapshotId());
          TableMetadata updated;
          if (stageOnly) {
            updated = base.addStagedSnapshot(newSnapshot);
          } else {
            updated = base.replaceCurrentSnapshot(newSnapshot);
          }

          if (updated == base) {
            // do not commit if the metadata has not changed. for example, this may happen when setting the current
            // snapshot to an ID that is already current. note that this check uses identity.
            return;
          }

          // if the table UUID is missing, add it here. the UUID will be re-created each time this operation retries
          // to ensure that if a concurrent operation assigns the UUID, this operation will not fail.
          taskOps.commit(base, updated.withUUID());
        });

  } catch (CommitStateUnknownException commitStateUnknownException) {
    throw commitStateUnknownException;
  } catch (RuntimeException e) {
    Exceptions.suppressAndThrow(e, this::cleanAll);
  }

  LOG.info("Committed snapshot {} ({})", newSnapshotId.get(), getClass().getSimpleName());

  try {
    // at this point, the commit must have succeeded. after a refresh, the snapshot is loaded by
    // id in case another commit was added between this commit and the refresh.
    Snapshot saved = ops.refresh().snapshot(newSnapshotId.get());
    if (saved != null) {
      cleanUncommitted(Sets.newHashSet(saved.allManifests()));
      // also clean up unused manifest lists created by multiple attempts
      for (String manifestList : manifestLists) {
        if (!saved.manifestListLocation().equals(manifestList)) {
          deleteFile(manifestList);
        }
      }
    } else {
      // saved may not be present if the latest metadata couldn't be loaded due to eventual
      // consistency problems in refresh. in that case, don't clean up.
      LOG.warn("Failed to load committed snapshot, skipping manifest clean-up");
    }
  } catch (RuntimeException e) {
    LOG.warn("Failed to load committed table metadata, skipping manifest clean-up", e);
  }

  notifyListeners();
}
Tasks here is Iceberg's own utility (org.apache.iceberg.util.Tasks) for running tasks with retry and optional parallelism; a usage sketch follows below. Each attempt calls apply() to produce a new Snapshot; apply() is implemented in the SnapshotProducer class.
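A minimal illustration of the retry helper used above (the numeric arguments are illustrative, not the table defaults):

// org.apache.iceberg.util.Tasks: run an action over each item with
// retries and exponential backoff, retrying only on CommitFailedException
Tasks.foreach(ops)
    .retry(4)                                      // up to 4 retries
    .exponentialBackoff(100, 60000, 600000, 2.0)   // min, max, total wait in ms; scale 2.0
    .onlyRetryOn(CommitFailedException.class)
    .run(taskOps -> {
      // attempt the commit against taskOps; a CommitFailedException
      // triggers refresh-and-retry, any other exception fails immediately
    });

apply() itself: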
public Snapshot apply() {
  this.base = refresh();
  Long parentSnapshotId = base.currentSnapshot() != null ?
      base.currentSnapshot().snapshotId() : null;
  long sequenceNumber = base.nextSequenceNumber();

  // run validations from the child operation
  validate(base);

  List<ManifestFile> manifests = apply(base);

  if (base.formatVersion() > 1 || base.propertyAsBoolean(MANIFEST_LISTS_ENABLED, MANIFEST_LISTS_ENABLED_DEFAULT)) {
    OutputFile manifestList = manifestListPath();

    try (ManifestListWriter writer = ManifestLists.write(
        ops.current().formatVersion(), manifestList, snapshotId(), parentSnapshotId, sequenceNumber)) {

      // keep track of the manifest lists created
      manifestLists.add(manifestList.location());

      ManifestFile[] manifestFiles = new ManifestFile[manifests.size()];
      Tasks.range(manifestFiles.length)
          .stopOnFailure().throwFailureWhenFinished()
          .executeWith(ThreadPools.getWorkerPool())
          .run(index ->
              manifestFiles[index] = manifestsWithMetadata.get(manifests.get(index)));

      writer.addAll(Arrays.asList(manifestFiles));

    } catch (IOException e) {
      throw new RuntimeIOException(e, "Failed to write manifest list file");
    }

    return new BaseSnapshot(ops.io(),
        sequenceNumber, snapshotId(), parentSnapshotId, System.currentTimeMillis(), operation(), summary(base),
        manifestList.location());

  } else {
    return new BaseSnapshot(ops.io(),
        snapshotId(), parentSnapshotId, System.currentTimeMillis(), operation(), summary(base),
        manifests);
  }
}
The first step is refresh(), which ultimately lands in HadoopTableOperations.refresh():
public TableMetadata refresh() {
  int ver = version != null ? version : findVersion();
  try {
    Path metadataFile = getMetadataFile(ver);
    if (version == null && metadataFile == null && ver == 0) {
      // no v0 metadata means the table doesn't exist yet
      return null;
    } else if (metadataFile == null) {
      throw new ValidationException("Metadata file for version %d is missing", ver);
    }

    Path nextMetadataFile = getMetadataFile(ver + 1);
    while (nextMetadataFile != null) {
      ver += 1;
      metadataFile = nextMetadataFile;
      nextMetadataFile = getMetadataFile(ver + 1);
    }

    updateVersionAndMetadata(ver, metadataFile.toString());

    this.shouldRefresh = false;
    return currentMetadata;
  } catch (IOException e) {
    throw new RuntimeIOException(e, "Failed to refresh the table");
  }
}
The loop probes for the next version number and keeps going as long as that version exists, so it settles on the newest existing version (the probe for one past it misses). The latest metadata is then loaded into currentMetadata and returned.
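Concretely, for the example table the metadata directory looks something like this (version numbers illustrative; layout per HadoopTableOperations):

file:///tmp/iceberg2/metadata/
  v1.metadata.json
  v2.metadata.json
  v3.metadata.json     <- the probe for v4 misses, so refresh() settles on version 3
  version-hint.text    <- best-effort pointer containing "3"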
Back in SnapshotProducer.apply(), the overloaded apply(TableMetadata) is called; this is where the new ManifestFiles are written to disk and a List<ManifestFile> is returned. apply(TableMetadata) is implemented in the MergingSnapshotProducer abstract class:
public List<ManifestFile> apply(TableMetadata base) {
  Snapshot current = base.currentSnapshot();

  // filter out the data manifests to keep: current.dataManifests() returns the latest
  // data manifests, and filterManifests decides which of them survive
  List<ManifestFile> filtered = filterManager.filterManifests(
      base.schema(), current != null ? current.dataManifests() : null);

  // walk all retained data manifests to find the smallest sequence number
  long minDataSequenceNumber = filtered.stream()
      .map(ManifestFile::minSequenceNumber)
      .filter(seq -> seq > 0) // filter out unassigned sequence numbers in rewritten manifests
      .reduce(base.lastSequenceNumber(), Math::min);

  // drop delete files older than the smallest sequence number; this call only registers
  // the filter condition, the actual filtering happens in filterManifests below
  deleteFilterManager.dropDeleteFilesOlderThan(minDataSequenceNumber);
  List<ManifestFile> filteredDeletes = deleteFilterManager.filterManifests(
      base.schema(), current != null ? current.deleteManifests() : null);

  // only keep manifests that have live data files or that were written by this commit
  Predicate<ManifestFile> shouldKeep = manifest ->
      manifest.hasAddedFiles() || manifest.hasExistingFiles() || manifest.snapshotId() == snapshotId();
  Iterable<ManifestFile> unmergedManifests = Iterables.filter(
      Iterables.concat(prepareNewManifests(), filtered), shouldKeep);
  Iterable<ManifestFile> unmergedDeleteManifests = Iterables.filter(
      Iterables.concat(prepareDeleteManifests(), filteredDeletes), shouldKeep);

  // update the snapshot summary
  summaryBuilder.clear();
  summaryBuilder.merge(addedFilesSummary);
  summaryBuilder.merge(appendedManifestsSummary);
  summaryBuilder.merge(filterManager.buildSummary(filtered));
  summaryBuilder.merge(deleteFilterManager.buildSummary(filteredDeletes));

  List<ManifestFile> manifests = Lists.newArrayList();
  Iterables.addAll(manifests, mergeManager.mergeManifests(unmergedManifests));
  Iterables.addAll(manifests, deleteMergeManager.mergeManifests(unmergedDeleteManifests));

  return manifests;
}
filterManager.filterManifests filters out the data ManifestFiles to keep. The Snapshot and dataManifests here are the latest ones, including any manifests updated after the compaction began:
/**
 * Filter deleted files out of a list of manifests.
 *
 * @param tableSchema the current table schema
 * @param manifests a list of manifests to be filtered
 * @return an array of filtered manifests
 */
List<ManifestFile> filterManifests(Schema tableSchema, List<ManifestFile> manifests) {
  if (manifests == null || manifests.isEmpty()) {
    validateRequiredDeletes();
    return ImmutableList.of();
  }

  // use a common metrics evaluator for all manifests because it is bound to the table schema
  StrictMetricsEvaluator metricsEvaluator = new StrictMetricsEvaluator(tableSchema, deleteExpression);

  ManifestFile[] filtered = new ManifestFile[manifests.size()];
  // open all of the manifest files in parallel, use index to avoid reordering
  // filterManifest is invoked for each ManifestFile individually; it writes a new
  // manifest file to disk with the deleted entries removed and returns the new ManifestFile
  Tasks.range(filtered.length)
      .stopOnFailure().throwFailureWhenFinished()
      .executeWith(ThreadPools.getWorkerPool())
      .run(index -> {
        ManifestFile manifest = filterManifest(metricsEvaluator, manifests.get(index));
        filtered[index] = manifest;
      });

  validateRequiredDeletes(filtered);

  return Arrays.asList(filtered);
}
deletePaths is a member variable of ManifestFilterManager that holds all the data files rewritten by this action, which therefore must be removed. After filtering, the delete set is re-validated: every path in deletePaths must appear among the files deleted by the new ManifestFiles.
/**
 * Throws a {@link ValidationException} if any deleted file was not present in a filtered manifest.
 *
 * @param manifests a set of filtered manifests
 */
private void validateRequiredDeletes(ManifestFile... manifests) {
  if (failMissingDeletePaths) {
    Set<CharSequence> deletedFiles = deletedFiles(manifests);
    ValidationException.check(deletedFiles.containsAll(deletePaths),
        "Missing required files to delete: %s",
        COMMA.join(Iterables.filter(deletePaths, path -> !deletedFiles.contains(path))));
  }
}
filterManifest is then called to filter each Manifest file:
/**
 * @return a ManifestReader that is a filtered version of the input manifest.
 */
private ManifestFile filterManifest(StrictMetricsEvaluator metricsEvaluator, ManifestFile manifest) {
  ManifestFile cached = filteredManifests.get(manifest);
  if (cached != null) {
    return cached;
  }

  boolean hasLiveFiles = manifest.hasAddedFiles() || manifest.hasExistingFiles();
  if (!hasLiveFiles || !canContainDeletedFiles(manifest)) {
    filteredManifests.put(manifest, manifest);
    return manifest;
  }

  try (ManifestReader<F> reader = newManifestReader(manifest)) {
    // this assumes that the manifest doesn't have files to remove and streams through the
    // manifest without copying data. if a manifest does have a file to remove, this will break
    // out of the loop and move on to filtering the manifest.
    boolean hasDeletedFiles = manifestHasDeletedFiles(metricsEvaluator, reader);
    if (!hasDeletedFiles) {
      filteredManifests.put(manifest, manifest);
      return manifest;
    }

    return filterManifestWithDeletedFiles(metricsEvaluator, manifest, reader);

  } catch (IOException e) {
    throw new RuntimeIOException(e, "Failed to close manifest: %s", manifest);
  }
}
The try (...) {} statement here is try-with-resources: the resource must implement java.lang.AutoCloseable (java.io.Closeable extends it), and it is closed automatically when the block exits. It is commonly used for file IO; ManifestReader implements Closeable.
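A minimal illustration of the pattern (hypothetical usage; any AutoCloseable resource qualifies):

// close() runs automatically when the block exits, normally or by exception
try (CloseableIterable<FileScanTask> tasks = table.newScan().planFiles()) {
  tasks.forEach(task -> LOG.info("scanning {}", task.file().path()));
} catch (IOException e) {
  throw new UncheckedIOException(e);
}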
newManifestReader(manifest) opens the avro file to be read; the reader is ultimately created by DataFileReader.openReader in org.apache.iceberg.avro.AvroIterable. In this example four avro files are opened in one pass, and some metadata is initialized.
manifestHasDeletedFiles is then called to check whether the current Manifest contains files that need to be deleted:
private boolean manifestHasDeletedFiles(
    StrictMetricsEvaluator metricsEvaluator, ManifestReader<F> reader) {
  boolean isDelete = reader.isDeleteManifestReader();
  Evaluator inclusive = inclusiveDeleteEvaluator(reader.spec());
  Evaluator strict = strictDeleteEvaluator(reader.spec());
  boolean hasDeletedFiles = false;
  for (ManifestEntry<F> entry : reader.entries()) {
    F file = entry.file();
    boolean fileDelete = deletePaths.contains(file.path()) ||
        dropPartitions.contains(file.specId(), file.partition()) ||
        (isDelete && entry.sequenceNumber() > 0 && entry.sequenceNumber() < minSequenceNumber);
    if (fileDelete || inclusive.eval(file.partition())) {
      ValidationException.check(
          fileDelete || strict.eval(file.partition()) || metricsEvaluator.eval(file),
          "Cannot delete file where some, but not all, rows match filter %s: %s",
          this.deleteExpression, file.path());

      hasDeletedFiles = true;
      if (failAnyDelete) {
        throw new DeleteException(reader.spec().partitionToPath(file.partition()));
      }
      break; // as soon as a deleted file is detected, stop scanning
    }
  }
  return hasDeletedFiles;
}
A file is judged deletable by three conditions: 1. deletePaths.contains(file.path()) -- as noted above, deletePaths holds only the data files rewritten by this action; 2. dropPartitions.contains(...) matches files in dropped partitions; 3. for delete files, the entry's sequenceNumber must be greater than 0 and less than minSequenceNumber. minSequenceNumber is effectively a table-wide value: the smallest sequence number among the data files that are still live. A delete file with a smaller sequence number can no longer affect any live data file, so it can be dropped.
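A small worked example of condition 3 (numbers illustrative): if the data files still live after filtering carry sequence numbers 4, 5 and 6, then minSequenceNumber = 4. A delete file written at sequence number 3 could only apply to data committed at sequence 3 or earlier; no such data file survives, so the delete file is dropped. A delete file at sequence number 5 still applies to the data file at 4 and is kept.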
At the end of filterManifest, filterManifestWithDeletedFiles is called, which rewrites the Manifest file:
private ManifestFile filterManifestWithDeletedFiles(
    StrictMetricsEvaluator metricsEvaluator, ManifestFile manifest, ManifestReader<F> reader) {
  boolean isDelete = reader.isDeleteManifestReader();
  Evaluator inclusive = inclusiveDeleteEvaluator(reader.spec());
  Evaluator strict = strictDeleteEvaluator(reader.spec());
  // when this point is reached, there is at least one file that will be deleted in the
  // manifest. produce a copy of the manifest with all deleted files removed.
  List<F> deletedFiles = Lists.newArrayList();
  Set<CharSequenceWrapper> deletedPaths = Sets.newHashSet();

  try {
    ManifestWriter<F> writer = newManifestWriter(reader.spec());
    try {
      reader.entries().forEach(entry -> {
        F file = entry.file();
        boolean fileDelete = deletePaths.contains(file.path()) ||
            dropPartitions.contains(file.specId(), file.partition()) ||
            (isDelete && entry.sequenceNumber() > 0 && entry.sequenceNumber() < minSequenceNumber);
        if (entry.status() != ManifestEntry.Status.DELETED) {
          if (fileDelete || inclusive.eval(file.partition())) {
            ValidationException.check(
                fileDelete || strict.eval(file.partition()) || metricsEvaluator.eval(file),
                "Cannot delete file where some, but not all, rows match filter %s: %s",
                this.deleteExpression, file.path());

            writer.delete(entry);

            CharSequenceWrapper wrapper = CharSequenceWrapper.wrap(entry.file().path());
            if (deletedPaths.contains(wrapper)) {
              LOG.warn("Deleting a duplicate path from manifest {}: {}",
                  manifest.path(), wrapper.get());
              duplicateDeleteCount += 1;
            } else {
              // only add the file to deletes if it is a new delete
              // this keeps the snapshot summary accurate for non-duplicate data
              deletedFiles.add(entry.file().copyWithoutStats());
            }
            deletedPaths.add(wrapper);

          } else {
            writer.existing(entry);
          }
        }
      });
    } finally {
      writer.close();
    }

    // return the filtered manifest as a reader
    ManifestFile filtered = writer.toManifestFile();

    // update caches
    filteredManifests.put(manifest, filtered);
    filteredManifestToDeletedFiles.put(filtered, deletedFiles);

    return filtered;

  } catch (IOException e) {
    throw new RuntimeIOException(e, "Failed to close manifest writer");
  }
}
In ManifestWriter<F> writer = newManifestWriter(reader.spec()), reader wraps the avro file being read; its partition spec is used to build a new ManifestWriter that writes the rewritten manifest:
@Override
protected ManifestWriter<DataFile> newManifestWriter(PartitionSpec manifestSpec) {
  return MergingSnapshotProducer.this.newManifestWriter(manifestSpec);
}

protected OutputFile newManifestOutput() {
  return ops.io().newOutputFile(
      ops.metadataFileLocation(FileFormat.AVRO.addExtension(commitUUID + "-m" + manifestCount.getAndIncrement())));
}

protected ManifestWriter<DataFile> newManifestWriter(PartitionSpec spec) {
  return ManifestFiles.write(ops.current().formatVersion(), spec, newManifestOutput(), snapshotId());
}
After several layers of delegation, SnapshotProducer's newManifestOutput() creates the avro file, whose name is the commitUUID plus "-m" plus an incrementing counter.
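So the manifests written by this commit land at paths of the form (placeholder UUID):

file:///tmp/iceberg2/metadata/<commitUUID>-m0.avro
file:///tmp/iceberg2/metadata/<commitUUID>-m1.avro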
writer.delete(entry) writes the entry of a file that is to be removed:
void delete(ManifestEntry<F> entry) {
  // Use the current Snapshot ID for the delete. It is safe to delete the data file from disk
  // when this Snapshot has been removed or when there are no Snapshots older than this one.
  addEntry(reused.wrapDelete(snapshotId, entry.file()));
}
addEntry records the data file to delete in the current ManifestWriter, stamped with the current snapshotId; the write ultimately happens in org.apache.iceberg.avro.AvroFileAppender's add method, which calls writer.append.
Back in SnapshotProducer.apply(): the returned manifests list contains all the manifest files. Manifests newly produced by this commit have an unassigned sequenceNumber, initialized to -1, while manifests that were not rewritten keep their original sequenceNumber. A ManifestListWriter is then responsible for writing the snapshot's manifest list (snap-*.avro) file.
All previously produced manifests are copied into the manifestFiles array and handed to the writer via writer.addAll(Arrays.asList(manifestFiles)):
@Override
public void addAll(Iterable<ManifestFile> values) {
  values.forEach(this::add);
}

@Override
public void add(ManifestFile manifest) {
  writer.add(prepare(manifest));
}

@Override
protected ManifestFile prepare(ManifestFile manifest) {
  return wrapper.wrap(manifest);
}
In prepare, the ManifestListWriter's wrapper field wraps the manifest, returning the implementing class IndexedManifestFile, which carries the commitSnapshotId and sequenceNumber:
private final long commitSnapshotId;
private final long sequenceNumber;
private ManifestFile wrapped = null;

IndexedManifestFile(long commitSnapshotId, long sequenceNumber) {
  this.commitSnapshotId = commitSnapshotId;
  this.sequenceNumber = sequenceNumber;
}

public ManifestFile wrap(ManifestFile file) {
  this.wrapped = file;
  return this;
}
The writer used in add(ManifestFile manifest) above is a FileAppender; the implementation here is AvroFileAppender, which writes the ManifestFile records into the manifest list file:
@Override
public void add(D datum) {
  try {
    numRecords += 1L;
    writer.append(datum);
  } catch (IOException e) {
    throw new RuntimeIOException(e);
  }
}
add calls AvroFileAppender's append method, which eventually reaches write(T datum, Encoder out) in GenericAvroWriter:
@Override
public void write(T datum, Encoder out) throws IOException {
  writer.write(datum, out);
}
writer is a ValueWriter; the concrete class here is RecordWriter, an implementation of the abstract inner class StructWriter, whose write method is invoked:
@Override
public void write(S row, Encoder encoder) throws IOException {
  for (int i = 0; i < writers.length; i += 1) {
    writers[i].write(get(row, i), encoder);
  }
}
The manifest record is written field by field; get(S struct, int pos), implemented in RecordWriter, fetches the value at each position:
@Override
protected Object get(IndexedRecord struct, int pos) {
  return struct.get(pos);
}
This dispatches to IndexedManifestFile's get method: IndexedManifestFile implements the put/get methods of the IndexedRecord interface from org.apache.avro.generic, so it can be written as an Avro record directly. The get method below supplies the value for every field position:
@Override
public Object get(int pos) {
  switch (pos) {
    case 0:
      return wrapped.path();
    case 1:
      return wrapped.length();
    case 2:
      return wrapped.partitionSpecId();
    case 3:
      return wrapped.content().id();
    case 4:
      if (wrapped.sequenceNumber() == ManifestWriter.UNASSIGNED_SEQ) {
        // if the sequence number is being assigned here, then the manifest must be created by the current
        // operation. to validate this, check that the snapshot id matches the current commit
        Preconditions.checkState(commitSnapshotId == wrapped.snapshotId(),
            "Found unassigned sequence number for a manifest from snapshot: %s", wrapped.snapshotId());
        return sequenceNumber;
      } else {
        return wrapped.sequenceNumber();
      }
    case 5:
      if (wrapped.minSequenceNumber() == ManifestWriter.UNASSIGNED_SEQ) {
        // same sanity check as above
        Preconditions.checkState(commitSnapshotId == wrapped.snapshotId(),
            "Found unassigned sequence number for a manifest from snapshot: %s", wrapped.snapshotId());
        // if the min sequence number is not determined, then there was no assigned sequence number for any file
        // written to the wrapped manifest. replace the unassigned sequence number with the one for this commit
        return sequenceNumber;
      } else {
        return wrapped.minSequenceNumber();
      }
    case 6:
      return wrapped.snapshotId();
    case 7:
      return wrapped.addedFilesCount();
    case 8:
      return wrapped.existingFilesCount();
    case 9:
      return wrapped.deletedFilesCount();
    case 10:
      return wrapped.addedRowsCount();
    case 11:
      return wrapped.existingRowsCount();
    case 12:
      return wrapped.deletedRowsCount();
    case 13:
      return wrapped.partitions();
    default:
      throw new UnsupportedOperationException("Unknown field ordinal: " + pos);
  }
}
As shown above, if wrapped.sequenceNumber() is still UNASSIGNED_SEQ (-1), the sequenceNumber of the current IndexedManifestFile, i.e. this commit's sequence number, is written instead; minSequenceNumber is handled the same way.
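For example (illustrative numbers): a manifest rewritten by this commit is created with sequenceNumber = UNASSIGNED_SEQ (-1); if this commit's snapshot is assigned sequence number 7, field 4 of that manifest's entry in the manifest list is written as 7, while an untouched manifest from an earlier snapshot keeps its stored value, say 5.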
write then emits each value into the encoder:
@Override
public void write(Integer i, Encoder encoder) throws IOException {
  encoder.writeInt(i);
}
The Snapshot implementation is BaseSnapshot, which already carries the generated manifest list file name along with the snapshot ID, sequence number, operation, and the counts of added and deleted files:
BaseSnapshot(FileIO io,
             long sequenceNumber,
             long snapshotId,
             Long parentId,
             long timestampMillis,
             String operation,
             Map<String, String> summary,
             String manifestList) {
  this.io = io;
  this.sequenceNumber = sequenceNumber;
  this.snapshotId = snapshotId;
  this.parentId = parentId;
  this.timestampMillis = timestampMillis;
  this.operation = operation;
  this.summary = summary;
  this.manifestListLocation = manifestList;
}
base.replaceCurrentSnapshot is then called to build the new TableMetadata object:
public TableMetadata replaceCurrentSnapshot(Snapshot snapshot) {
  // there can be operations (viz. rollback, cherrypick) where an existing snapshot could be replacing current
  if (snapshotsById.containsKey(snapshot.snapshotId())) {
    return setCurrentSnapshotTo(snapshot);
  }

  ValidationException.check(formatVersion == 1 || snapshot.sequenceNumber() > lastSequenceNumber,
      "Cannot add snapshot with sequence number %s older than last sequence number %s",
      snapshot.sequenceNumber(), lastSequenceNumber);

  List<Snapshot> newSnapshots = ImmutableList.<Snapshot>builder()
      .addAll(snapshots)
      .add(snapshot)
      .build();
  List<HistoryEntry> newSnapshotLog = ImmutableList.<HistoryEntry>builder()
      .addAll(snapshotLog)
      .add(new SnapshotLogEntry(snapshot.timestampMillis(), snapshot.snapshotId()))
      .build();

  return new TableMetadata(null, formatVersion, uuid, location,
      snapshot.sequenceNumber(), snapshot.timestampMillis(), lastColumnId,
      currentSchemaId, schemas, defaultSpecId, specs, lastAssignedPartitionId,
      defaultSortOrderId, sortOrders, rowKey,
      properties, snapshot.snapshotId(), newSnapshots, newSnapshotLog, addPreviousFile(file, lastUpdatedMillis));
}
The commit is performed by TableOperations.commit; here it is the HadoopTableOperations implementation:
public void commit(TableMetadata base, TableMetadata metadata) {
  Pair<Integer, TableMetadata> current = versionAndMetadata();
  if (base != current.second()) {
    throw new CommitFailedException("Cannot commit changes based on stale table metadata");
  }

  if (base == metadata) {
    LOG.info("Nothing to commit.");
    return;
  }

  Preconditions.checkArgument(base == null || base.location().equals(metadata.location()),
      "Hadoop path-based tables cannot be relocated");
  Preconditions.checkArgument(
      !metadata.properties().containsKey(TableProperties.WRITE_METADATA_LOCATION),
      "Hadoop path-based tables cannot relocate metadata");

  String codecName = metadata.property(
      TableProperties.METADATA_COMPRESSION, TableProperties.METADATA_COMPRESSION_DEFAULT);
  TableMetadataParser.Codec codec = TableMetadataParser.Codec.fromName(codecName);
  String fileExtension = TableMetadataParser.getFileExtension(codec);
  Path tempMetadataFile = metadataPath(UUID.randomUUID().toString() + fileExtension);
  TableMetadataParser.write(metadata, io().newOutputFile(tempMetadataFile.toString()));

  int nextVersion = (current.first() != null ? current.first() : 0) + 1;
  Path finalMetadataFile = metadataFilePath(nextVersion, codec);
  FileSystem fs = getFileSystem(tempMetadataFile, conf);

  try {
    if (fs.exists(finalMetadataFile)) {
      throw new CommitFailedException(
          "Version %d already exists: %s", nextVersion, finalMetadataFile);
    }
  } catch (IOException e) {
    throw new RuntimeIOException(e,
        "Failed to check if next version exists: %s", finalMetadataFile);
  }

  // this rename operation is the atomic commit operation
  renameToFinal(fs, tempMetadataFile, finalMetadataFile);

  // update the best-effort version pointer
  writeVersionHint(nextVersion);

  deleteRemovedMetadataFiles(base, metadata);

  this.shouldRefresh = true;
}
First, versionAndMetadata() returns the current version and metadata pair:
private synchronized Pair<Integer, TableMetadata> versionAndMetadata() {
  return Pair.of(version, currentMetadata);
}
currentMetadata is the TableMetadata object holding the current metadata.json contents, and version is the current version number.
Back in commit(), a temporary metadata file tempMetadataFile is created first, and TableMetadataParser.write writes it to disk:
public static void write(TableMetadata metadata, OutputFile outputFile) {
  internalWrite(metadata, outputFile, false);
}

public static void internalWrite(
    TableMetadata metadata, OutputFile outputFile, boolean overwrite) {
  boolean isGzip = Codec.fromFileName(outputFile.location()) == Codec.GZIP;
  OutputStream stream = overwrite ? outputFile.createOrOverwrite() : outputFile.create();
  try (OutputStream ou = isGzip ? new GZIPOutputStream(stream) : stream;
       OutputStreamWriter writer = new OutputStreamWriter(ou, StandardCharsets.UTF_8)) {
    JsonGenerator generator = JsonUtil.factory().createGenerator(writer);
    generator.useDefaultPrettyPrinter();
    toJson(metadata, generator);
    generator.flush();
  } catch (IOException e) {
    throw new RuntimeIOException(e, "Failed to write json to file: %s", outputFile);
  }
}
The metadata serialization itself happens in toJson. After that, the new version number and the final file name are computed, the temporary file is renamed to the final one (this rename is the atomic commit), the version hint file is updated, and obsolete metadata files are cleaned up.
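Putting the commit sequence together for the example table (version numbers illustrative):

1. write file:///tmp/iceberg2/metadata/<random-uuid>.metadata.json   (temporary file)
2. renameToFinal(...) -> v4.metadata.json                            (the atomic commit)
3. writeVersionHint(4) -> version-hint.text now contains "4"         (best-effort pointer)
4. deleteRemovedMetadataFiles(...) prunes old metadata if configured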