FlinkSink
    .forRow(dataStream, SimpleDataUtil.FLINK_SCHEMA)
    .table(table)
    .tableLoader(tableLoader)
    .writeParallelism(1)
    .equalityFieldColumns(ImmutableList.of("data"))
    .rewriteDataTasksParallelism(2)
    .build();
First, the build() method creates the streamWriter via IcebergStreamWriter<RowData> streamWriter = createStreamWriter(table, flinkRowType, equalityFieldIds):
static IcebergStreamWriter<RowData> createStreamWriter(Table table,
                                                       RowType flinkRowType,
                                                       List<Integer> equalityFieldIds) {
    Map<String, String> props = table.properties();
    long targetFileSize = getTargetFileSizeBytes(props);
    FileFormat fileFormat = getFileFormat(props);
    TaskWriterFactory<RowData> taskWriterFactory = new RowDataTaskWriterFactory(
        table.schema(), flinkRowType, table.spec(), table.locationProvider(),
        table.io(), table.encryption(), targetFileSize, fileFormat, props,
        equalityFieldIds);
    return new IcebergStreamWriter<>(table.name(), taskWriterFactory);
}
This creates a TaskWriterFactory, the object responsible inside IcebergStreamWriter for building the writers that actually perform the writes:
public RowDataTaskWriterFactory(Schema schema,
                                RowType flinkSchema,
                                PartitionSpec spec,
                                LocationProvider locations,
                                FileIO io,
                                EncryptionManager encryptionManager,
                                long targetFileSizeBytes,
                                FileFormat format,
                                Map<String, String> tableProperties,
                                List<Integer> equalityFieldIds) {
    this.schema = schema;
    this.flinkSchema = flinkSchema;
    this.spec = spec;
    this.locations = locations;
    this.io = io;
    this.encryptionManager = encryptionManager;
    this.targetFileSizeBytes = targetFileSizeBytes;
    this.format = format;
    this.equalityFieldIds = equalityFieldIds;

    if (equalityFieldIds == null || equalityFieldIds.isEmpty()) {
        this.appenderFactory = new FlinkAppenderFactory(schema, flinkSchema, tableProperties, spec);
    } else {
        // TODO provide the ability to customize the equality-delete row schema.
        Schema deleteSchema = TypeUtil.select(schema, new HashSet<>(equalityFieldIds));
        this.appenderFactory = new FlinkAppenderFactory(schema, flinkSchema, tableProperties, spec,
            ArrayUtil.toIntArray(equalityFieldIds), deleteSchema, null);
    }
}
A FlinkAppenderFactory is created here; this factory builds the objects that write the data files, and it is handed the equalityFieldIds, the deleteSchema, and related information.
FlinkAppenderFactory implements the FileAppenderFactory interface and provides the newDataWriter, newEqDeleteWriter, and newPosDeleteWriter methods, which create the writers for data files, equality-delete files, and position-delete files respectively.
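The branch in the constructor above can be reduced to a self-contained sketch (MiniAppenderFactory is a hypothetical stand-in for FlinkAppenderFactory, for illustration only): with no equality field ids the factory is append-only; otherwise it also prepares for equality deletes on the given field ids.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for FlinkAppenderFactory, showing only the branch
// taken in RowDataTaskWriterFactory's constructor above.
class MiniAppenderFactory {
    final boolean supportsEqualityDeletes;
    final List<Integer> equalityFieldIds;

    MiniAppenderFactory(List<Integer> equalityFieldIds) {
        if (equalityFieldIds == null || equalityFieldIds.isEmpty()) {
            // append-only table: no delete writers will ever be requested
            this.supportsEqualityDeletes = false;
            this.equalityFieldIds = Collections.emptyList();
        } else {
            // upsert path: the equality-delete schema is projected from
            // exactly these field ids (TypeUtil.select in the real code)
            this.supportsEqualityDeletes = true;
            this.equalityFieldIds = equalityFieldIds;
        }
    }
}
```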
Then, in IcebergStreamWriter's open() method, TaskWriterFactory.create() is called to build the TaskWriter; the taskWriter is the object inside IcebergStreamWriter that actually writes the data. Here create() returns

return new UnpartitionedDeltaWriter(spec, format, appenderFactory, outputFileFactory, io,
    targetFileSizeBytes, schema, flinkSchema, equalityFieldIds);

creating an UnpartitionedDeltaWriter that performs delta (upsert) writes to an unpartitioned table.
The class inheritance diagram of UnpartitionedDeltaWriter is shown below.
The TaskWriter interface accepts records to write and exposes the files it produced: write() writes a record to a data file; dataFiles() closes the writer and returns the completed data files; complete() closes the writer and returns both the completed data files and delete files.
public interface TaskWriter<T> extends Closeable {

    void write(T row) throws IOException;

    void abort() throws IOException;

    default DataFile[] dataFiles() throws IOException {
        WriteResult result = complete();
        Preconditions.checkArgument(result.deleteFiles() == null || result.deleteFiles().length == 0,
            "Should have no delete files in this write result.");
        return result.dataFiles();
    }

    WriteResult complete() throws IOException;
}
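To make the contract concrete, here is a toy, in-memory mirror of it (ToyTaskWriter and ToyWriteResult are invented names; real Iceberg writers produce DataFile/DeleteFile objects and throw IOException, both simplified away here): complete() returns data files and delete files together, while dataFiles() is only legal when no delete files were produced.

```java
import java.util.ArrayList;
import java.util.List;

// Toy mirror of WriteResult: "files" are plain strings here.
class ToyWriteResult {
    final List<String> dataFiles = new ArrayList<>();
    final List<String> deleteFiles = new ArrayList<>();
}

// Toy mirror of the TaskWriter contract above.
class ToyTaskWriter {
    private final ToyWriteResult result = new ToyWriteResult();
    private boolean closed = false;

    // write(): rows prefixed with "-" stand in for deletes of that key
    void write(String row) {
        if (row.startsWith("-")) {
            result.deleteFiles.add(row.substring(1));
        } else {
            result.dataFiles.add(row);
        }
    }

    // complete(): close the writer, return data files AND delete files
    ToyWriteResult complete() {
        closed = true;
        return result;
    }

    // dataFiles(): mirrors the default method above -- only valid when
    // the write produced no delete files
    List<String> dataFiles() {
        ToyWriteResult r = complete();
        if (!r.deleteFiles.isEmpty()) {
            throw new IllegalStateException("Should have no delete files in this write result.");
        }
        return r.dataFiles;
    }
}
```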
BaseTaskWriter implements the TaskWriter interface and carries most of the basic write logic. It has five inner classes: BaseEqualityDeltaWriter, PathOffset, BaseRollingWriter, RollingFileWriter, and RollingEqDeleteWriter; the inheritance relationship among BaseRollingWriter, RollingFileWriter, and RollingEqDeleteWriter is shown in the figure.
BaseRollingWriter is the base class for writing out records. Its main job is rolling writes: it caps the size of each output file, and once the current file reaches the target size it closes that file and opens a new one to keep writing.
RollingFileWriter and RollingEqDeleteWriter extend BaseRollingWriter and implement its abstract methods:
abstract W newWriter(EncryptedOutputFile file, StructLike partition);
abstract long length(W writer);
abstract void write(W writer, T record);
abstract void complete(W closedWriter);
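The rolling pattern built on those four hooks can be sketched in a self-contained way (ToyRollingWriter is a hypothetical simplification; a "file" is just a list of strings and length is measured in characters):

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch of BaseRollingWriter's rolling behavior: when the
// current file reaches the target size, complete it and open a new one.
class ToyRollingWriter {
    private final long targetFileSizeBytes;
    private final List<List<String>> completedFiles = new ArrayList<>();
    private List<String> current = new ArrayList<>();
    private long currentLength = 0;

    ToyRollingWriter(long targetFileSizeBytes) {
        this.targetFileSizeBytes = targetFileSizeBytes;
    }

    void write(String record) {
        current.add(record);               // plays the role of write(writer, record)
        currentLength += record.length();  // plays the role of length(writer)
        if (currentLength >= targetFileSizeBytes) {
            roll();                        // size reached: start a new file
        }
    }

    private void roll() {
        completedFiles.add(current);       // plays the role of complete(closedWriter)
        current = new ArrayList<>();       // plays the role of newWriter(file, partition)
        currentLength = 0;
    }

    // close the writer and return all completed "files"
    List<List<String>> close() {
        if (!current.isEmpty()) {
            roll();
        }
        return completedFiles;
    }
}
```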
PathOffset is a small utility class for recording file offsets: it tracks the data file path and the current row offset within that file as rows are written, and it is used by BaseEqualityDeltaWriter.
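A hypothetical self-contained sketch of that idea (ToyPathOffset and ToyDeltaTracker are invented names): the writer remembers, per key, where the key's latest row landed, so a later delete of that key can be emitted as a position delete (file path plus row offset).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mirror of BaseTaskWriter.PathOffset: the position of one
// row inside a data file.
final class ToyPathOffset {
    final String path;      // data file the row was written to
    final long rowOffset;   // ordinal of the row within that file

    ToyPathOffset(String path, long rowOffset) {
        this.path = path;
        this.rowOffset = rowOffset;
    }

    @Override
    public String toString() {
        return path + "@" + rowOffset;
    }
}

// Sketch of the key -> position bookkeeping that an equality-delta writer
// can use to turn a delete into a position delete.
class ToyDeltaTracker {
    private final Map<String, ToyPathOffset> insertedRowMap = new HashMap<>();

    void recordInsert(String key, String path, long offset) {
        insertedRowMap.put(key, new ToyPathOffset(path, offset));
    }

    // Returns the position to delete, or null if this writer never saw an
    // insert for the key (then only an equality delete can cover it).
    ToyPathOffset positionFor(String key) {
        return insertedRowMap.remove(key);
    }
}
```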