1. Avro issues
(1) Fields in the Avro-generated class start with _, which breaks Flink's type matching and triggers a PojoTypeInfo error
Solution:
Before:
{
"namespace": "com.test.flink.am",
"type": "record",
"name": "Person",
"fields": [
{
"name": "_name",
"type": "string",
"default": ""
},
{
"name": "age",
"type": "int",
"default": 0
},
{
"name": "score",
"type": "int",
"default": 0
}
]
}
After the fix: never use field names starting with _, because Flink's POJO analysis requires camelCase fields with matching getters/setters
{
"namespace": "com.test.flink.am",
"type": "record",
"name": "Person",
"fields": [
{
"name": "name",
"type": "string",
"default": ""
},
{
"name": "age",
"type": "int",
"default": 0
},
{
"name": "score",
"type": "int",
"default": 0
}
]
}
(2) Missing fields during Avro serialization/deserialization -- forward and backward compatibility
First, the basic concepts:
- Backward compatibility: data written with an old schema can be correctly deserialized using the new schema.
(Example: V1 data can be read by the upgraded V2 program.)
- Forward compatibility: data written with a new schema can be correctly deserialized using the old schema.
(Example: V2 data can be read by the not-yet-upgraded V1 program.)
To deserialize correctly across schema versions there are two options:
Option 1: give every field a default value in the Avro schema, then customize the Avro deserialization logic.
Option 2: use a Schema Registry (e.g. Confluent Schema Registry) to manage schema versions; Kafka offers such a service. Writers and readers then fetch the matching schema version automatically, so different versions do not clash. (Not tested here.)
Option 1 in detail. The Avro schema file:
{
"namespace": "com.test",
"type": "record",
"name": "avroPojo",
"fields": [
{
"name": "time",
"type": "long",
"default": 0,
"doc": "时间"
},
{
"name": "id",
"type": "string",
"default": "",
"doc": "id"
}
]
}
Use a custom deserializer to achieve backward compatibility:
/*
Modeled on {@link AvroDeserializationSchema} from flink-avro, with the deserialize method overridden.
*/
public class AvroCompatibleDeserializationSchema<T> implements DeserializationSchema<T> {
public static AvroCompatibleDeserializationSchema<GenericRecord> forGeneric(Schema schema) {
return new AvroCompatibleDeserializationSchema<>(GenericRecord.class, schema);
}
public static <T extends SpecificRecord> AvroCompatibleDeserializationSchema<T> forSpecific(Class<T> tClass) {
return new AvroCompatibleDeserializationSchema<>(tClass, null);
}
private static final long serialVersionUID = -6766681879020862312L;
private final Class<T> recordClazz;
private final String schemaString;
private transient GenericDatumReader datumReader;
private transient MutableByteArrayInputStream inputStream;
private transient Decoder decoder;
private transient Schema reader;
/**
* Creates a Avro deserialization schema.
*
* @param recordClazz class to which deserialize. Should be one of:
* {@link SpecificRecord},
* {@link GenericRecord}.
* @param reader reader's Avro schema. Should be provided if recordClazz is
* {@link GenericRecord}
*/
AvroCompatibleDeserializationSchema(Class<T> recordClazz, @Nullable Schema reader) {
Preconditions.checkNotNull(recordClazz, "Avro record class must not be null.");
this.recordClazz = recordClazz;
this.reader = reader;
if (reader != null) {
this.schemaString = reader.toString();
} else {
this.schemaString = null;
}
}
GenericDatumReader<T> getDatumReader() {
return datumReader;
}
Schema getReaderSchema() {
return reader;
}
MutableByteArrayInputStream getInputStream() {
return inputStream;
}
Decoder getDecoder() {
return decoder;
}
@Override
public T deserialize(byte[] message) throws IOException {
if (message == null) {
return null;
} else {
// read record
checkAvroInitialized(); // lazily set up the Avro machinery
inputStream.setBuffer(message);
Schema readerSchema = getReaderSchema();
GenericDatumReader<T> datumReader = getDatumReader();
datumReader.setSchema(readerSchema);
return datumReader.read(null, decoder);
}
}
// This method builds the custom Avro deserialization components
void checkAvroInitialized() {
if (datumReader != null) {
return;
}
ClassLoader cl = Thread.currentThread().getContextClassLoader();
if (SpecificRecord.class.isAssignableFrom(recordClazz)) {
SpecificData specificData = new SpecificData(cl);
this.reader = AvroFactory.extractAvroSpecificSchema(recordClazz, specificData);
// Key point: use CompatibleGenericDatumReader instead of the default datum reader
this.datumReader = new CompatibleGenericDatumReader<>(null, this.reader, specificData);
} else {
this.reader = new Schema.Parser().parse(schemaString);
GenericData genericData = new GenericData(cl);
this.datumReader = new CompatibleGenericDatumReader<>(null, this.reader, genericData);
}
this.inputStream = new MutableByteArrayInputStream();
this.decoder = DecoderFactory.get().directBinaryDecoder(inputStream, null);
}
@Override
public boolean isEndOfStream(T nextElement) {
return false;
}
@Override
@SuppressWarnings("unchecked")
public TypeInformation<T> getProducedType() {
if (SpecificRecord.class.isAssignableFrom(recordClazz)) {
return new AvroTypeInfo(recordClazz);
} else {
return (TypeInformation<T>) new GenericRecordAvroTypeInfo(this.reader);
}
}
}
The key logic lives in CompatibleGenericDatumReader:
public class CompatibleGenericDatumReader<T> extends GenericDatumReader<T> {
private static final Logger LOGGER = LoggerFactory.getLogger(CompatibleGenericDatumReader.class);
public CompatibleGenericDatumReader(Schema writer, Schema reader, GenericData data) {
super(writer, reader, data);
}
public CompatibleGenericDatumReader(Schema schema) {
super(schema);
}
@Override
protected void readField(Object r, Schema.Field f, Object oldDatum, ResolvingDecoder in, Object state) throws IOException {
try {
super.readField(r, f, oldDatum, in, state);
} catch (EOFException e) {
// Ignore the EOF caused by the missing field and fill in the field's default value instead
// via getData() and setField() from the parent GenericDatumReader
getData().setField(r, f.name(), f.pos(), f.defaultVal());
}
}
}
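For reference, a minimal wiring sketch of plugging the compatible deserializer into a Kafka source. The broker address, topic, and group id are placeholders, and avroPojo stands for the class generated from the schema above; KafkaSource is the standard flink-connector-kafka source, which accepts any DeserializationSchema.
KafkaSource<avroPojo> source = KafkaSource.<avroPojo>builder()
        .setBootstrapServers("KAFKA_BROKERS:9092")          // placeholder broker list
        .setTopics("test_topic")                             // placeholder topic
        .setGroupId("avro-compat-demo")                      // placeholder group id
        .setStartingOffsets(OffsetsInitializer.latest())
        // DeserializationSchema<T> is accepted here, so the custom schema drops in directly
        .setValueOnlyDeserializer(AvroCompatibleDeserializationSchema.forSpecific(avroPojo.class))
        .build();
DataStreamSource<avroPojo> stream =
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "avro-source");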
2. Generic type erasure with the process operator
Flink cannot infer the function's output type because of generic type erasure. The fix is to implement ResultTypeQueryable and return the type explicitly:
public class ProcessTest1 extends ProcessFunction<Person, Person> implements ResultTypeQueryable<Person> {
private static final Logger LOGGER = LoggerFactory.getLogger(ProcessTest1.class);
@Override
public void processElement(Person click, ProcessFunction<Person, Person>.Context ctx, Collector<Person> out) {
// ...
}
@Override
public TypeInformation<Person> getProducedType() {
return TypeInformation.of(Person.class);
}
}
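Alternatively, if the function does not implement ResultTypeQueryable, the produced type can be supplied at the call site with returns(). A minimal sketch; personStream and MyPersonProcessFunction are placeholders (a DataStream<Person> and a plain ProcessFunction<Person, Person>), not names from the original job:
SingleOutputStreamOperator<Person> result = personStream
        .process(new MyPersonProcessFunction())
        .returns(TypeInformation.of(Person.class));   // hands Flink the type that erasure removed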
3. Avro-generated classes need their own schema for serialization/deserialization; declaring the schema as a static field causes problems
Sample record:
{
"name":"张三“,
"age":18,
"score": 77
}
The serialized byte array (shown in the original screenshot) was clearly corrupted. The code that produced it:
private static AvroSerializationSchema<Person> adPvSchema = AvroSerializationSchema.forSpecific(Person.class);
private static AvroSerializationSchema<Person> creativePvSchema = AvroSerializationSchema.forSpecific(Person.class);
process(){
...
Person person = builder.build();
byte[] adpv = adPvSchema.serialize(person);
}
Note that the serialization schemas here are static fields. The problems with that:
- Avro serialization needs runtime context and cannot be fully initialized during class loading; if the schema is not initialized correctly, a fallback serializer gets used and the output is corrupted.
- A static serializer is shared by every thread; under concurrency the encoder buffers are contended, typically surfacing as ArrayIndexOutOfBoundsException or binary data validation failures.
- The Avro serializer manages resources such as memory pools and thread-local variables; holding it in a static field keeps them from being garbage collected, which can end in OutOfMemoryError after the job has run for a long time.
- When Confluent Schema Registry is used, a static serializer cannot pick up the latest schema ID dynamically, so schema compatibility checks fail with IncompatibleSchemaException.
Solution: make the serializers and deserializers non-static, as in the sketch below.
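A minimal sketch of the non-static approach, creating one serializer per task instance in open(). The class and field names are illustrative; Person is the Avro-generated class from above.
public class PersonToBytes extends RichMapFunction<Person, byte[]> {
    // one serializer per parallel task instance, created on the task thread
    private transient AvroSerializationSchema<Person> adPvSchema;

    @Override
    public void open(Configuration parameters) throws Exception {
        adPvSchema = AvroSerializationSchema.forSpecific(Person.class);
    }

    @Override
    public byte[] map(Person person) throws Exception {
        return adPvSchema.serialize(person);
    }
}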
4. A Flink pitfall: ConnectedStreams.keyBy(0, 0)
public class Flink07Connect_InnerJoin02 {
public static void main(String[] args) throws Exception {
//This example implements an inner-join effect via the connect operator
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStreamSource<Tuple2<Integer, String>> ds1 = env.fromElements(
Tuple2.of(1, "a1"),
Tuple2.of(1, "a2"),
Tuple2.of(2, "b"),
Tuple2.of(3, "c")
);
DataStreamSource<Tuple3<Integer, String,Integer>> ds2 = env.fromElements(
Tuple3.of(1, "aa1", 1),
Tuple3.of(1, "aa2", 2),
Tuple3.of(2, "bb", 1),
Tuple3.of(3, "cc", 1)
);
ConnectedStreams<Tuple2<Integer, String>, Tuple3<Integer, String, Integer>> connect = ds1.connect(ds2);
ConnectedStreams<Tuple2<Integer, String>, Tuple3<Integer, String, Integer>> keyByCon = connect
.keyBy(0, 0);
SingleOutputStreamOperator<String> process = keyByCon.process(new KeyedCoProcessFunction<Integer, Tuple2<Integer, String>, Tuple3<Integer, String, Integer>, String>() {
Map<Integer, List<Tuple2<Integer, String>>> cache1 = new HashMap<>();
Map<Integer, List<Tuple3<Integer, String, Integer>>> cache2 = new HashMap<>();
@Override
public void processElement1(Tuple2<Integer, String> value, Context ctx, Collector<String> out) throws Exception {
Integer id = ctx.getCurrentKey();
if (cache1.containsKey(id)) {
cache1.get(id).add(value);
} else {
List<Tuple2<Integer, String>> tmpList = new ArrayList<>();
tmpList.add(value);
cache1.put(id, tmpList);
}
if (cache2.containsKey(id)) {
for (Tuple3<Integer, String, Integer> tup3 : cache2.get(id)) {
out.collect(value + "____" + tup3);
}
}
}
@Override
public void processElement2(Tuple3<Integer, String, Integer> value, Context ctx, Collector<String> out) throws Exception {
Integer id = ctx.getCurrentKey();
if (cache2.containsKey(id)) {
cache2.get(id).add(value);
} else {
List<Tuple3<Integer, String, Integer>> tmpList = new ArrayList<>();
tmpList.add(value);
cache2.put(id, tmpList);
}
if (cache1.containsKey(id)) {
for (Tuple2<Integer, String> tup2 : cache1.get(id)) {
out.collect(tup2 + "____" + value);
}
}
}
});
process.print();
env.execute();
}
}
ConnectedStreams<Tuple2<Integer, String>, Tuple3<Integer, String, Integer>> keyByCon = connect
.keyBy(0, 0);
keyBy(0, 0) produces a Tuple key (a Tuple1 wrapping the field), but the KeyedCoProcessFunction above declares its key type as Integer, so the value returned by ctx.getCurrentKey() is unusable: there is no way to reach the tuple's f0 through an Integer-typed key, and no cast fixes the mismatch.
Solution: use KeySelectors instead of keyBy(0, 0); see the sketch below.
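A sketch of the same keyBy call rewritten with explicit KeySelectors; the rest of the job stays unchanged.
ConnectedStreams<Tuple2<Integer, String>, Tuple3<Integer, String, Integer>> keyByCon = connect
        .keyBy(
                new KeySelector<Tuple2<Integer, String>, Integer>() {
                    @Override
                    public Integer getKey(Tuple2<Integer, String> value) {
                        return value.f0;
                    }
                },
                new KeySelector<Tuple3<Integer, String, Integer>, Integer>() {
                    @Override
                    public Integer getKey(Tuple3<Integer, String, Integer> value) {
                        return value.f0;
                    }
                });
// ctx.getCurrentKey() in the KeyedCoProcessFunction now really is an Integer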
5. Main stream connected to a broadcast stream: the process function is reported as not serializable
Error: The implementation of the BroadcastProcessFunction is not serializable. The implementation accesses fields of its enclosing class, which is a common reason for non-serializability. A common solution is to make the function a proper (non-inner) class, or a static inner class.
The BroadcastProcessFunction implementation is not serializable because it accesses members of its enclosing class. Steps to resolve it:
- Turn the BroadcastProcessFunction implementation into a top-level class or a static inner class. A non-static inner (or anonymous) class holds a reference to its enclosing instance, and that reference is what breaks serialization.
- If you make it a static inner class, make sure it no longer references the enclosing instance.
- If it still cannot be serialized after that, check whether other captured member variables also need to implement Serializable.
public class DwdBaseTest extends BaseAPP {
合并流.process(new BroadcastProcessFunction<JSONObject, TableProcessDwd, Tuple2<JSONObject, TableProcessDwd>>() {
Map<String, TableProcessDwd> cacheMap = new HashMap<>();
@Override
public void open(Configuration parameters) throws Exception {
Class.forName(Constant.MYSQL_DRIVER);
//get a JDBC connection
java.sql.Connection conn = DriverManager.getConnection(Constant.MYSQL_URL, Constant.MYSQL_USER_NAME, Constant.MYSQL_PASSWORD);
//create a statement
String sql = "select * from gmall_config.table_process_dwd";
PreparedStatement ps = conn.prepareStatement(sql);
//execute the query
ResultSet rs = ps.executeQuery();
//process the result set
ResultSetMetaData metaData = rs.getMetaData();//the metadata exposes the column count
while (rs.next()) {
//a JSON object to hold one row of the query result
JSONObject jsonObj = new JSONObject();
for (int i = 1; i <= metaData.getColumnCount(); i++) {
String columnName = metaData.getColumnName(i);
Object columnValue = rs.getObject(i);
jsonObj.put(columnName, columnValue);
}
//convert the JSON into the entity class
TableProcessDwd tableProcessDwd = jsonObj.toJavaObject(TableProcessDwd.class);
cacheMap.put(tableProcessDwd.getSourceTable(), tableProcessDwd);
}
//release resources
rs.close();
ps.close();
conn.close();
}
@Override
public void processElement(JSONObject jsonObject, BroadcastProcessFunction<JSONObject, TableProcessDwd, Tuple2<JSONObject, TableProcessDwd>>.ReadOnlyContext readOnlyContext, Collector<Tuple2<JSONObject, TableProcessDwd>> collector) throws Exception {
ReadOnlyBroadcastState<String, TableProcessDwd> broadcastState = readOnlyContext.getBroadcastState(broadCastDescriptor);
String table = jsonObject.getString("table");
String type = jsonObject.getString("type");
String key = getKey(table, type);
TableProcessDwd tableProcessDwd = broadcastState.get(key) == null ? cacheMap.get(key) : broadcastState.get(key);
JSONObject data = jsonObject.getJSONObject("data");
if (tableProcessDwd != null) {
String sinkColumns = tableProcessDwd.getSinkColumns();
deleteColumnTest(data, sinkColumns);
data.put("ts", jsonObject.getLong("ts"));
collector.collect(Tuple2.of(data, tableProcessDwd));
}
}
@Override
public void processBroadcastElement(TableProcessDwd tableProcessDwd, BroadcastProcessFunction<JSONObject, TableProcessDwd, Tuple2<JSONObject, TableProcessDwd>>.Context context, Collector<Tuple2<JSONObject, TableProcessDwd>> collector) throws Exception {
BroadcastState<String, TableProcessDwd> broadcastState = context.getBroadcastState(broadCastDescriptor);
String op = tableProcessDwd.getOp();
String sourceTable = tableProcessDwd.getSourceTable();
String sourceType = tableProcessDwd.getSourceType();
String key = getKey(sourceTable, sourceType);
if ("d".equals(op)) {
broadcastState.remove(key);
cacheMap.remove(key);
} else {
broadcastState.put(key, tableProcessDwd);
cacheMap.put(key, tableProcessDwd);
}
}
})
//the methods below are declared on the enclosing class
public void deleteColumnTest(JSONObject jsonObject, String sinkColumns) {
List<String> needColumnList = Arrays.asList(sinkColumns.split(","));
Set<Map.Entry<String, Object>> entries = jsonObject.entrySet();
entries.removeIf(et->!needColumnList.contains(et.getKey()));
}
public String getKey(String table,String type){
return table+":"+type;
}
}
Root cause: the two helper methods are declared on the enclosing class, so calling them from inside the anonymous BroadcastProcessFunction captures a reference to the enclosing instance, which is not serializable.
Solutions
Option 1: extract the BroadcastProcessFunction into its own class that extends BroadcastProcessFunction, put the business logic (and the helpers) there, and pass a new instance to process(), as sketched below.
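A minimal sketch of Option 1. The class name DwdBroadcastFunc is illustrative; the broadcast state descriptor is passed in via the constructor, and the method bodies keep the same logic as before.
public class DwdBroadcastFunc
        extends BroadcastProcessFunction<JSONObject, TableProcessDwd, Tuple2<JSONObject, TableProcessDwd>> {

    private final MapStateDescriptor<String, TableProcessDwd> broadCastDescriptor;
    private final Map<String, TableProcessDwd> cacheMap = new HashMap<>();

    public DwdBroadcastFunc(MapStateDescriptor<String, TableProcessDwd> broadCastDescriptor) {
        this.broadCastDescriptor = broadCastDescriptor;
    }

    @Override
    public void processElement(JSONObject value, ReadOnlyContext ctx,
                               Collector<Tuple2<JSONObject, TableProcessDwd>> out) throws Exception {
        // ... same logic as before, now with no reference to the enclosing class
    }

    @Override
    public void processBroadcastElement(TableProcessDwd value, Context ctx,
                                        Collector<Tuple2<JSONObject, TableProcessDwd>> out) throws Exception {
        // ...
    }

    // helper methods live inside the function, so nothing from the outer class is captured
    private String getKey(String table, String type) {
        return table + ":" + type;
    }
}
// usage: 合并流.process(new DwdBroadcastFunc(broadCastDescriptor))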
Option 2: make the two helper methods called from the BroadcastProcessFunction static:
//the methods below are declared on the enclosing class
public static void deleteColumnTest(JSONObject jsonObject, String sinkColumns) {
List<String> needColumnList = Arrays.asList(sinkColumns.split(","));
Set<Map.Entry<String, Object>> entries = jsonObject.entrySet();
entries.removeIf(et->!needColumnList.contains(et.getKey()));
}
public static String getKey(String table,String type){
return table+":"+type;
}
Option 3: move the helper methods inside the BroadcastProcessFunction itself; then they need not be static:
合并流.process(new BroadcastProcessFunction<JSONObject, TableProcessDwd, Tuple2<JSONObject, TableProcessDwd>>() {
Map<String, TableProcessDwd> cacheMap = new HashMap<>();
@Override
public void open(Configuration parameters) throws Exception {
// ...
}
@Override
public void processElement(JSONObject jsonObject, BroadcastProcessFunction<JSONObject, TableProcessDwd, Tuple2<JSONObject, TableProcessDwd>>.ReadOnlyContext readOnlyContext, Collector<Tuple2<JSONObject, TableProcessDwd>> collector) throws Exception {
// ...
}
@Override
public void processBroadcastElement(TableProcessDwd tableProcessDwd, BroadcastProcessFunction<JSONObject, TableProcessDwd, Tuple2<JSONObject, TableProcessDwd>>.Context context, Collector<Tuple2<JSONObject, TableProcessDwd>> collector) throws Exception {
// ...
}
//the helper methods below are declared inside the BroadcastProcessFunction itself
public void deleteColumnTest(JSONObject jsonObject, String sinkColumns) {
List<String> needColumnList = Arrays.asList(sinkColumns.split(","));
Set<Map.Entry<String, Object>> entries = jsonObject.entrySet();
entries.removeIf(et->!needColumnList.contains(et.getKey()));
}
public String getKey(String table,String type){
return table+":"+type;
}
})
6. Why does print() in Flink SQL block the rest of the program, while print() on a DataStream does not?
In the Table API / SQL, print() is called on the result of a query (a TableResult) and behaves like a terminal, client-side sink: it iterates over the result records and only returns once all of them have been produced and printed. Any code that must run after it therefore waits, which makes the program look "blocked" -- for an unbounded query it effectively never continues.
In the DataStream API, print() is just another operator added to the pipeline. It does not stop the rest of the pipeline: records are written to stdout asynchronously while the other operators keep processing new records, and nothing runs at all until the job is submitted.
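A rough illustration of the two patterns; tableEnv, stream, and env are assumed to exist as in a typical job, and the query is illustrative.
// Table API / SQL: print() on the TableResult pulls rows to the client and blocks until the query ends
TableResult result = tableEnv.executeSql("SELECT id, COUNT(*) FROM src GROUP BY id");
result.print();   // for an unbounded source, execution never gets past this line

// DataStream API: print() only registers a sink operator; nothing blocks here
stream.print();
// ...more operators can still be added...
env.execute();    // the whole pipeline, including the print sink, runs from here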
7. Flink consuming Kafka logs FETCH_SESSION_ID_NOT_FOUND
Cause: the upstream Kafka topic has 180 partitions while the downstream job parallelism was only 16 -- a classic case of the consumer not keeping up. Consumption lags so much that the broker does not receive requests (heartbeats) from the consumer within session.timeout, and FETCH_SESSION_ID_NOT_FOUND shows up in the logs.
Hidden cause: whenever the number of consumer instances in the group changes (instances added or removed) or the topic's partition count changes, Kafka triggers a rebalance and reassigns partitions to the consumers. If a newly assigned task starts fetching before its session is established, the session id is not yet registered and the error is reported.
What triggers a consumer-group rebalance:
- the number of consumer instances changes, e.g. the Flink job scales its parallelism or restarts after a failure;
- the topic's partition count changes;
- the consumer session times out, so Kafka considers the consumer dead and removes it, triggering a rebalance.
Solution: increase the Flink parallelism to 2-3x; it is now 48 (3 x 16).
8. Flink SQL cannot write into a StarRocks primary key table
The StarRocks table:
CREATE TABLE `test` (
`click_id` varchar(65533) NOT NULL COMMENT "",
`day` date NOT NULL COMMENT "",
`hour` varchar(65533) NOT NULL COMMENT "",
`time` varchar(65533) NOT NULL COMMENT ""
) ENGINE=OLAP
PRIMARY KEY(`click_id`)
DISTRIBUTED BY HASH(`click_id`)
PROPERTIES (
"compression" = "LZ4",
"enable_persistent_index" = "true",
"fast_schema_evolution" = "true",
"replicated_storage" = "true",
"replication_num" = "3"
);
The Flink SQL code:
public class StarRocksInsertExample {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
// 1. Kafka Source
tableEnv.executeSql("Create table ods_search_ocpc(\n" +
" `click_id` STRING,\n" +
" `click_time` STRING\n" +
")\n" +
"WITH\n" +
" (\n" +
" 'connector' = 'kafka',\n" +
" 'topic'='test_topic',\n" +
" 'properties.group.id' = 'test',\n" +
" 'scan.startup.mode' = 'latest-offset',\n" +
" 'scan.topic-partition-discovery.interval' = '20000',\n" +
" -- Kafka Consumer 核心配置\n" +
" 'properties.enable.auto.commit' = 'true',\n" +
" 'properties.auto.commit.interval.ms' = '1000',\n" +
" 'properties.max.poll.records' = '500',\n" +
" -- JSON 格式容错(避免无效数据中断)\n" +
" 'format' = 'json',\n" +
" 'json.fail-on-missing-field' = 'false',\n" +
" 'json.ignore-parse-errors' = 'true'\n" +
" )");
// 2. StarRocks Sink
tableEnv.executeSql("CREATE TEMPORARY TABLE `dwd_search_ocpc` (\n" +
" `click_id` STRING,\n" +
" `day` DATE,\n" +
" `hour` STRING,\n" +
" `time` STRING,\n" +
" PRIMARY KEY (`click_id`) NOT ENFORCED \n" +
") WITH (\n" +
" 'connector' = 'starrocks',\n" +
" 'jdbc-url' = 'jdbc:mysql://SR_HOST:9030',\n" +
" 'load-url' = 'SR_HOST:8030',\n" +
" 'database-name' = 'sr_realtime',\n" +
" 'table-name' = 'test2',\n" +
" 'username' = 'zhangsan',\n" +
" 'password' = '',\n" +
" 'sink.properties.format' = 'json',\n" +
" 'sink.properties.strip_outer_array' = 'true'\n" +
")"
);
// 3. Insert (coalesce NULLs to satisfy the NOT NULL columns)
tableEnv.executeSql("insert into dwd_search_ocpc\n" +
"select\n" +
" `click_id`,\n" +
" COALESCE(CAST(FROM_UNIXTIME(TRY_CAST(`click_time` AS BIGINT)/1000, 'yyyy-MM-dd') AS DATE), TO_DATE('1970-01-01')) as `day`,\n" +
" COALESCE(FROM_UNIXTIME(TRY_CAST(`click_time` AS BIGINT)/1000, 'HH'), '00') as `hour`,\n" +
" COALESCE(`click_time`, '') as `time`\n" +
"from ods_search_ocpc");
}
}
Symptom: Kafka has data and it is being consumed, but nothing lands in the StarRocks primary key table; the job throws no error and just keeps consuming.
That is odd: the data is clearly consumed, so why does it not reach the primary key table, and why is there no error? The StarRocks documentation ("Continuously load data from Apache Flink®") shows that a key option was missing: 'sink.properties.partial_update' = 'true'. My flink-connector-starrocks version is 1.2.9, so I assumed adding just that option would be enough.
But then an error was thrown.
Cause: without partial update enabled, StarRocks does not know how to apply the incoming rows to the primary key table; but with only the partial-update flag and no column list, the Flink connector does not know which columns are being written, so still nothing is loaded.
Solution: configure both of the following options; only then do writes to the StarRocks primary key table succeed:
'sink.properties.partial_update' = 'true',
'sink.properties.columns' = 'click_id,day,hour,time' // every written column must be listed here
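Putting it together, the sink DDL from above with the two options appended (everything else unchanged):
tableEnv.executeSql("CREATE TEMPORARY TABLE `dwd_search_ocpc` (\n" +
        " `click_id` STRING,\n" +
        " `day` DATE,\n" +
        " `hour` STRING,\n" +
        " `time` STRING,\n" +
        " PRIMARY KEY (`click_id`) NOT ENFORCED \n" +
        ") WITH (\n" +
        " 'connector' = 'starrocks',\n" +
        " 'jdbc-url' = 'jdbc:mysql://SR_HOST:9030',\n" +
        " 'load-url' = 'SR_HOST:8030',\n" +
        " 'database-name' = 'sr_realtime',\n" +
        " 'table-name' = 'test2',\n" +
        " 'username' = 'zhangsan',\n" +
        " 'password' = '',\n" +
        " 'sink.properties.format' = 'json',\n" +
        " 'sink.properties.strip_outer_array' = 'true',\n" +
        " 'sink.properties.partial_update' = 'true',\n" +
        " 'sink.properties.columns' = 'click_id,day,hour,time'\n" +
        ")");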
9. An unnoticed Flink job restart causing data anomalies
Symptom: checkpointing was not enabled. The job looked healthy every day and showed no backpressure (at least not at the moments we checked), yet the data written to the database ended up inconsistent.
Cause: if the business logic is fine, check whether the Flink job restarted. Looking at the cluster monitoring (this is a session cluster) and the TaskManager memory metrics, one TM died at some point and was replaced by a new node. Because checkpointing was not enabled, the restart lost the in-flight progress and produced the inconsistent data.
10. env and tableEnv must not both submit a job
Error: Caused by: org.apache.flink.util.FlinkRuntimeException: Cannot have more than one execute() or executeAsync() call in a single environment.
Solution: unify on one submission path -- either let tableEnv execute the job or let env execute it, never both from the same environment.
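For illustration, a minimal sketch of the env-driven variant, assuming a mixed Table API / DataStream job; the datagen source is only there to keep the example self-contained.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

// DDL only registers the table, it does not launch a job
tableEnv.executeSql("CREATE TABLE src (id INT, name STRING) " +
        "WITH ('connector' = 'datagen', 'rows-per-second' = '1')");

// convert to a DataStream and finish the pipeline with DataStream operators
DataStream<Row> rows = tableEnv.toDataStream(tableEnv.sqlQuery("SELECT * FROM src"));
rows.print();

env.execute();   // the single job submission for this environment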
11. How to run several DML/DQL statements in Flink SQL, e.g. sequentially inside a for loop
Solution:
Create a StatementSet with StreamTableEnvironment.createStatementSet().
DDL (CREATE ...): run it directly with tableEnv.executeSql().
DML/DQL (INSERT ...): register the statements in order with stmtSet.addInsertSql(), then call stmtSet.execute() once in the main method, as in the sketch below.
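A minimal sketch; the src/sink1/sink2 tables and the datagen/print connectors are placeholders, not from the original job.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

// DDL goes straight through executeSql
tableEnv.executeSql("CREATE TABLE src (id INT, name STRING) " +
        "WITH ('connector' = 'datagen', 'rows-per-second' = '1')");
tableEnv.executeSql("CREATE TABLE sink1 (id INT, name STRING) WITH ('connector' = 'print')");
tableEnv.executeSql("CREATE TABLE sink2 (id INT) WITH ('connector' = 'print')");

// DML is collected in order on a StatementSet ...
StatementSet stmtSet = tableEnv.createStatementSet();
stmtSet.addInsertSql("INSERT INTO sink1 SELECT id, name FROM src");
stmtSet.addInsertSql("INSERT INTO sink2 SELECT id FROM src WHERE MOD(id, 2) = 0");

// ... and submitted as a single job at the end of main()
stmtSet.execute();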
12. Using an OutputTag when the stream's element type is not known at compile time
Solution:
First obtain the Class of the stream's element type:
Class<?> streamType = stream.getType().getTypeClass();
Then build the OutputTag from that type information instead of hard-coding the type (the cast is unchecked because the concrete type is only known at runtime):
@SuppressWarnings("unchecked")
OutputTag<Object> outputTag = new OutputTag<>(name, (TypeInformation<Object>) TypeInformation.of(streamType));
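If the stream itself is at hand, an alternative sketch is to reuse its TypeInformation directly in a small generic helper; the method name tagFor is illustrative.
public static <T> OutputTag<T> tagFor(String name, DataStream<T> stream) {
    // passing the TypeInformation explicitly avoids relying on generic type extraction
    return new OutputTag<>(name, stream.getType());
}
// usage: OutputTag<Person> lateTag = tagFor("late-data", personStream);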