Goal
Use Flink's built-in Elasticsearch connector to take data from Kafka, process it with Flink, and then store it in Elasticsearch.
pom
Configure the ES connector version to match the ES version you actually run; my production cluster is on ES 7.
Pick the Kafka connector artifact that matches the Scala version you use.
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-sql-connector-kafka_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-elasticsearch7_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
ES Configuration
Host Parsing
You can wrap host parsing in a small helper method of your own:
// cached result of host parsing
private static HttpHost[] httpHostArray;

// Parse a comma-separated "host:port" list into an HttpHost array
public static HttpHost[] loadHostArray(String nodes) {
    if (httpHostArray == null) {
        String[] split = nodes.split(",");
        httpHostArray = new HttpHost[split.length];
        for (int i = 0; i < split.length; ++i) {
            String item = split[i];
            httpHostArray[i] = new HttpHost(item.split(":")[0], Integer.parseInt(item.split(":")[1]), "http");
        }
    }
    return httpHostArray;
}
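The ElasticsearchSink.Builder used below takes a List<HttpHost>, so the parsed array can simply be wrapped; a minimal usage sketch (the node string is a placeholder, not a real cluster):

// "es1:9200,es2:9200" is only a placeholder; use your own cluster addresses
List<HttpHost> elsearchHosts = Arrays.asList(loadHostArray("es1:9200,es2:9200"));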
If you are only testing, the hosts can simply be hard-coded:
List<HttpHost> elsearchHosts = new ArrayList<>();
elsearchHosts.add(new HttpHost("127.0.0.1", 9200, "http"));
Create the ElasticsearchSink.Builder
Configuration overview (a bulk flush is triggered whenever any of the interval / size / count thresholds below is reached):
- esSinkBuilder.setBulkFlushInterval(3000); // interval (ms) between bulk flushes
- esSinkBuilder.setBulkFlushMaxSizeMb(10); // maximum buffered data size (MB) before a bulk flush
- esSinkBuilder.setBulkFlushMaxActions(1); // maximum number of buffered requests before a bulk flush
- esSinkBuilder.setBulkFlushBackoff(true); // whether to retry failed bulk requests with backoff
- esSinkBuilder.setBulkFlushBackoffRetries(2); // number of backoff retries

Only one failure handler needs to be configured:
- esSinkBuilder.setFailureHandler(new RetryRejectedExecutionFailureHandler()); // built-in handler that retries requests rejected by ES
- esSinkBuilder.setFailureHandler(new ActionRequestFailureHandler() {...}); // custom failure strategy, e.g. so the sink does not fail the job when ES rejects writes (disk full, etc.)
ElasticsearchSink.Builder<JSONObject> esSinkBuilder = new ElasticsearchSink.Builder<>(elsearchHosts, new ElasticsearchSinkFunction<JSONObject>() {
    private final String INDEX = "test";

    @Override
    public void process(JSONObject jsonObject, RuntimeContext runtimeContext, RequestIndexer requestIndexer) {
        // Upsert: if the document does not exist yet, the upsert document is inserted;
        // otherwise the doc part is applied as a partial update
        UpdateRequest updateRequest = new UpdateRequest(INDEX, jsonObject.getString("id"));
        updateRequest.upsert(jsonObject, XContentType.JSON);
        updateRequest.doc(jsonObject, XContentType.JSON);
        // hand the request to the indexer
        requestIndexer.add(updateRequest);
    }
});
esSinkBuilder.setFailureHandler(new ActionRequestFailureHandler() {
    @Override
    public void onFailure(ActionRequest actionRequest, Throwable throwable, int i, RequestIndexer requestIndexer) throws Throwable {
        if (ExceptionUtils.findThrowable(throwable, EsRejectedExecutionException.class).isPresent()) {
            // full queue; re-add the document for indexing
            requestIndexer.add(actionRequest);
        } else if (ExceptionUtils.findThrowable(throwable, ElasticsearchParseException.class).isPresent()) {
            // malformed document; simply drop the request without failing the sink
        } else {
            // for all other failures: rethrow the throwable to fail the sink,
            // or log and swallow it to keep the job running, e.g.
            // throw throwable;
            // logger.error(throwable.getMessage(), throwable);
        }
    }
});
esSinkBuilder.setBulkFlushInterval(3000);
esSinkBuilder.setBulkFlushMaxSizeMb(10);
esSinkBuilder.setBulkFlushMaxActions(1);
esSinkBuilder.setBulkFlushBackoff(true);
esSinkBuilder.setBulkFlushBackoffRetries(2);
// Alternative to the custom ActionRequestFailureHandler above; only one handler takes effect,
// so enable this line only if you are not using the custom handler:
// esSinkBuilder.setFailureHandler(new RetryRejectedExecutionFailureHandler());
sink
mapStream.addSink(esSinkBuilder.build()).name("sink");
Kafka Configuration
Properties
Properties properties=new Properties();
properties.setProperty("bootstrap.servers","127.0.0.1:9092");
properties.setProperty("group.id","test");
properties.setProperty("auto.offset.reset","latest");
properties.setProperty("flink.partition-discovery.interval-millis","5000");
properties.setProperty("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
properties.setProperty("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
properties.setProperty("enable.auto.commit","true");
properties.setProperty("auto.commit.interval.ms","5000");
Reading the Data
Custom Schema
BinLogPojo can be replaced with your own entity class:
public class BinlogSchema implements DeserializationSchema<BinLogPojo> {

    @Override
    public BinLogPojo deserialize(byte[] bytes) {
        String log = new String(bytes, StandardCharsets.UTF_8);
        return JSON.parseObject(log, BinLogPojo.class);
    }

    @Override
    public boolean isEndOfStream(BinLogPojo binLogPojo) {
        return false;
    }

    @Override
    public TypeInformation<BinLogPojo> getProducedType() {
        // TypeHint already captures the generic type; no need to override getTypeInfo()
        return TypeInformation.of(new TypeHint<BinLogPojo>() {});
    }
}
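The post only relies on BinLogPojo exposing a getData() accessor; a minimal sketch of what such an entity might look like (everything beyond the data field is left to your own message format):

import java.io.Serializable;
import java.util.LinkedHashMap;

// Minimal placeholder entity; add whatever fields your Kafka/binlog messages actually carry
public class BinLogPojo implements Serializable {

    // column name -> value of the changed row, as read via getData() in the map() below
    private LinkedHashMap<String, String> data;

    public LinkedHashMap<String, String> getData() {
        return data;
    }

    public void setData(LinkedHashMap<String, String> data) {
        this.data = data;
    }
}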
Consuming Kafka
One thing to note here: don't let the Kafka source return a JSONObject stream directly, otherwise the sink will fail with java.lang.UnsupportedOperationException; return a String or a serializable entity class of your own instead.
FlinkKafkaConsumerBase<BinLogPojo> eventConsumer = new FlinkKafkaConsumer<>(
        TOPIC, new BinlogSchema(), properties)
        .setStartFromLatest();

SingleOutputStreamOperator<JSONObject> mapStream = env.addSource(eventConsumer)
        .map(new MapFunction<BinLogPojo, JSONObject>() {
            @Override
            public JSONObject map(BinLogPojo binLog) throws Exception {
                LinkedHashMap<String, String> binLogData = binLog.getData();
                JSONObject jsonObject = new JSONObject();
                // copy/derive the fields you need from binLogData here,
                // including the "id" field used by the ES upsert above
                jsonObject.put("status", 0);
                return jsonObject;
            }
        });
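For completeness, the wiring ends with launching the job; a minimal sketch, where "flink-kafka-to-es" is just a placeholder name and env is the same StreamExecutionEnvironment used for addSource above:

// overall order: create env -> add the Kafka source and map() -> addSink(esSinkBuilder.build()) -> execute
env.execute("flink-kafka-to-es");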