When verifying a Flume deployment, besides sending data with a telnet or nc client, you can also send data with the Avro client that ships with Flume.
I. Introduction to Flume avro-client
The Avro client included in the Flume distribution can send a given file to a Flume Avro source using the Avro RPC mechanism. A simple example:
$ bin/flume-ng avro-client -H localhost -p 41414 -F /usr/logs/log.10
The command above sends the contents of /usr/logs/log.10 to the Flume source listening on that port.
The prerequisite is a running agent with an Avro source. Here is a simple configuration, avro_memory_logger.conf:
a1.sources = src1
a1.channels = ch1
a1.sinks = sk1
# Agent a1 has one source named src1 whose type is avro
a1.sources.src1.type = avro
a1.sources.src1.bind = localhost
a1.sources.src1.port = 6666
a1.channels.ch1.type = memory
a1.channels.ch1.capacity = 1000
a1.channels.ch1.transactionCapacity = 100
a1.sinks.sk1.type = logger
a1.sources.src1.channels = ch1
a1.sinks.sk1.channel = ch1
Start the agent so it listens on port 6666:
[root@node6 apache-flume-1.9.0-bin]# bin/flume-ng agent --conf conf --conf-file conf/avro_memory_logger.conf --name a1 -Dflume.root.logger=INFO,console
II. Code Walkthrough of the Avro Client
1. Shell script entry point
[root@node6 apache-flume-1.9.0-bin]# vim bin/flume-ng
################################
# constants
################################
# Constants: map each command-line mode to the class that launches the corresponding Flume environment
FLUME_AGENT_CLASS="org.apache.flume.node.Application"
# Entry class for the avro-client command, e.g. flume-ng avro-client -c conf -H localhost -p 4141 -F /usr/local/hadoop/log.00
FLUME_AVRO_CLIENT_CLASS="org.apache.flume.client.avro.AvroCLIClient"
FLUME_VERSION_CLASS="org.apache.flume.tools.VersionInfo"
FLUME_TOOLS_CLASS="org.apache.flume.tools.FlumeToolsMain"
FLUME_AVRO_CLIENT_CLASS="org.apache.flume.client.avro.AvroCLIClient" defines the entry class. It lives in the flume-ng-core module, in the org.apache.flume.client.avro package.
At the end, the flume-ng script invokes the appropriate entry class based on the arguments:
# finally, invoke the appropriate command
# (each constant maps to one of the class paths introduced above)
if [ -n "$opt_agent" ] ; then
  run_flume $FLUME_AGENT_CLASS $args
elif [ -n "$opt_avro_client" ] ; then
  run_flume $FLUME_AVRO_CLIENT_CLASS $args
elif [ -n "${opt_version}" ] ; then
  run_flume $FLUME_VERSION_CLASS $args
elif [ -n "${opt_tool}" ] ; then
  run_flume $FLUME_TOOLS_CLASS $args
else
  error "This message should never appear" 1
fi

# Exit 0 on success; in shell scripts, a non-zero status conventionally signals failure
exit 0
2. The AvroCLIClient class
The source is at org.apache.flume.client.avro.AvroCLIClient. The class declaration carries two annotations:
@InterfaceAudience.Private
@InterfaceStability.Evolving
The methods to focus on are main, parseCommandLine, parseHeaders, and run.
public static void main(String[] args) {
  SSLUtil.initGlobalSSLParameters();
  AvroCLIClient client = new AvroCLIClient();
  try {
    // Parse the arguments with parseCommandLine
    if (client.parseCommandLine(args)) {
      client.run();
    }
  } catch (ParseException e) {
    logger.error("Unable to parse command line options - {}", e.getMessage());
  } catch (IOException e) {
    logger.error("Unable to send data to Flume. Exception follows.", e);
  } catch (FlumeException e) {
    logger.error("Unable to open connection to Flume. Exception follows.", e);
  } catch (EventDeliveryException e) {
    logger.error("Unable to deliver events to Flume. Exception follows.", e);
  }
  logger.debug("Exiting");
}
Now look at parseCommandLine (the inline comments explain each step). It uses Apache Commons CLI to parse the command-line input (for a deeper look at the library, see commons.apache.org/components.…
private boolean parseCommandLine(String[] args) throws ParseException {
  // Parse the command-line input with Apache Commons CLI
  Options options = new Options();
  options
      .addOption("P", "rpcProps", true, "RPC client properties file with " +
          "server connection params")
      .addOption("p", "port", true, "port of the avro source")
      .addOption("H", "host", true, "hostname of the avro source")
      .addOption("F", "filename", true, "file to stream to avro source")
      .addOption(null, "dirname", true, "directory to stream to avro source")
      .addOption("R", "headerFile", true, ("file containing headers as " +
          "key/value pairs on each new line"))
      .addOption("h", "help", false, "display help text");

  // The -R/--headerFile option lets flume-ng avro-client pass a file of
  // key=value pairs to the upstream Flume agent; the agent's source, channel,
  // and sink configuration can then reference each value as %{key} -- 2022/11/11
  CommandLineParser parser = new GnuParser();
  CommandLine commandLine = parser.parse(options, args);

  if (commandLine.hasOption('h')) {
    new HelpFormatter().printHelp("flume-ng avro-client", "", options,
        "The --dirname option assumes that a spooling directory exists " +
        "where immutable log files are dropped.", true);
    return false;
  }

  if (commandLine.hasOption("filename") && commandLine.hasOption("dirname")) {
    throw new ParseException(
        "--filename and --dirname options cannot be used simultaneously");
  }

  if (!commandLine.hasOption("port") && !commandLine.hasOption("host") &&
      !commandLine.hasOption("rpcProps")) {
    throw new ParseException("Either --rpcProps or both --host and --port " +
        "must be specified.");
  }

  if (commandLine.hasOption("rpcProps")) {
    rpcClientPropsFile = commandLine.getOptionValue("rpcProps");
    Preconditions.checkNotNull(rpcClientPropsFile, "RPC client properties " +
        "file must be specified after --rpcProps argument.");
    Preconditions.checkArgument(new File(rpcClientPropsFile).exists(),
        "RPC client properties file %s does not exist!", rpcClientPropsFile);
  }

  if (rpcClientPropsFile == null) {
    if (!commandLine.hasOption("port")) {
      throw new ParseException(
          "You must specify a port to connect to with --port");
    }
    port = Integer.parseInt(commandLine.getOptionValue("port"));

    if (!commandLine.hasOption("host")) {
      throw new ParseException(
          "You must specify a hostname to connect to with --host");
    }
    hostname = commandLine.getOptionValue("host");
  }

  fileName = commandLine.getOptionValue("filename");
  dirName = commandLine.getOptionValue("dirname");

  if (commandLine.hasOption("headerFile")) {
    // The avro client passes key=value parameters to the downstream flume agent
    parseHeaders(commandLine);
  }

  return true;
}
When parameters need to be passed to the downstream flume agent, parseHeaders does the parsing (a concrete example of passing parameters is demonstrated later).
/*
 * Read the parameters to pass to the downstream flume agent, for use in its conf.
 * Header format: key1=value1, key2=value2, ...
 */
private void parseHeaders(CommandLine commandLine) {
  String headerFile = commandLine.getOptionValue("headerFile");
  FileInputStream fs = null;
  try {
    if (headerFile != null) {
      fs = new FileInputStream(headerFile);
      Properties properties = new Properties();
      properties.load(fs);
      for (Map.Entry<Object, Object> propertiesEntry : properties.entrySet()) {
        String key = (String) propertiesEntry.getKey();
        String value = (String) propertiesEntry.getValue();
        logger.debug("Inserting Header Key [" + key + "] header value [" + value + "]");
        headers.put(key, value);
      }
    }
  } catch (Exception e) {
    // Use a {} placeholder so headerFile actually appears in the log message
    logger.error("Unable to load headerFile {}", headerFile, e);
    return;
  } finally {
    if (fs != null) {
      try {
        fs.close();
      } catch (Exception e) {
        logger.error("Unable to close headerFile", e);
        return;
      }
    }
  }
}
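Because parseHeaders delegates to java.util.Properties, the header file follows standard Java properties syntax: one key=value pair per line. A minimal stdlib-only sketch of the same parsing step (using a StringReader in place of the FileInputStream, so it is self-contained):

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class HeaderParseDemo {
    public static void main(String[] args) throws Exception {
        // Same content as the header-file.conf used later in this article
        String headerFile = "timestamp=1668153093\ntype_name=nginx\nnode=node6\n";

        Properties properties = new Properties();
        properties.load(new StringReader(headerFile)); // same load() call the client makes

        // Copy entries into a header map, as parseHeaders does
        Map<String, String> headers = new HashMap<>();
        for (Map.Entry<Object, Object> e : properties.entrySet()) {
            headers.put((String) e.getKey(), (String) e.getValue());
        }
        System.out.println(headers.get("type_name")); // prints "nginx"
    }
}
```

Anything Properties accepts (blank lines, # comments) is therefore also legal in the header file.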
Finally, the run method performs the actual send: rpcClient = RpcClientFactory.getInstance(new File(rpcClientPropsFile)) constructs an RpcClient; after the event headers and body are in place, rpcClient.appendBatch(events) sends the events.
private void run() throws IOException, FlumeException,
    EventDeliveryException {
  EventReader reader = null;

  // Build the RpcClient used to send the events
  RpcClient rpcClient;
  if (rpcClientPropsFile != null) {
    rpcClient = RpcClientFactory.getInstance(new File(rpcClientPropsFile));
  } else {
    rpcClient = RpcClientFactory.getDefaultInstance(hostname, port,
        BATCH_SIZE);
  }

  try {
    if (fileName != null) {
      reader = new SimpleTextLineEventReader(new FileReader(new File(fileName)));
    } else if (dirName != null) {
      reader = new ReliableSpoolingFileEventReader.Builder()
          .spoolDirectory(new File(dirName))
          .sourceCounter(new SourceCounter("avrocli"))
          .build();
    } else {
      reader = new SimpleTextLineEventReader(new InputStreamReader(System.in));
    }

    long lastCheck = System.currentTimeMillis();
    long sentBytes = 0;
    int batchSize = rpcClient.getBatchSize();
    List<Event> events;

    // Read the events in batches of batchSize
    while (!(events = reader.readEvents(batchSize)).isEmpty()) {
      for (Event event : events) {
        // Attach the parsed headers to each event
        event.setHeaders(headers);
        // The reader already built the event body; count its bytes here
        sentBytes += event.getBody().length;
        sent++;

        long now = System.currentTimeMillis();
        if (now >= lastCheck + 5000) {
          logger.debug("Packed {} bytes, {} events", sentBytes, sent);
          lastCheck = now;
        }
      }
      // Send the batch; EventDeliveryException must be handled by the caller
      rpcClient.appendBatch(events);
      if (reader instanceof ReliableEventReader) {
        ((ReliableEventReader) reader).commit();
      }
    }
    logger.debug("Finished");
  } finally {
    if (reader != null) {
      logger.debug("Closing reader");
      reader.close();
    }
    logger.debug("Closing RPC client");
    rpcClient.close();
  }
}
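Stripped of the Flume classes, the send loop above has a simple shape: drain the reader in batches of batchSize and make one appendBatch call per batch. A simplified, Flume-free sketch (the line-based reader is a stand-in for SimpleTextLineEventReader, and the counter stands in for the appendBatch call):

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class BatchReadDemo {
    // Stand-in for reader.readEvents(batchSize): read up to n lines
    static List<String> readEvents(BufferedReader reader, int n) throws Exception {
        List<String> batch = new ArrayList<>();
        String line;
        while (batch.size() < n && (line = reader.readLine()) != null) {
            batch.add(line);
        }
        return batch;
    }

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new StringReader("a\nb\nc\nd\ne"));
        int batches = 0;
        List<String> events;
        // Same loop shape as run(): pull batches until the reader is drained;
        // the real client calls rpcClient.appendBatch(events) per iteration
        while (!(events = readEvents(in, 2)).isEmpty()) {
            batches++;
        }
        System.out.println(batches); // 5 lines at batch size 2 -> 3 batches
    }
}
```

This also shows why appendBatch matters: 5 events cost 3 RPC round trips at batch size 2, instead of 5 with per-event append calls.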
III. Sending with the Avro Client's --headerFile Option
1. Agent configuration: avro_hdfs.conf
# Component names
a1.sources = s1
a1.sinks = k1
a1.channels = c1
## Source configuration
a1.sources.s1.type = avro
a1.sources.s1.bind = 192.168.0.86
a1.sources.s1.port = 44444
# Channel configuration
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000
# Sink configuration; the avro client must supply 3 header parameters: type_name, node, timestamp
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node6:8020/tmp/flume/%{type_name}/%Y-%m-%d/%H/%M
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat=TEXT
# File name prefix
a1.sinks.k1.hdfs.filePrefix = %{node}_%{timestamp}
# File name suffix
a1.sinks.k1.hdfs.fileSuffix = .log
a1.sinks.k1.hdfs.batchSize = 1000
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.idleTimeout = 30
#a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Bind the source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
With this configuration, the avro client must pass three header parameters: type_name, node, and timestamp.
Create header-file.conf in the conf directory with the following content:
timestamp=1668153093
type_name=nginx
node=node6
2. Start the agent
[root@node6 apache-flume-1.9.0-bin]#bin/flume-ng agent --conf conf --conf-file conf/avro_hdfs.conf --name a1 -Dflume.monitoring.type=http -Dflume.monitoring.port=44441 -Dflume.root.logger=INFO,console
3. Send data with avro-client, carrying the header parameters
[root@node6 apache-flume-1.9.0-bin]# bin/flume-ng avro-client -H 192.168.0.86 -p 44444 -F /var/log/boot.log-20220208 -R conf/header-file.conf
Checking the log on the Flume agent side shows that this file has been created: /tmp/flume/nginx/1970-01-20/15/22/node6_1668153093.1668153232675.log
[root@node6 ~]# hdfs dfs -ls hdfs://node6:8020/tmp/flume/nginx/1970-01-20/15/22/
(The timestamp in header-file.conf is in epoch seconds, but the HDFS sink interprets the timestamp header as epoch milliseconds, which is why the date resolves to 1970-01-20. Supplying a millisecond value, or uncommenting hdfs.useLocalTimeStamp = true, would fix the path.)
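The 1970-01-20/15/22 path can be reproduced by formatting the header value 1668153093 as if it were milliseconds. A quick check (the GMT+8 timezone is an assumption about the node's locale, consistent with the hour seen in the HDFS path):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimestampCheck {
    public static void main(String[] args) {
        // The HDFS sink expands %Y-%m-%d/%H/%M from the "timestamp" header,
        // interpreted as epoch MILLISECONDS
        long headerValue = 1668153093L; // seconds, mistakenly supplied as ms

        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd/HH/mm");
        fmt.setTimeZone(TimeZone.getTimeZone("GMT+8")); // assumed node timezone

        System.out.println(fmt.format(new Date(headerValue)));        // 1970-01-20/15/22
        System.out.println(fmt.format(new Date(headerValue * 1000))); // 2022-11-11/15/51
    }
}
```

Multiplying by 1000 (or writing timestamp=1668153093000 in header-file.conf) yields the intended 2022-11-11 path.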
The experiment succeeds!