Data flow: Taildir Source -> Memory Channel -> HDFS Sink
Configuration file:
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = TAILDIR
# JSON file recording each tailed file's inode, read offset, and absolute path
a1.sources.r1.positionFile = /home/admin/taildir/taildir.json
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1 = /home/admin/file/.*txt.*
a1.sources.r1.filegroups.f2 = /home/admin/logs/.*log.*
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flumeTail/%Y%m%d/%H
a1.sinks.k1.hdfs.filePrefix = tail
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 1
a1.sinks.k1.hdfs.roundUnit = hour
a1.sinks.k1.hdfs.rollInterval = 60
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.batchSize = 100
a1.sinks.k1.hdfs.fileType = DataStream
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
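Once the agent has tailed some data, the positionFile holds one JSON object per tracked file, so the source can resume from the last offset after a restart. A sketch of what /home/admin/taildir/taildir.json might look like (the inode and pos values below are illustrative, not taken from this run):
[{"inode":138243,"pos":52,"file":"/home/admin/logs/1.log"},
 {"inode":138244,"pos":87,"file":"/home/admin/file/3.txt"}]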
Create the monitored directories in advance:
~ mkdir -p /home/admin/file /home/admin/logs
Run:
~ flume-ng agent -n a1 -c conf -f conf/taildir-memory-hdfs.conf
Prepare test data (run from /home/admin, so logs/ and file/ match the monitored paths):
~ cp 1.log 2.log logs/
~ cp 3.txt 4.txt file/
Flume log:
2023-02-24 03:21:18,946 INFO hdfs.BucketWriter: Creating /flumeTail/20230224/03/tail.1677180078858.tmp
2023-02-24 03:21:49,545 INFO hdfs.HDFSEventSink: Writer callback called.
2023-02-24 03:21:49,545 INFO hdfs.BucketWriter: Closing /flumeTail/20230224/03/tail.1677180078858.tmp
2023-02-24 03:21:49,606 INFO hdfs.BucketWriter: Renaming /flumeTail/20230224/03/tail.1677180078858.tmp to /flumeTail/20230224/03/tail.1677180078858
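The rename from .tmp marks the file as rolled and complete. To verify, list and read the sink directory on HDFS (the date/hour path segment depends on when your agent ran):
~ hdfs dfs -ls /flumeTail/20230224/03
~ hdfs dfs -cat /flumeTail/20230224/03/tail.1677180078858
Because fileType is DataStream, the file is written as plain text and the events should appear exactly as they did in the source logs.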