Flume example: watching a directory for new files and writing them to HDFS


Data flow: Spooling Directory Source -> Memory Channel -> HDFS Sink

Configuration file:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/admin/flumeSpool
# Suffix appended to a file once it has been fully read
a1.sources.r1.fileSuffix = .COMPLETED
# Whether to add the file's absolute path to the event header
a1.sources.r1.fileHeader = true
# Ignore files ending in .tmp
a1.sources.r1.ignorePattern = ([^ ]*\.tmp)

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flumeSpool/%Y%m%d/%H
a1.sinks.k1.hdfs.filePrefix = spool
# Round the timestamp used in the path down to the hour
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 1
a1.sinks.k1.hdfs.roundUnit = hour
# Roll the file every 60 seconds or at 128 MB; never roll by event count
a1.sinks.k1.hdfs.rollInterval = 60
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
# Use the agent's local clock for the escape sequences in hdfs.path
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.batchSize = 100
# Write plain text instead of the default SequenceFile
a1.sinks.k1.hdfs.fileType = DataStream

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
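The ignorePattern above is a Java regex matched against file names; anything that matches is skipped by the source. The behavior can be sanity-checked locally with an equivalent POSIX ERE via grep (an illustration only, not Flume's own matching code):

```shell
# classify prints "ignored" for names matching the pattern, else "collected"
classify() {
  if printf '%s\n' "$1" | grep -qE '^[^ ]*\.tmp$'; then
    echo "ignored"
  else
    echo "collected"
  fi
}

classify part-0001.tmp   # a file still being written: skipped
classify 1.txt           # a finished file: picked up
```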

The watched directory must be created in advance:

~ mkdir -p /home/admin/flumeSpool

Run the agent:

~ flume-ng agent -n a1 -c conf -f conf/spooldir-memory-hdfs.conf

Add files to the directory:

~ cp *.txt flumeSpool/

Flume log output:

2023-02-23 17:11:01,065 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
2023-02-23 17:11:01,065 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/admin/flumeSpool/1.txt to /home/admin/flumeSpool/1.txt.COMPLETED
2023-02-23 17:11:01,069 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
2023-02-23 17:11:01,083 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
2023-02-23 17:11:01,083 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/admin/flumeSpool/2.txt to /home/admin/flumeSpool/2.txt.COMPLETED
2023-02-23 17:11:01,091 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
2023-02-23 17:11:01,091 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/admin/flumeSpool/3.txt to /home/admin/flumeSpool/3.txt.COMPLETED
2023-02-23 17:11:01,094 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
2023-02-23 17:11:01,094 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/admin/flumeSpool/4.txt to /home/admin/flumeSpool/4.txt.COMPLETED
2023-02-23 17:11:01,165 INFO hdfs.BucketWriter: Creating /flumeSpool/20230223/17/spool.1677143461070.tmp
2023-02-23 17:11:31,711 INFO hdfs.HDFSEventSink: Writer callback called.
2023-02-23 17:11:31,711 INFO hdfs.BucketWriter: Closing /flumeSpool/20230223/17/spool.1677143461070.tmp
2023-02-23 17:11:31,774 INFO hdfs.BucketWriter: Renaming /flumeSpool/20230223/17/spool.1677143461070.tmp to /flumeSpool/20230223/17/spool.1677143461070
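The bucket directory in the log (`/flumeSpool/20230223/17/`) comes from expanding the `%Y%m%d/%H` escape sequences in hdfs.path against the event timestamp (here the agent's local clock, since useLocalTimeStamp = true). The same path for "now" can be reconstructed with date (a sketch of the expansion, not how Flume computes it internally):

```shell
# Expand /flumeSpool/%Y%m%d/%H for the current local time,
# rounded down to the hour as round/roundValue/roundUnit specify
bucket="/flumeSpool/$(date +%Y%m%d/%H)"
echo "$bucket"
```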

Result: the local files are merged into a single file on HDFS, and after a successful upload each local file gains the configured suffix:

~ ls
1.txt.COMPLETED  2.txt.COMPLETED  3.txt.COMPLETED  4.txt.COMPLETED
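The end state can be reproduced locally as a sketch of the source's lifecycle: read each file to the end, then rename it with the configured fileSuffix so it is never re-ingested (the temp directory below is hypothetical; Flume performs the rename itself):

```shell
# Simulate the post-read rename the spooldir source performs
spool=$(mktemp -d)
printf 'hello\n' > "$spool/1.txt"
printf 'world\n' > "$spool/2.txt"

# after fully reading a file, Flume appends fileSuffix (.COMPLETED)
for f in "$spool"/*.txt; do
  mv "$f" "$f.COMPLETED"
done

ls "$spool"
```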