Data flow: Spooling Directory Source -> Memory Channel -> HDFS Sink
Configuration file:
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/admin/flumeSpool
# Suffix appended to a file once it has been fully read
a1.sources.r1.fileSuffix = .COMPLETED
# Whether to add the file's absolute path to the event header
a1.sources.r1.fileHeader = true
# Ignore files whose names end in .tmp
a1.sources.r1.ignorePattern = ([^ ]*\.tmp)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flumeSpool/%Y%m%d/%H
a1.sinks.k1.hdfs.filePrefix = spool
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 1
a1.sinks.k1.hdfs.roundUnit = hour
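# The property prefix must match the agent name (a1); under any other prefix
# the setting is ignored and the default of 30 seconds applies
a1.sinks.k1.hdfs.rollInterval = 60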
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.batchSize = 100
a1.sinks.k1.hdfs.fileType = DataStream
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
The monitored directory must be created in advance:
~ mkdir -p /home/admin/flumeSpool
Run:
~ flume-ng agent -n a1 -c conf -f conf/spooldir-memory-hdfs.conf
Add files:
~ cp *.txt flumeSpool/
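The Spooling Directory Source expects a file to be complete and unchanged once it appears in the spool directory. Because the configuration above ignores names ending in .tmp, one way to avoid exposing a half-copied file is to copy it under a .tmp name and rename it afterwards. The following is only a minimal sketch: spool_put is a hypothetical helper, not part of Flume, and it assumes the ignorePattern shown above is in effect.

```shell
#!/bin/sh
# spool_put is a hypothetical helper, not part of Flume.
# Usage: spool_put <file> <spool-dir>
spool_put() {
    src=$1
    dir=$2
    name=$(basename "$src")
    # Copy under a .tmp name, which the ignorePattern in the config skips.
    cp "$src" "$dir/$name.tmp" || return 1
    # Rename inside the same directory, so the source only ever sees
    # the finished file under its final name.
    mv "$dir/$name.tmp" "$dir/$name"
}
```

For example, spool_put 1.txt /home/admin/flumeSpool would stage 1.txt.tmp first and then rename it to 1.txt.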
Flume log:
2023-02-23 17:11:01,065 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
2023-02-23 17:11:01,065 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/admin/flumeSpool/1.txt to /home/admin/flumeSpool/1.txt.COMPLETED
2023-02-23 17:11:01,069 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
2023-02-23 17:11:01,083 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
2023-02-23 17:11:01,083 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/admin/flumeSpool/2.txt to /home/admin/flumeSpool/2.txt.COMPLETED
2023-02-23 17:11:01,091 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
2023-02-23 17:11:01,091 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/admin/flumeSpool/3.txt to /home/admin/flumeSpool/3.txt.COMPLETED
2023-02-23 17:11:01,094 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
2023-02-23 17:11:01,094 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/admin/flumeSpool/4.txt to /home/admin/flumeSpool/4.txt.COMPLETED
2023-02-23 17:11:01,165 INFO hdfs.BucketWriter: Creating /flumeSpool/20230223/17/spool.1677143461070.tmp
2023-02-23 17:11:31,711 INFO hdfs.HDFSEventSink: Writer callback called.
2023-02-23 17:11:31,711 INFO hdfs.BucketWriter: Closing /flumeSpool/20230223/17/spool.1677143461070.tmp
2023-02-23 17:11:31,774 INFO hdfs.BucketWriter: Renaming /flumeSpool/20230223/17/spool.1677143461070.tmp to /flumeSpool/20230223/17/spool.1677143461070
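Result: the local files are merged into a single file on HDFS, and each local file that has been consumed is renamed with the .COMPLETED suffix: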
~ ls
1.txt.COMPLETED 2.txt.COMPLETED 3.txt.COMPLETED 4.txt.COMPLETED
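To check whether anything in the spool directory is still waiting to be consumed, the files that do not yet carry the suffix can be listed. This is a rough sketch under stated assumptions: pending_files is a hypothetical helper, not a Flume command, and the suffix defaults to the .COMPLETED value set by fileSuffix in the config.

```shell
#!/bin/sh
# pending_files is a hypothetical helper, not part of Flume.
# Usage: pending_files <spool-dir> [suffix]
pending_files() {
    dir=$1
    suffix=${2:-.COMPLETED}
    for f in "$dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [ -f "$f" ] || continue
        case "$f" in
            *"$suffix") ;;          # already consumed
            *) basename "$f" ;;     # not yet renamed
        esac
    done
}
```

Running pending_files /home/admin/flumeSpool prints nothing once every file has been renamed. Files matched by ignorePattern (such as *.tmp) are listed as well, since Flume never renames them.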