Ingesting Kafka messages into HDFS in real time with Flume


1. First, start HDFS and create a directory (/flume) on HDFS where Flume will store the Kafka messages it collects.

$ sbin/start-dfs.sh
$ bin/hadoop fs -mkdir /flume

2. Next, start the Kafka service and create a topic (TOPIC-BDC-RECOMMEND), then start a console producer so you can publish messages to that topic for Flume to consume.

start zookeeper (from the Kafka installation directory)

$ bin/zookeeper-server-start.sh config/zookeeper.properties

start kafka-server

$ bin/kafka-server-start.sh config/server.properties

create the topic TOPIC-BDC-RECOMMEND

$ bin/kafka-topics.sh --create --zookeeper 127.0.0.1:2181 --replication-factor 1 --partitions 1 --topic TOPIC-BDC-RECOMMEND

set up the kafka-console-producer

$ bin/kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic TOPIC-BDC-RECOMMEND

3. Configure Flume and start it, then wait for the Kafka producer to send messages.

config conf/flume.conf (from the Flume installation directory)


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent, 
# in this case called 'agent'

agent.sources = kafkaSource
agent.channels = memoryChannel
agent.sinks = hdfsSink

# The channel can be defined as follows.
agent.sources.kafkaSource.channels = memoryChannel
agent.sources.kafkaSource.type=org.apache.flume.source.kafka.KafkaSource
agent.sources.kafkaSource.zookeeperConnect=172.20.0.11:2181,172.20.0.12:2181,172.20.0.14:2181
agent.sources.kafkaSource.topic=TOPIC-BDC-RECOMMEND
#agent.sources.kafkaSource.groupId=flume
agent.sources.kafkaSource.kafka.consumer.timeout.ms=100

agent.channels.memoryChannel.type=memory
agent.channels.memoryChannel.capacity=1000
agent.channels.memoryChannel.transactionCapacity=100

# the sink of hdfs
agent.sinks.hdfsSink.type=hdfs
agent.sinks.hdfsSink.channel = memoryChannel
agent.sinks.hdfsSink.hdfs.path=hdfs://192.168.70.3:9000/flume
agent.sinks.hdfsSink.hdfs.writeFormat=Text
agent.sinks.hdfsSink.hdfs.fileType=DataStream
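
The sink above relies on the HDFS sink's default roll behavior: Flume rolls to a new HDFS file every 30 seconds, every 1024 bytes, or every 10 events, whichever comes first, which is why even tiny demo inputs produce a file quickly. For anything beyond a demo you will usually want larger files; the values below are illustrative examples to tune for your workload, not required settings:

```
# Optional: control when the sink rolls to a new HDFS file.
# Flume defaults: rollInterval=30 (seconds), rollSize=1024 (bytes), rollCount=10 (events).
# Setting a property to 0 disables that roll criterion.
agent.sinks.hdfsSink.hdfs.rollInterval=600
agent.sinks.hdfsSink.hdfs.rollSize=134217728
agent.sinks.hdfsSink.hdfs.rollCount=0
```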


start flume-ng

$ bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name agent -Dflume.root.logger=INFO,console

send messages (from the Kafka installation directory)

bin/kafka-console-producer.sh --broker-list 192.168.70.3:9092 --topic TOPIC-BDC-RECOMMEND
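
If no files show up in HDFS, first confirm that the messages are actually reaching the topic before debugging Flume. A console consumer can verify this (run from the Kafka installation directory; this uses the old ZooKeeper-based consumer to match the Kafka setup above, and the address is the one from your environment):

```shell
# Read the topic from the beginning to confirm the producer's messages arrived.
bin/kafka-console-consumer.sh --zookeeper 127.0.0.1:2181 --topic TOPIC-BDC-RECOMMEND --from-beginning
```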

Finally, check the generated file from the HDFS command line.

[root@master ~]# hadoop fs -ls /flume
Found 1 items
-rw-r--r--   2 root supergroup          4 2019-01-18 19:19 /flume/FlumeData.1547810311907
[root@master ~]# hadoop fs -cat /flume/FlumeData.1547810311907
1
2

Of course, you can also use the web UI instead:

Tutorial: blog.csdn.net/singgel/art…

Installing a Hadoop cluster on Linux: blog.csdn.net/pucao_cug/a…

Installing a Kafka cluster on Linux: blog.csdn.net/meepomiracl…

Installing a ZooKeeper cluster on Linux: go search for that one yourself