Writing a WordCount with Kafka Streams and Scala

Create a Scala project

Create the Scala project with sbt. Here I use sbt to package an uber-jar.

sbt configuration

Configure build.sbt as follows:

name := "learningKafka"

version := "0.1"

scalaVersion := "2.12.12"

libraryDependencies ++= Seq(
  "org.apache.kafka" %% "kafka" % "2.8.0",
  "org.apache.kafka" % "kafka-clients" % "2.8.0",
  "org.apache.kafka" % "kafka-streams" % "2.8.0",
  "org.apache.kafka" % "connect-api" % "2.8.0",
  "org.apache.avro" % "avro" % "1.10.2",
  "org.apache.kafka" %% "kafka-streams-scala" % "2.8.0",
  "org.slf4j" % "slf4j-api" % "1.7.30",
  "org.slf4j" % "slf4j-simple" % "1.7.30"
)

Compile / mainClass := Some("com.something.bz10.ConsumingApp")

assembly / assemblyJarName := "kafka_stream_example-v1.0.jar"

assembly / test := {}

assembly / mainClass := Some("com.something.bz10.ConsumingApp")

assembly / assemblyMergeStrategy := {
  case "META-INF/MANIFEST.MF" => MergeStrategy.discard
  case x => MergeStrategy.first
}

Under the project/ directory, create a plugins.sbt file (plugin declarations go here, not in build.properties):

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")

Create the source files

Now create the Scala source files.

Under src/main/scala/, create a package named com.something.bz10

Then create the file ConsumingApp.scala

package com.something.bz10

import java.util.Properties
import org.apache.kafka.streams.kstream.Materialized
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala._
import org.apache.kafka.streams.scala.kstream._
import org.apache.kafka.streams.scala.serialization.Serdes.{longSerde, stringSerde}
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}

import java.time.Duration

object ConsumingApp extends App {

  // Streams configuration: application id and broker addresses
  val props: Properties = {
    val p = new Properties()
    p.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application")
    p.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "node1:9092,node2:9092,node3:9092")
    p
  }

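  // Build the topology
  val builder: StreamsBuilder = new StreamsBuilder
  // Read each line of text from the input topic
  val textLines: KStream[String, String] = builder.stream[String, String]("TextLinesTopic")
  val wordCounts: KTable[String, Long] = textLines
    // Lowercase each line and split it into words on non-word characters
    .flatMapValues((textLine: String) => textLine.toLowerCase.split("\\W+"))
    // Re-key each record by the word itself so identical words are grouped together
    .groupBy((_: String, word: String) => word)
    // Count occurrences per word, backed by a state store named "counts-store"
    .count()(Materialized.as("counts-store"))
  // Write every count update to the output topic
  wordCounts.toStream.to("WordsWithCountsTopic")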

  // Run it
  val streams: KafkaStreams = new KafkaStreams(builder.build(), props)
  streams.start()
  // On JVM shutdown, close the streams instance, waiting up to 10 seconds
  sys.ShutdownHookThread {
    streams.close(Duration.ofSeconds(10))
  }
}
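
To make the topology easier to follow, here is a minimal sketch of the same counting logic using only plain Scala collections, with no Kafka involved. The names WordCountSketch, tokenize and countWords are made up for this illustration. The empty-string filter is also an addition: the topology above does not have it, so a line that starts with punctuation would produce an empty "word" there.

```scala
object WordCountSketch {
  // Same tokenization as the topology: lowercase, then split on non-word characters.
  // The filter for empty strings is an addition that the topology does not have.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\W+").toSeq.filter(_.nonEmpty)

  // Group identical words and count each group, mirroring groupBy + count.
  def countWords(lines: Seq[String]): Map[String, Long] =
    lines
      .flatMap(tokenize)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size.toLong }
}
```

For example, WordCountSketch.countWords(Seq("hello world hello")) yields a map with hello -> 2 and world -> 1. The real topology differs in that it never sees all the lines at once: it updates the counts as each record arrives and keeps them in the counts-store state store.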

Packaging

Package the project with sbt.

First clean the previous build output, then build a fat jar:

sbt clean assembly

Testing the program

Create the topic

Run the command:

 /opt/kafka_2.13-2.8.0/bin/kafka-topics.sh --bootstrap-server node1:9092,node2:9092,node3:9092 --create --topic TextLinesTopic

Run the jar

Upload the jar to the server and run:

java -jar kafka_stream_example-v1.0.jar

If nothing goes wrong, console output like the following appears:

[wordcount-application-a2cea16e-297c-44eb-9f2d-3d77098a4846-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamTask - stream-thread [wordcount-application-a2cea16e-297c-44eb-9f2d-3d77098a4846-StreamThread-1] task [0_0] Suspended running
[wordcount-application-a2cea16e-297c-44eb-9f2d-3d77098a4846-StreamThread-1] INFO org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=wordcount-application-a2cea16e-297c-44eb-9f2d-3d77098a4846-StreamThread-1-restore-consumer, groupId=null] Subscribed to partition(s): wordcount-application-counts-store-changelog-0
[wordcount-application-a2cea16e-297c-44eb-9f2d-3d77098a4846-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.RecordCollectorImpl - stream-thread [wordcount-application-a2cea16e-297c-44eb-9f2d-3d77098a4846-StreamThread-1] task [0_0] Closing record collector clean
[wordcount-application-a2cea16e-297c-44eb-9f2d-3d77098a4846-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamTask - stream-thread [wordcount-application-a2cea16e-297c-44eb-9f2d-3d77098a4846-StreamThread-1] task [0_0] Closed clean
[wordcount-application-a2cea16e-297c-44eb-9f2d-3d77098a4846-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamTask - stream-thread [wordcount-application-a2cea16e-297c-44eb-9f2d-3d77098a4846-StreamThread-1] task [1_0] Suspended running
[wordcount-application-a2cea16e-297c-44eb-9f2d-3d77098a4846-StreamThread-1] INFO org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=wordcount-application-a2cea16e-297c-44eb-9f2d-3d77098a4846-StreamThread-1-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions

Start a console consumer

Run the command:

bin/kafka-console-consumer.sh --bootstrap-server node1:9092,node2:9092,node3:9092 \
--topic WordsWithCountsTopic \
--from-beginning \
--formatter kafka.tools.DefaultMessageFormatter \
--property print.key=true \
--property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
--property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer

If everything is working, console output like the following appears:

world	1
scala	1
golang	1
hello	4
java	1
hello	5
scala	2
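
The counts print as readable numbers only because the consumer command sets value.deserializer to LongDeserializer. The values in WordsWithCountsTopic are written as 8-byte big-endian longs, not as text. The sketch below imitates that encoding with java.nio.ByteBuffer to show what the deserializer has to undo. LongBytesSketch, encode and decode are names invented here, and this is an illustration of the byte layout rather than Kafka's own code.

```scala
import java.nio.ByteBuffer

object LongBytesSketch {
  // Encode a count as 8 bytes, most significant byte first (ByteBuffer's default order).
  def encode(count: Long): Array[Byte] =
    ByteBuffer.allocate(java.lang.Long.BYTES).putLong(count).array()

  // Decode exactly 8 big-endian bytes back into a Long.
  def decode(bytes: Array[Byte]): Long = {
    require(bytes.length == java.lang.Long.BYTES, "expected exactly 8 bytes")
    ByteBuffer.wrap(bytes).getLong()
  }
}
```

A count of 5 becomes seven zero bytes followed by the byte 5, so a consumer that treats the value as a string would show unprintable characters instead of the number.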

Send messages to TextLinesTopic

Run the command:

bin/kafka-console-producer.sh --bootstrap-server node1:9092,node2:9092,node3:9092 --topic TextLinesTopic

Once the producer is running, type any words into the console, separated by spaces:

hello world hello scala hello java hello golang

The consumer side then shows output similar to the following:

world	7
scala	8
java	4
hello	25
golang	4
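
The numbers are larger than the occurrences in a single input line because the counts accumulate across every message sent so far, and a word can show up more than once in the consumer output because each change to the table is forwarded as a new record. The sketch below imitates this in plain Scala. CountUpdatesSketch and updates are hypothetical names, and it emits one update per word, whereas Kafka Streams may merge some intermediate updates through record caching.

```scala
object CountUpdatesSketch {
  // Feed words one at a time into the running counts and record
  // a (word, newCount) update after each word, in arrival order.
  def updates(initial: Map[String, Long], words: Seq[String]): Seq[(String, Long)] = {
    val start = (initial, Vector.empty[(String, Long)])
    val (_, emitted) = words.foldLeft(start) {
      case ((counts, out), word) =>
        val next = counts.getOrElse(word, 0L) + 1L
        (counts.updated(word, next), out :+ (word -> next))
    }
    emitted
  }
}
```

Starting from Map("hello" -> 3L) and feeding Seq("hello", "scala", "hello") produces the updates (hello, 4), (scala, 1), (hello, 5): the same word appears twice, each time with its latest count.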