0. Introduction
CMAK is the current name of the kafka-manager project. It provides a web UI for operating Kafka clusters. The goal of this investigation is twofold: to find out what kafka-manager stores in its own local ZooKeeper, and to trace what it actually does when operating on topics. GitHub repo: https://github.com/yahoo/CMAK
1. What Is Stored in ZooKeeper
1.1 Export Script
The following script walks the ZooKeeper ensemble used by kafka-manager and dumps every znode:
from kazoo.client import KazooClient

ZK_HOSTS = "x.x.x.x:2181"
OUT_FILE = "kafka_clusters"

def get_zk_client():
    zk = KazooClient(hosts=ZK_HOSTS)
    zk.start()
    return zk

def dump_all_clusters():
    zk_client = get_zk_client()
    path_list = ["/kafka-manager/clusters"]
    result = []
    # Breadth-first traversal of the znode tree under /kafka-manager/clusters.
    while len(path_list) > 0:
        next_layer = list()
        for path in path_list:
            children = zk_client.get_children(path)
            # Leaf node: record its path and content.
            if len(children) == 0:
                if path.split("/")[-1] == "topics":
                    print("cluster {0} has no topic.".format(path))
                content = zk_client.get(path)[0]
                result.append([path, content])
            for child in children:
                next_layer.append(path + "/" + child)
        path_list = next_layer
    zk_client.stop()
    zk_client.close()

    # Count topics per cluster; a topic znode path looks like
    # /kafka-manager/clusters/:cluster/topics/:topic (6 segments after split).
    cluster_topic_map = dict()
    for path, _content in result:
        split_path = path.split("/")
        if len(split_path) < 4:
            continue
        cluster_name = str(split_path[3])
        if cluster_name not in cluster_topic_map:
            cluster_topic_map[cluster_name] = 0
        if len(split_path) != 6 or split_path[-2] != "topics":
            continue
        cluster_topic_map[cluster_name] += 1
    for key in cluster_topic_map:
        print("{0} has {1} topics".format(key, cluster_topic_map[key]))

    # Dump every topic znode's path and content to disk.
    with open(OUT_FILE, "w") as f:
        for path, content in result:
            split_path = path.split("/")
            if len(split_path) != 6 or split_path[-2] != "topics":
                continue
            f.write(path + "\n")
            f.write(content.decode("utf-8") + "\n")

if __name__ == "__main__":
    dump_all_clusters()
Dumping the path and content of every znode under the /kafka-manager namespace yields the data analyzed below.
1.2 Content Analysis
Under the /kafka-manager path there are four children: /kafka-manager/configs, /kafka-manager/mutex, /kafka-manager/deleteClusters, and /kafka-manager/clusters. configs stores the metadata of every Kafka cluster registered with kafka-manager, while clusters records partial topic information for some of those clusters.
As an example, here is the metadata stored under /kafka-manager/configs/:cluster:
{
    "name": "data-ckc1-qcsh4",
    "curatorConfig": {
        "zkConnect": "x.x.x.x:2181,x.x.x.x:2181,x.x.x.x:2181",
        "zkMaxRetry": 100,
        "baseSleepTimeMs": 100,
        "maxSleepTimeMs": 1000
    },
    "enabled": true,
    "kafkaVersion": "1.1.0",
    "jmxEnabled": true,
    "jmxUser": null,
    "jmxPass": null,
    "jmxSsl": false,
    "pollConsumers": false,
    "filterConsumers": false,
    "logkafkaEnabled": false,
    "activeOffsetCacheEnabled": false,
    "displaySizeEnabled": false,
    "tuning": {
        "brokerViewUpdatePeriodSeconds": 30,
        "clusterManagerThreadPoolSize": 2,
        "clusterManagerThreadPoolQueueSize": 100,
        "kafkaCommandThreadPoolSize": 2,
        "kafkaCommandThreadPoolQueueSize": 100,
        "logkafkaCommandThreadPoolSize": 2,
        "logkafkaCommandThreadPoolQueueSize": 100,
        "logkafkaUpdatePeriodSeconds": 30,
        "partitionOffsetCacheTimeoutSecs": 5,
        "brokerViewThreadPoolSize": 2,
        "brokerViewThreadPoolQueueSize": 1000,
        "offsetCacheThreadPoolSize": 2,
        "offsetCacheThreadPoolQueueSize": 1000,
        "kafkaAdminClientThreadPoolSize": 2,
        "kafkaAdminClientThreadPoolQueueSize": 1000,
        "kafkaManagedOffsetMetadataCheckMillis": 30000,
        "kafkaManagedOffsetGroupCacheSize": 1000000,
        "kafkaManagedOffsetGroupExpireDays": 7
    },
    "securityProtocol": "PLAINTEXT"
}
These znodes hold the complete cluster metadata; kafka-manager uses it to connect to and operate on the managed Kafka clusters.
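For illustration, a minimal kazoo sketch of reading this metadata back and extracting the connection string (the address is a placeholder, and "data-ckc1-qcsh4" is simply the sample cluster name from the JSON above):

import json
from kazoo.client import KazooClient

# Placeholder address: CMAK's own ZooKeeper, not a managed cluster's.
zk = KazooClient(hosts="x.x.x.x:2181")
zk.start()
raw, _stat = zk.get("/kafka-manager/configs/data-ckc1-qcsh4")
meta = json.loads(raw.decode("utf-8"))
# zkConnect is all CMAK needs to reach the managed cluster's ZooKeeper.
print(meta["name"], meta["curatorConfig"]["zkConnect"])
zk.stop()
zk.close()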
Paths under /kafka-manager/clusters follow the pattern /kafka-manager/clusters/:cluster/topics/:topic and store topic information. Contrary to expectations, however, kafka-manager keeps only partial topic information here. For a topic that is present, the content looks like:
{
    "69": [6, 1, 2],
    "0": [7, 5, 6],
    "5": [2, 10, 1]
}
Each key is a partition ID, and each value is the list of broker IDs holding that partition's replicas.
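A short sketch that interprets such a znode (the JSON literal is the sample above):

import json

# Sample content of /kafka-manager/clusters/:cluster/topics/:topic:
# keys are partition IDs, values are the broker IDs holding the replicas.
state = json.loads('{"69":[6,1,2],"0":[7,5,6],"5":[2,10,1]}')
for partition, replicas in sorted(state.items(), key=lambda kv: int(kv[0])):
    # By Kafka convention the first replica in the list is the preferred leader.
    print("partition {0}: replicas on brokers {1}".format(partition, replicas))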
The preliminary conclusion is that this partial topic information serves as a cache; its exact purpose needs further investigation.
1.3 Summary of ZooKeeper Contents
For the Kafka clusters it manages, kafka-manager's own ZooKeeper stores only basic cluster metadata; it does not store full details such as partitions and topics.
2. CMAK Source Code Walkthrough
CMAK is written in Scala. The frameworks and third-party libraries it uses are:
- Play Framework: Kafka Manager is essentially a web application, so it is built on Play's MVC architecture;
- Akka: used to build highly concurrent, distributed, fault-tolerant applications; all requests in Kafka Manager are handled asynchronously through Akka;
- Apache Curator Framework: used to access ZooKeeper;
- Kafka SDK: used to fetch each topic's last offset and to implement management operations through the Admin interface.
2.0 Intro
The business code lives under the app directory. Given the framework CMAK uses, the program's entry point is the routing table at CMAK/conf/routes. Due to time constraints, only the topic lookup and topic creation flows are covered here.
2.1 GET topic
The route table shows:
GET /clusters/:c/topics controllers.Topic.topics(c:String)
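As a quick sanity check, the endpoint can be exercised over plain HTTP (a hedged sketch: the host is a placeholder and 9000 is Play's default port, so adjust to your deployment):

import requests

# Placeholder host; the cluster name must match one registered in CMAK.
resp = requests.get("http://cmak-host:9000/clusters/my-cluster/topics")
# The controller sets X-Frame-Options on the response (see the handler below).
print(resp.status_code, resp.headers.get("X-Frame-Options"))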
The handler is defined in app/controllers/Topic.scala:
def topics(c: String) = Action.async { implicit request: RequestHeader =>
  kafkaManager.getTopicListExtended(c).map { errorOrTopicList =>
    Ok(views.html.topic.topicList(c, errorOrTopicList)).withHeaders("X-Frame-Options" -> "SAMEORIGIN")
  }
}
The business function here is getTopicListExtended, and the response renders the template views.html.topic.topicList, located at app/views/topic/topicList.scala.html. The business function is:
app/kafka/manager/KafkaManager.scala
def getTopicListExtended(clusterName: String): Future[ApiError \/ TopicListExtended] = {
  val futureTopicIdentities = tryWithKafkaManagerActor(KMClusterQueryRequest(clusterName, BVGetTopicIdentities))(
    identity[Map[String, TopicIdentity]])
  val futureTopicList = tryWithKafkaManagerActor(KMClusterQueryRequest(clusterName, KSGetTopics))(identity[TopicList])
  val futureTopicToConsumerMap = tryWithKafkaManagerActor(KMClusterQueryRequest(clusterName, BVGetTopicConsumerMap))(
    identity[Map[String, Iterable[(String, ConsumerType)]]])
  val futureTopicsReasgn = getTopicsUnderReassignment(clusterName)
  implicit val ec = apiExecutionContext
  for {
    errOrTi <- futureTopicIdentities
    errOrTl <- futureTopicList
    errOrTCm <- futureTopicToConsumerMap
    errOrRap <- futureTopicsReasgn
  } yield {
    for {
      ti <- errOrTi
      tl <- errOrTl
      tcm <- errOrTCm
      rap <- errOrRap
    } yield {
      TopicListExtended(tl.list.map(t => (t, ti.get(t))).sortBy(_._1), tcm, tl.deleteSet, rap, tl.clusterContext)
    }
  }
}
The code shows that the topic information is obtained through the KafkaManagerActor, with the actual operation wrapped in a KMClusterQueryRequest. The concrete handling is in app/kafka/manager/actor/KafkaManagerActor.scala:
case KMClusterQueryRequest(clusterName, request) =>
  clusterManagerMap.get(clusterName).fold[Unit] {
    sender ! ActorErrorResponse(s"Unknown cluster : $clusterName")
  } {
    clusterManagerPath: ActorPath =>
      context.actorSelection(clusterManagerPath).forward(request)
  }
and in app/kafka/manager/actor/KafkaStateActor.scala:
case KSGetTopics =>
  val deleteSet: Set[String] =
    featureGateFold(KMDeleteTopicFeature)(
      Set.empty,
      {
        val deleteTopicsData: mutable.Buffer[ChildData] = deleteTopicsPathCache.getCurrentData.asScala
        deleteTopicsData.map { cd =>
          nodeFromPath(cd.getPath)
        }.toSet
      })
  withTopicsTreeCache { cache =>
    cache.getCurrentChildren(ZkUtils.BrokerTopicsPath)
  }.fold {
    sender ! TopicList(IndexedSeq.empty, deleteSet, config.clusterContext)
  } { data: java.util.Map[String, ChildData] =>
    sender ! TopicList(data.asScala.keySet.toIndexedSeq, deleteSet, config.clusterContext)
  }

// where ZkUtils defines:
val BrokerTopicsPath = "/brokers/topics"
From the code, topics appear to be fetched via a cache plus a lookup: CMAK uses the stored cluster metadata to reach the target cluster, then reads the topic list from that cluster's /brokers/topics path through a Curator tree cache.
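To make the mechanism concrete, here is a rough kazoo analogue of that Curator tree cache (a sketch under assumptions: placeholder ZooKeeper address, and kazoo's TreeCache recipe standing in for Curator's TreeCache; this is not CMAK code):

import time
from kazoo.client import KazooClient
from kazoo.recipe.cache import TreeCache

# Connect to the *target* Kafka cluster's ZooKeeper (placeholder address),
# i.e. the zkConnect value from /kafka-manager/configs/:cluster.
zk = KazooClient(hosts="x.x.x.x:2181")
zk.start()

# Mirror the /brokers/topics subtree in memory, as CMAK's topicsTreeCache does;
# subsequent topic listings are served from this cache, not from ZooKeeper.
cache = TreeCache(zk, "/brokers/topics")
cache.start()
time.sleep(1)  # crude wait for the initial cache population

print("topics:", cache.get_children("/brokers/topics"))
zk.stop()
zk.close()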
2.2 POST CREATE topic
The route table shows:
GET /clusters/:c/createTopic controllers.Topic.createTopic(c:String)
POST /clusters/:c/topics/create controllers.Topic.handleCreateTopic(c:String)
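A hedged sketch of driving the POST endpoint directly (host and cluster are placeholders; the form field names "topic", "partitions", and "replication" are assumptions inferred from the form binding in the handler below, so verify them against defaultCreateForm before use):

import requests

# Placeholder host and cluster; field names are assumptions, see above.
resp = requests.post(
    "http://cmak-host:9000/clusters/my-cluster/topics/create",
    data={"topic": "demo-topic", "partitions": 8, "replication": 3},
)
print(resp.status_code)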
The handler lives in app/controllers/Topic.scala:
def handleCreateTopic(clusterName: String) = Action.async { implicit request: Request[AnyContent] =>
  featureGate(KMTopicManagerFeature) {
    defaultCreateForm.bindFromRequest.fold(
      formWithErrors => {
        kafkaManager.getClusterContext(clusterName).map { clusterContext =>
          BadRequest(views.html.topic.createTopic(clusterName, clusterContext.map(c => (formWithErrors, c))))
        }.recover {
          case t =>
            implicit val clusterFeatures = ClusterFeatures.default
            Ok(views.html.common.resultOfCommand(
              views.html.navigation.clusterMenu(clusterName, "Topic", "Create", menus.clusterMenus(clusterName)),
              models.navigation.BreadCrumbs.withNamedViewAndCluster("Topics", clusterName, "Create Topic"),
              -\/(ApiError(s"Unknown error : ${t.getMessage}")),
              "Create Topic",
              FollowLink("Try again.", routes.Topic.createTopic(clusterName).toString()),
              FollowLink("Try again.", routes.Topic.createTopic(clusterName).toString())
            )).withHeaders("X-Frame-Options" -> "SAMEORIGIN")
        }
      },
      ct => {
        val props = new Properties()
        ct.configs.filter(_.value.isDefined).foreach(c => props.setProperty(c.name, c.value.get))
        kafkaManager.createTopic(clusterName, ct.topic, ct.partitions, ct.replication, props).map { errorOrSuccess =>
          implicit val clusterFeatures = errorOrSuccess.toOption.map(_.clusterFeatures).getOrElse(ClusterFeatures.default)
          Ok(views.html.common.resultOfCommand(
            views.html.navigation.clusterMenu(clusterName, "Topic", "Create", menus.clusterMenus(clusterName)),
            models.navigation.BreadCrumbs.withNamedViewAndCluster("Topics", clusterName, "Create Topic"),
            errorOrSuccess,
            "Create Topic",
            FollowLink("Go to topic view.", routes.Topic.topic(clusterName, ct.topic).toString()),
            FollowLink("Try again.", routes.Topic.createTopic(clusterName).toString())
          )).withHeaders("X-Frame-Options" -> "SAMEORIGIN")
        }
      }
    )
  }
}
The actual work happens in app/kafka/manager/KafkaManager.scala:
def createTopic(
  clusterName: String,
  topic: String,
  partitions: Int,
  replication: Int,
  config: Properties = new Properties
): Future[ApiError \/ ClusterContext] = {
  implicit val ec = apiExecutionContext
  withKafkaManagerActor(KMClusterCommandRequest(clusterName, CMCreateTopic(topic, partitions, replication, config))) {
    result: Future[CMCommandResult] =>
      result.map(cmr => toDisjunction(cmr.result))
  }
}
The command is CMCreateTopic, defined in app/kafka/manager/model/ActorModel.scala. Following the call chain downward:
case class CMCreateTopic(topic: String,
                         partitions: Int,
                         replicationFactor: Int,
                         config: Properties = new Properties) extends CommandRequest
app/kafka/manager/actor/cluster/ClusterManagerActor.scala:
case CMCreateTopic(topic, partitions, replication, config) =>
  implicit val ec = longRunningExecutionContext
  val eventualTopicDescription = withKafkaStateActor(KSGetTopicDescription(topic))(identity[Option[TopicDescription]])
  val eventualBrokerList = withKafkaStateActor(KSGetBrokers)(identity[BrokerList])
  eventualTopicDescription.map { topicDescriptionOption =>
    topicDescriptionOption.fold {
      eventualBrokerList.flatMap {
        bl => withKafkaCommandActor(KCCreateTopic(topic, bl.list.map(_.id).toSet, partitions, replication, config)) {
          kcResponse: KCCommandResult =>
            CMCommandResult(kcResponse.result)
        }
      }
    } { td =>
      Future.successful(CMCommandResult(Failure(new IllegalArgumentException(s"Topic already exists : $topic"))))
    }
  } pipeTo sender()
case class KCCreateTopic(topic: String,
                         brokers: Set[Int],
                         partitions: Int,
                         replicationFactor: Int,
                         config: Properties) extends CommandRequest
case KCCreateTopic(topic, brokers, partitions, replicationFactor, config) =>
  longRunning {
    Future {
      KCCommandResult(Try {
        kafkaCommandActorConfig.adminUtils.createTopic(kafkaCommandActorConfig.curator, brokers, topic, partitions, replicationFactor, config)
      })
    }
  }
def createTopic(curator: CuratorFramework,
                brokers: Set[Int],
                topic: String,
                partitions: Int,
                replicationFactor: Int,
                topicConfig: Properties = new Properties): Unit = {
  val replicaAssignment = assignReplicasToBrokers(brokers, partitions, replicationFactor)
  createOrUpdateTopicPartitionAssignmentPathInZK(curator, topic, replicaAssignment, topicConfig)
}
Topic creation therefore takes two steps: first, compute the assignment of replicas to brokers; second, register the new (or updated) assignment in the target Kafka cluster's ZooKeeper.
The computation done by assignReplicasToBrokers is explained by its doc comment:
/**
 * There are 2 goals of replica assignment:
 * 1. Spread the replicas evenly among brokers.
 * 2. For partitions assigned to a particular broker, their other replicas are spread over the other brokers.
 *
 * To achieve this goal, we:
 * 1. Assign the first replica of each partition by round-robin, starting from a random position in the broker list.
 * 2. Assign the remaining replicas of each partition with an increasing shift.
 *
 * Here is an example of assigning
 * broker-0  broker-1  broker-2  broker-3  broker-4
 * p0        p1        p2        p3        p4        (1st replica)
 * p5        p6        p7        p8        p9        (1st replica)
 * p4        p0        p1        p2        p3        (2nd replica)
 * p8        p9        p5        p6        p7        (2nd replica)
 * p3        p4        p0        p1        p2        (3rd replica)
 * p7        p8        p9        p5        p6        (3rd replica)
 */
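To make the scheme concrete, here is a small Python sketch of the same round-robin-with-shift assignment (a simplification: the real implementation starts from a random position in the broker list, which is fixed to 0 here so the output matches the table above):

def assign_replicas_to_brokers(brokers, n_partitions, replication_factor, start_index=0):
    """Round-robin the first replica; place the rest with an increasing shift."""
    assignment = {}
    next_replica_shift = start_index
    for partition in range(n_partitions):
        # After each full pass over the broker list, grow the shift so the
        # follower replicas land on different brokers than in the last pass.
        if partition > 0 and partition % len(brokers) == 0:
            next_replica_shift += 1
        first = (partition + start_index) % len(brokers)
        replicas = [brokers[first]]
        for j in range(replication_factor - 1):
            shift = 1 + (next_replica_shift + j) % (len(brokers) - 1)
            replicas.append(brokers[(first + shift) % len(brokers)])
        assignment[partition] = replicas
    return assignment

# Reproduces the 5-broker / 10-partition / 3-replica example above:
for p, reps in sorted(assign_replicas_to_brokers([0, 1, 2, 3, 4], 10, 3).items()):
    print("p{0}: {1}".format(p, reps))

The second step, writing the assignment into the target cluster's ZooKeeper, is handled by createOrUpdateTopicPartitionAssignmentPathInZK: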
def createOrUpdateTopicPartitionAssignmentPathInZK(curator: CuratorFramework,
                                                   topic: String,
                                                   partitionReplicaAssignment: Map[Int, Seq[Int]],
                                                   config: Properties = new Properties,
                                                   update: Boolean = false,
                                                   readVersion: Int = -1) {
  // validate arguments
  Topic.validate(topic)
  TopicConfigs.validate(version, config)
  checkCondition(partitionReplicaAssignment.values.map(_.size).toSet.size == 1, TopicErrors.InconsistentPartitionReplicas)

  val topicPath = ZkUtils.getTopicPath(topic)
  if (!update) {
    checkCondition(curator.checkExists().forPath(topicPath) == null, TopicErrors.TopicAlreadyExists(topic))
  }
  partitionReplicaAssignment.foreach {
    case (part, reps) => checkCondition(reps.size == reps.toSet.size, TopicErrors.DuplicateReplicaAssignment(topic, part, reps))
  }

  // write out the config on create, not update, if there is any, this isn't transactional with the partition assignments
  if (!update) {
    writeTopicConfig(curator, topic, config)
  }

  // create the partition assignment
  writeTopicPartitionAssignment(curator, topic, partitionReplicaAssignment, update, readVersion)
}
These writes are performed against the specified (target) cluster's ZooKeeper; on first creation, the topic config must also be registered under /config/topics/:topic.
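For reference, a rough kazoo sketch of the two znodes those writes produce, based on Kafka's standard ZooKeeper layout (the address, topic name, and assignment are placeholders; the JSON shapes follow Kafka's conventions, not CMAK-specific code):

import json
from kazoo.client import KazooClient

topic = "demo-topic"                       # placeholder topic name
assignment = {0: [0, 1, 2], 1: [1, 2, 3]}  # partition -> replica broker IDs

zk = KazooClient(hosts="x.x.x.x:2181")     # the *target* cluster's ZooKeeper
zk.start()
# First creation only: register the (possibly empty) topic config.
config_node = {"version": 1, "config": {}}
zk.create("/config/topics/" + topic,
          json.dumps(config_node).encode("utf-8"), makepath=True)
# Then write the partition assignment; brokers watch /brokers/topics and
# pick up the new topic from here.
assignment_node = {"version": 1,
                   "partitions": {str(p): r for p, r in assignment.items()}}
zk.create("/brokers/topics/" + topic,
          json.dumps(assignment_node).encode("utf-8"), makepath=True)
zk.stop()
zk.close()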
3. Conclusion
Combining the findings of the two sections above: the only genuinely valuable information the kafka-manager/CMAK service stores in its own ZooKeeper is the cluster metadata under /kafka-manager/configs/:cluster. CMAK connects to the Kafka clusters through this metadata and performs all operations against them. Therefore, to take over CMAK's cluster display and topic CRUD functionality, it is enough to import the metadata from those /kafka-manager/configs/:cluster znodes.
4. TODO: Future Work
- The call chain has been mapped out at a high level, but the implementation details are not yet clear;
- The topic znodes under the topics path in CMAK's own ZooKeeper have been observed, but what are these data actually used for?
- After CMAK restarts, the first query against a Kafka cluster is very slow, while subsequent queries are fast; CMAK clearly has its own caching mechanism. How is it implemented?