```yaml
apiVersion: v1
kind: Service
metadata:
  name: k8s-hadoop-master
spec:
  type: NodePort
  selector:
    app: k8s-hadoop-master
  ports:
    - name: rpc
      port: 9000
      targetPort: 9000
    - name: http
      port: 50070
      targetPort: 50070
      nodePort: 32007
```
The NameNode exposes two service ports:

- Port 9000 is the internal IPC (RPC) port, used mainly to fetch file metadata.
- Port 50070 is the HTTP port, used by Hadoop's web management UI.
```bash
#!/usr/bin/env bash

# Substitute the master addresses (passed in as environment variables)
# into the Hadoop configuration templates.
sed -i "s/@HDFS_MASTER_SERVICE@/$HDFS_MASTER_SERVICE/g" $HADOOP_HOME/etc/hadoop/core-site.xml
sed -i "s/@HDOOP_YARN_MASTER@/$HDOOP_YARN_MASTER/g" $HADOOP_HOME/etc/hadoop/yarn-site.xml

# Start the appropriate daemon for this node's role.
HADOOP_NODE="${HADOOP_NODE_TYPE}"
if [ "$HADOOP_NODE" = "datanode" ]; then
  echo "Start DataNode ..."
  hdfs datanode -regular
elif [ "$HADOOP_NODE" = "namenode" ]; then
  echo "Start NameNode ..."
  hdfs namenode
elif [ "$HADOOP_NODE" = "resourceman" ]; then
  echo "Start Yarn Resource Manager ..."
  yarn resourcemanager
elif [ "$HADOOP_NODE" = "yarnnode" ]; then
  echo "Start Yarn Node Manager ..."
  yarn nodemanager
else
  echo "not recognized nodetype"
fi
```
Note that the startup script replaces the HDFS master address in the Hadoop configuration files (core-site.xml and yarn-site.xml) with the value of the environment variable HDFS_MASTER_SERVICE, and the YARN master address with HDOOP_YARN_MASTER. The figure below is the complete model of a two-node Hadoop HDFS cluster:
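For the substitution above to work, the core-site.xml shipped in the image must contain the @HDFS_MASTER_SERVICE@ placeholder. A minimal sketch of such a template follows (an assumption for illustration: the actual file in the kubeguide/hadoop image may carry additional properties):

```xml
<!-- core-site.xml template: @HDFS_MASTER_SERVICE@ is replaced by the
     startup script's sed command before any Hadoop daemon starts. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://@HDFS_MASTER_SERVICE@:9000</value>
  </property>
</configuration>
```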
To let DataNodes register with the NameNode by IP address (Pod IPs are generally not resolvable hostnames), the following HDFS parameter is set:

```
dfs.namenode.datanode.registration.ip-hostname-check=false
```
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: k8s-hadoop-master
  labels:
    app: k8s-hadoop-master
spec:
  containers:
    - name: k8s-hadoop-master
      image: kubeguide/hadoop
      imagePullPolicy: IfNotPresent
      ports:
        - containerPort: 9000
        - containerPort: 50070
      env:
        - name: HADOOP_NODE_TYPE
          value: namenode
        - name: HDFS_MASTER_SERVICE
          valueFrom:
            configMapKeyRef:
              name: ku8-hadoop-conf
              key: HDFS_MASTER_SERVICE
        - name: HDOOP_YARN_MASTER
          valueFrom:
            configMapKeyRef:
              name: ku8-hadoop-conf
              key: HDOOP_YARN_MASTER
  restartPolicy: Always
```
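The Pod reads the master addresses from a ConfigMap named ku8-hadoop-conf, whose definition is not shown in the text. A minimal sketch consistent with these references might look like this (the values are assumptions: they should resolve to the Service names of the HDFS and YARN masters):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ku8-hadoop-conf
data:
  # Assumed values: the NameNode and ResourceManager Service names.
  HDFS_MASTER_SERVICE: k8s-hadoop-master
  HDOOP_YARN_MASTER: ku8-yarn-master
```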
Below is the Pod definition of the HDFS DataNode (hadoop-datanode-1):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hadoop-datanode-1
  labels:
    app: hadoop-datanode-1
spec:
  containers:
    - name: hadoop-datanode-1
      image: kubeguide/hadoop
      imagePullPolicy: IfNotPresent
      ports:
        - containerPort: 9000
        - containerPort: 50070
      env:
        - name: HADOOP_NODE_TYPE
          value: datanode
        - name: HDFS_MASTER_SERVICE
          valueFrom:
            configMapKeyRef:
              name: ku8-hadoop-conf
              key: HDFS_MASTER_SERVICE
        - name: HDOOP_YARN_MASTER
          valueFrom:
            configMapKeyRef:
              name: ku8-hadoop-conf
              key: HDOOP_YARN_MASTER
  restartPolicy: Always
```
In practice, a DataNode could be deployed on every Kubernetes node as a DaemonSet; for clarity, that approach is not used here. Next, let's look at how to model the YARN framework. The figure below shows the YARN cluster architecture:
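For reference, the DaemonSet variant could be sketched as follows (hypothetical, derived from the Pod definition above; the apps/v1 API is shown, although on the Kubernetes 1.7 used later in this section the DaemonSet group would still be extensions/v1beta1, and a real deployment would also need per-node storage):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: hadoop-datanode
spec:
  selector:
    matchLabels:
      app: hadoop-datanode
  template:
    metadata:
      labels:
        app: hadoop-datanode
    spec:
      containers:
        - name: hadoop-datanode
          image: kubeguide/hadoop
          imagePullPolicy: IfNotPresent
          env:
            - name: HADOOP_NODE_TYPE
              value: datanode
            - name: HDFS_MASTER_SERVICE
              valueFrom:
                configMapKeyRef:
                  name: ku8-hadoop-conf
                  key: HDFS_MASTER_SERVICE
            - name: HDOOP_YARN_MASTER
              valueFrom:
                configMapKeyRef:
                  name: ku8-hadoop-conf
                  key: HDOOP_YARN_MASTER
```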
```yaml
apiVersion: v1
kind: Service
metadata:
  name: yarn-node-1
spec:
  clusterIP: None
  selector:
    app: yarn-node-1
  ports:
    - port: 8040
```
Note the line "clusterIP: None" in the definition: it marks this as a Headless Service, which has no cluster IP of its own. The Pod's YAML definition is given below:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: yarn-node-1
  labels:
    app: yarn-node-1
spec:
  containers:
    - name: yarn-node-1
      image: kubeguide/hadoop
      imagePullPolicy: IfNotPresent
      ports:
        - containerPort: 8040
        - containerPort: 8041
        - containerPort: 8042
      env:
        - name: HADOOP_NODE_TYPE
          value: yarnnode
        - name: HDFS_MASTER_SERVICE
          valueFrom:
            configMapKeyRef:
              name: ku8-hadoop-conf
              key: HDFS_MASTER_SERVICE
        - name: HDOOP_YARN_MASTER
          valueFrom:
            configMapKeyRef:
              name: ku8-hadoop-conf
              key: HDOOP_YARN_MASTER
  restartPolicy: Always
```
There is nothing special about the ResourceManager's YAML definitions. Its Service is defined as follows:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: ku8-yarn-master
spec:
  type: NodePort
  selector:
    app: yarn-master
  ports:
    - name: "8030"
      port: 8030
    - name: "8031"
      port: 8031
    - name: "8032"
      port: 8032
    - name: http
      port: 8088
      targetPort: 8088
      nodePort: 32088
```
The corresponding Pod definition is as follows:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: yarn-master
  labels:
    app: yarn-master
spec:
  containers:
    - name: yarn-master
      image: kubeguide/hadoop
      imagePullPolicy: IfNotPresent
      ports:                   # ResourceManager ports, matching the Service above
        - containerPort: 8030
        - containerPort: 8031
        - containerPort: 8032
        - containerPort: 8088
      env:
        - name: HADOOP_NODE_TYPE
          value: resourceman
        - name: HDFS_MASTER_SERVICE
          valueFrom:
            configMapKeyRef:
              name: ku8-hadoop-conf
              key: HDFS_MASTER_SERVICE
        - name: HDOOP_YARN_MASTER
          valueFrom:
            configMapKeyRef:
              name: ku8-hadoop-conf
              key: HDOOP_YARN_MASTER
  restartPolicy: Always
```
This solution still leaves one problem to solve: re-formatting of the HDFS filesystem after the NameNode restarts. It can be handled in the startup script by checking whether the HDFS filesystem has already been formatted: if not, run the format command at startup; otherwise, skip it.
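That check can be sketched as follows (a minimal sketch, not the original image's entrypoint; the metadata directory /root/hdfs/namenode and the helper needs_format are assumptions, and the actual path depends on dfs.namenode.name.dir):

```shell
#!/usr/bin/env bash
# A formatted NameNode metadata directory always contains a "current"
# subdirectory (holding the VERSION file), so its absence means the
# filesystem has never been formatted.
needs_format() {
  local name_dir="$1"
  [ ! -d "$name_dir/current" ]
}

NAME_DIR="${NAME_DIR:-/root/hdfs/namenode}"   # assumed location
if needs_format "$NAME_DIR"; then
  echo "Formatting new HDFS filesystem ..."
  # In the real entrypoint: hdfs namenode -format -nonInteractive
else
  echo "Filesystem already formatted, skipping format."
fi
```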
After installation, we can open Hadoop's HDFS management UI in a browser; clicking the Overview tab on the home page shows the familiar HDFS interface:
```
root@hadoop-master:/usr/local/hadoop/bin# hadoop fs -ls /
root@hadoop-master:/usr/local/hadoop/bin# hadoop fs -mkdir /leader-us
root@hadoop-master:/usr/local/hadoop/bin# hadoop fs -ls /
Found 1 items
drwxr-xr-x   - root supergroup          0 2017-02-17 07:32 /leader-us
root@hadoop-master:/usr/local/hadoop/bin# hadoop fs -put hdfs.cmd /leader-us
```
Then we can browse the HDFS filesystem in the management UI to verify the result of these operations:
- CPU: 2 × E5-2640 v3 (8-core)
- Memory: 16 × 16 GB DDR4
- NIC: 2 × 10GE multi-mode fiber ports
- Disk: 12 × 3 TB SATA
- BigCloud Enterprise Linux 7 (GNU/Linux 3.10.0-514.el7.x86_64 x86_64)
- Hadoop 2.7.2
- Kubernetes 1.7.4 + Calico v3.0.1
- TestDFSIO: distributed filesystem read/write benchmark
- NNBench: NameNode benchmark
- MRBench: MapReduce benchmark
- WordCount: word-frequency counting job benchmark
- TeraSort: TeraSort job benchmark