This guide uses Hadoop 3.2.2 as the example version.
Unless otherwise noted, run every command below on all nodes.
I. System Resources and Component Planning
| Node | Hostname | CPU/Memory | NIC | Disk | IP Address | OS | Roles |
|---|---|---|---|---|---|---|---|
| Master1 | master1 | 2C/4G | ens33 | 128G | 192.168.0.11 | CentOS 7 | NameNode, ResourceManager, DFSZKFailoverController |
| Master2 | master2 | 2C/4G | ens33 | 128G | 192.168.0.12 | CentOS 7 | NameNode, ResourceManager, DFSZKFailoverController |
| Worker1 | worker1 | 2C/4G | ens33 | 128G | 192.168.0.21 | CentOS 7 | DataNode, NodeManager, JournalNode, QuorumPeerMain |
| Worker2 | worker2 | 2C/4G | ens33 | 128G | 192.168.0.22 | CentOS 7 | DataNode, NodeManager, JournalNode, QuorumPeerMain |
| Worker3 | worker3 | 2C/4G | ens33 | 128G | 192.168.0.23 | CentOS 7 | DataNode, NodeManager, JournalNode, QuorumPeerMain |
II. System Software Installation and Configuration
1. Install Basic Packages
yum -y install vim lrzsz bash-completion
2. Configure Name Resolution
echo 192.168.0.11 master1 >> /etc/hosts
echo 192.168.0.12 master2 >> /etc/hosts
echo 192.168.0.21 worker1 >> /etc/hosts
echo 192.168.0.22 worker2 >> /etc/hosts
echo 192.168.0.23 worker3 >> /etc/hosts
3. Configure NTP
yum -y install chrony
systemctl start chronyd
systemctl enable chronyd
systemctl status chronyd
chronyc sources
4. Disable SELinux and the Firewall
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
III. Building a Fully Distributed, Highly Available Hadoop Cluster
1. Configure Passwordless SSH
On Master1 and Master2 (the NameNode and ResourceManager nodes), set up passwordless SSH to all nodes:
ssh-keygen -t rsa
for host in master1 master2 worker1 worker2 worker3; do ssh-copy-id -i ~/.ssh/id_rsa.pub $host; done
2. Install the JDK
Download the JDK:
Reference: www.oracle.com/java/techno…
Extract the JDK archive:
tar -xf /root/jdk-8u291-linux-x64.tar.gz -C /usr/local/
Set the environment variables in the current shell:
export JAVA_HOME=/usr/local/jdk1.8.0_291/
export PATH=$PATH:/usr/local/jdk1.8.0_291/bin/
Append the environment variables to /etc/profile:
export JAVA_HOME=/usr/local/jdk1.8.0_291/
PATH=$PATH:/usr/local/jdk1.8.0_291/bin/
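For example, both lines can be appended and applied in one step (a minimal sketch; adjust the path if your JDK release differs):
cat >> /etc/profile << 'EOF'
export JAVA_HOME=/usr/local/jdk1.8.0_291/
export PATH=$PATH:/usr/local/jdk1.8.0_291/bin/
EOF
source /etc/profile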
Check the Java version:
java -version
3. Install ZooKeeper
Download ZooKeeper:
Reference: downloads.apache.org/zookeeper/s…
On the Worker nodes (the ZooKeeper nodes), extract the ZooKeeper archive:
tar -xf /root/apache-zookeeper-3.6.3-bin.tar.gz -C /usr/local/
On the Worker nodes, set the environment variable in the current shell:
export PATH=$PATH:/usr/local/apache-zookeeper-3.6.3-bin/bin/
On the Worker nodes, append the environment variable to /etc/profile:
PATH=$PATH:/usr/local/apache-zookeeper-3.6.3-bin/bin/
On the Worker nodes, create the ZooKeeper data directory:
mkdir /usr/local/apache-zookeeper-3.6.3-bin/data/
On the Worker nodes, create the ZooKeeper configuration file from the sample:
mv /usr/local/apache-zookeeper-3.6.3-bin/conf/zoo_sample.cfg /usr/local/apache-zookeeper-3.6.3-bin/conf/zoo.cfg
On the Worker nodes, edit the ZooKeeper configuration file:
vim /usr/local/apache-zookeeper-3.6.3-bin/conf/zoo.cfg
Set the data directory:
dataDir=/usr/local/apache-zookeeper-3.6.3-bin/data/
Add the ZooKeeper ensemble members:
server.1=worker1:2888:3888
server.2=worker2:2888:3888
server.3=worker3:2888:3888
How to read the ZooKeeper ensemble entries
server.A=B:C:D
A is a number identifying which server in the ensemble this is;
B is the hostname or IP address of that server;
C is the port the server uses to exchange data with the ensemble Leader;
D is the port the servers use to communicate during leader election, i.e. when the current Leader fails and a new one must be chosen.
In cluster mode each server also needs a myid file in its dataDir containing its own A value; on startup, ZooKeeper reads this file and matches the value against the server.A entries in zoo.cfg to determine which server it is.
On Worker1 (ZooKeeper node), create the myid file with its A value:
touch /usr/local/apache-zookeeper-3.6.3-bin/data/myid
echo 1 > /usr/local/apache-zookeeper-3.6.3-bin/data/myid
On Worker2 (ZooKeeper node), create the myid file with its A value:
touch /usr/local/apache-zookeeper-3.6.3-bin/data/myid
echo 2 > /usr/local/apache-zookeeper-3.6.3-bin/data/myid
On Worker3 (ZooKeeper node), create the myid file with its A value:
touch /usr/local/apache-zookeeper-3.6.3-bin/data/myid
echo 3 > /usr/local/apache-zookeeper-3.6.3-bin/data/myid
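Alternatively, since Master1 already has passwordless SSH to every node, all three myid files can be written in one loop from Master1 (a sketch using the same paths as above):
for i in 1 2 3; do ssh worker$i "echo $i > /usr/local/apache-zookeeper-3.6.3-bin/data/myid"; done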
On the Worker nodes, start ZooKeeper:
zkServer.sh start
On the Worker nodes, check the ZooKeeper status:
zkServer.sh status
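With all three nodes started, the status output should resemble the following, with exactly one leader (which node wins the election can vary):
Mode: follower   # on two nodes
Mode: leader     # on one node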
4. Install Hadoop
Download Hadoop:
Reference: hadoop.apache.org/releases.ht…
Extract the Hadoop archive:
tar -xf /root/hadoop-3.2.2.tar.gz -C /usr/local/
Set the environment variable in the current shell:
export PATH=$PATH:/usr/local/hadoop-3.2.2/bin/:/usr/local/hadoop-3.2.2/sbin/
Append the environment variable to /etc/profile:
PATH=$PATH:/usr/local/hadoop-3.2.2/bin/:/usr/local/hadoop-3.2.2/sbin/
At this point /etc/profile differs slightly by role: the Master nodes carry the JDK and Hadoop entries, while the Worker nodes additionally carry the ZooKeeper entry.
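Assembled from the steps above, the expected additions look roughly like this (a sketch, not a verbatim listing):
# Master nodes
export JAVA_HOME=/usr/local/jdk1.8.0_291/
PATH=$PATH:/usr/local/jdk1.8.0_291/bin/
PATH=$PATH:/usr/local/hadoop-3.2.2/bin/:/usr/local/hadoop-3.2.2/sbin/
# Worker nodes
export JAVA_HOME=/usr/local/jdk1.8.0_291/
PATH=$PATH:/usr/local/jdk1.8.0_291/bin/
PATH=$PATH:/usr/local/apache-zookeeper-3.6.3-bin/bin/
PATH=$PATH:/usr/local/hadoop-3.2.2/bin/:/usr/local/hadoop-3.2.2/sbin/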
Check the Hadoop version:
hadoop version
5. Configure the Hadoop Cluster
Write core-site.xml:
cat > /usr/local/hadoop-3.2.2/etc/hadoop/core-site.xml << EOF
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>worker1:2181,worker2:2181,worker3:2181</value>
</property>
</configuration>
EOF
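A quick way to confirm the file is picked up (hdfs getconf reads the configuration on the local node):
hdfs getconf -confKey fs.defaultFS
The command should print hdfs://mycluster.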
Write hdfs-site.xml:
cat > /usr/local/hadoop-3.2.2/etc/hadoop/hdfs-site.xml << EOF
<configuration>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>master1:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>master1:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>master2:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>master2:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://worker1:8485;worker2:8485;worker3:8485/mycluster</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
EOF
Write mapred-site.xml:
cat > /usr/local/hadoop-3.2.2/etc/hadoop/mapred-site.xml << EOF
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>0.0.0.0:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19888</value>
</property>
</configuration>
EOF
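Note that the two jobhistory addresses only matter once the JobHistory server is running; start-yarn.sh does not launch it. After the cluster is up, it can be started on a Master node with:
mapred --daemon start historyserver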
On Master1, write yarn-site.xml with yarn.resourcemanager.ha.id set to rm1:
cat > /usr/local/hadoop-3.2.2/etc/hadoop/yarn-site.xml << EOF
<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>mycluster-yarn</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>master1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>master2</value>
</property>
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm1</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>worker1:2181,worker2:2181,worker3:2181</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>worker1:2181,worker2:2181,worker3:2181</value>
</property>
<property>
<name>yarn.resourcemanager.zk-state-store.address</name>
<value>worker1:2181,worker2:2181,worker3:2181</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
<value>/yarn-leader-election</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
EOF
On Master2, write yarn-site.xml with yarn.resourcemanager.ha.id set to rm2:
cat > /usr/local/hadoop-3.2.2/etc/hadoop/yarn-site.xml << EOF
<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>mycluster-yarn</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>master1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>master2</value>
</property>
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm2</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>worker1:2181,worker2:2181,worker3:2181</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>worker1:2181,worker2:2181,worker3:2181</value>
</property>
<property>
<name>yarn.resourcemanager.zk-state-store.address</name>
<value>worker1:2181,worker2:2181,worker3:2181</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
<value>/yarn-leader-election</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
EOF
On the Worker nodes, write yarn-site.xml without the yarn.resourcemanager.ha.id property:
cat > /usr/local/hadoop-3.2.2/etc/hadoop/yarn-site.xml << EOF
<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>mycluster-yarn</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>master1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>master2</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>worker1:2181,worker2:2181,worker3:2181</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>worker1:2181,worker2:2181,worker3:2181</value>
</property>
<property>
<name>yarn.resourcemanager.zk-state-store.address</name>
<value>worker1:2181,worker2:2181,worker3:2181</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
<value>/yarn-leader-election</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
EOF
Edit hadoop-env.sh to set JAVA_HOME:
vim /usr/local/hadoop-3.2.2/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_291/
Write the workers file to list the Worker nodes:
echo worker1 > /usr/local/hadoop-3.2.2/etc/hadoop/workers
echo worker2 >> /usr/local/hadoop-3.2.2/etc/hadoop/workers
echo worker3 >> /usr/local/hadoop-3.2.2/etc/hadoop/workers
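Since core-site.xml, hdfs-site.xml, mapred-site.xml, hadoop-env.sh, and workers are identical on every node, they can also be pushed from Master1 instead of being edited five times (a sketch relying on the passwordless SSH set up earlier; yarn-site.xml is excluded on purpose because it differs per node):
for host in master2 worker1 worker2 worker3; do
  scp /usr/local/hadoop-3.2.2/etc/hadoop/{core-site.xml,hdfs-site.xml,mapred-site.xml,hadoop-env.sh,workers} $host:/usr/local/hadoop-3.2.2/etc/hadoop/
done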
Edit start-dfs.sh to specify the users the daemons run as:
vim /usr/local/hadoop-3.2.2/sbin/start-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
Edit stop-dfs.sh likewise:
vim /usr/local/hadoop-3.2.2/sbin/stop-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
Edit start-yarn.sh to specify the users the daemons run as:
vim /usr/local/hadoop-3.2.2/sbin/start-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Edit stop-yarn.sh likewise:
vim /usr/local/hadoop-3.2.2/sbin/stop-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
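Instead of patching the four scripts, Hadoop 3 also reads these variables from hadoop-env.sh, which every launcher script sources; the following one-time append is an equivalent alternative (use one approach, not both):
cat >> /usr/local/hadoop-3.2.2/etc/hadoop/hadoop-env.sh << 'EOF'
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
EOF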
6. Start the Hadoop Cluster
On the Worker nodes (the JournalNode nodes), start the JournalNodes:
hdfs --daemon start journalnode
On Master1 (a NameNode node), format the NameNode:
hdfs namenode -format
On Master1 (one of the DFSZKFailoverController nodes), format the ZKFC state in ZooKeeper; this needs to run only once for the whole cluster:
hdfs zkfc -formatZK
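formatZK creates the znode that the failover controllers coordinate through; it can be verified from any ZooKeeper node:
zkCli.sh -server worker1:2181 ls /hadoop-ha
The output should list [mycluster].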
On Master1 (the active NameNode), start HDFS:
start-dfs.sh
On Master1 (the active ResourceManager), start YARN:
start-yarn.sh
On Master2 (the standby NameNode), bootstrap the standby NameNode from the active one:
hdfs namenode -bootstrapStandby
On Master2 (the standby NameNode), start the standby NameNode:
hdfs --daemon start namenode
Check the Hadoop processes on each type of node:
jps
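If startup succeeded, the processes should match the role plan from Part I, roughly (PIDs omitted):
# Master1 and Master2
NameNode
ResourceManager
DFSZKFailoverController
# Worker1, Worker2, Worker3
DataNode
NodeManager
JournalNode
QuorumPeerMain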
7. Failover Demonstration
On either Master node, check the NameNode states:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
On either Master node, check the ResourceManager states:
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
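Exactly one of each pair should report active and the other standby, e.g.:
hdfs haadmin -getServiceState nn1   # active
hdfs haadmin -getServiceState nn2   # standby
Which side is active depends on startup order, so the reverse is equally healthy.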
Now simulate a failure: shut down the Master1 node. The active roles should fail over to Master2.
On Master2, check the NameNode state:
hdfs haadmin -getServiceState nn2
On Master2, check the ResourceManager state:
yarn rmadmin -getServiceState rm2
Restore the Master1 node and start HDFS and YARN on it again.
On either Master node, check the NameNode states:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
On either Master node, check the ResourceManager states:
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
Master1 is back, but the active role does not switch back: automatic failover promotes a standby when the active fails, and it does not fail back when the old active returns.
8. Stop the Hadoop Cluster
On Master1 (now the standby NameNode), stop the standby NameNode:
hdfs --daemon stop namenode
On Master2 (now the active ResourceManager), stop YARN:
stop-yarn.sh
On Master2 (now the active NameNode), stop HDFS:
stop-dfs.sh
On the Worker nodes (JournalNode nodes), stop the JournalNodes:
hdfs --daemon stop journalnode
On the Worker nodes (ZooKeeper nodes), stop ZooKeeper:
zkServer.sh stop