I. Versions

Component | Version | Notes |
---|---|---|
CentOS | 7.6.1810 64-bit | Check the OS version with `cat /etc/redhat-release`; check whether it is 32- or 64-bit with `file /bin/ls` |
JRE | java version "1.8.0_144" | Java 8 or Java 11 is required |
Hadoop | hadoop-2.6.0-cdh5.16.2.tar.gz | HDFS + MapReduce + YARN |
Zookeeper | zookeeper-3.4.5-cdh5.16.2.tar.gz | Coordination service used by HDFS and YARN |
II. Host Plan

IP | Host | Installed Software | Processes |
---|---|---|---|
192.168.188.XXX | bigdata01 | Hadoop, ZooKeeper | NameNode, DFSZKFailoverController, JournalNode, DataNode, ResourceManager, JobHistoryServer, NodeManager, QuorumPeerMain |
192.168.188.XXX | bigdata02 | Hadoop, ZooKeeper | NameNode, DFSZKFailoverController, JournalNode, DataNode, ResourceManager, NodeManager, QuorumPeerMain |
192.168.188.XXX | bigdata03 | Hadoop, ZooKeeper | JournalNode, DataNode, NodeManager, QuorumPeerMain |
III. Directory Plan

Name | Path | Notes |
---|---|---|
$HADOOP_HOME | /home/XXX/app/hadoop/ | |
Data | /home/XXX/data/ | |
Log | /home/XXX/log/ | |
hadoop.tmp.dir | /home/XXX/tmp/hadoop/ | Must be created manually, with permissions 777 |
$ZOOKEEPER_HOME | /home/XXX/app/zookeeper/ | |
IV. Hadoop HA Cluster Deployment

Preparation

Configure passwordless SSH trust
- Generate an SSH key pair on each node:

```
ssh-keygen
```
- Using bigdata01 as the primary node, configure the trust relationship. On bigdata02 and bigdata03, send the id_rsa.pub file under ~/.ssh to bigdata01, renaming it on arrival:

```
scp ~/.ssh/id_rsa.pub root@bigdata01:~/.ssh/id_rsa.pub2
scp ~/.ssh/id_rsa.pub root@bigdata01:~/.ssh/id_rsa.pub3
```
- On bigdata01, merge the three id_rsa.pub files into authorized_keys and set its permissions to 600:

```
cat id_rsa.pub >> authorized_keys
cat id_rsa.pub2 >> authorized_keys
cat id_rsa.pub3 >> authorized_keys
chmod 600 authorized_keys
```
- Send the authorized_keys file to bigdata02 and bigdata03 (run as root):

```
scp /home/XXX/.ssh/authorized_keys bigdata02:/home/XXX/.ssh/
scp /home/XXX/.ssh/authorized_keys bigdata03:/home/XXX/.ssh/
```
- As root on bigdata02 and bigdata03, hand ownership of authorized_keys back to the XXX user:

```
chown XXX:XXX /home/XXX/.ssh/authorized_keys
```
- Initialize the connections so each host is recorded in known_hosts, which is what verifies the trust relationship:

```
ssh bigdata01 date
ssh bigdata02 date
ssh bigdata03 date
```
- Notes:
  - If you are still prompted for a password after answering yes, recheck the configuration steps above.
  - When a machine in the trust relationship changes, the trust must be reconfigured: delete all of that machine's entries from both known_hosts and authorized_keys. A scripted version of the whole exchange is sketched below.
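For repeatability, the whole exchange can also be scripted from bigdata01. A minimal sketch, assuming the XXX user exists on all three hosts, ssh-keygen has already been run everywhere, and the initial copies may still prompt for passwords:

```bash
#!/usr/bin/env bash
# Gather every node's public key into one authorized_keys file, then push it back out.
set -e
NODES="bigdata01 bigdata02 bigdata03"

# Collect the public keys (password prompts are expected on first contact).
: > /tmp/authorized_keys
for n in $NODES; do
  ssh XXX@"$n" 'cat ~/.ssh/id_rsa.pub' >> /tmp/authorized_keys
done

# Distribute the merged file and lock down its permissions.
for n in $NODES; do
  scp /tmp/authorized_keys XXX@"$n":~/.ssh/authorized_keys
  ssh XXX@"$n" 'chmod 600 ~/.ssh/authorized_keys'
done

# Verify: each hop should now return the date without a password prompt.
for n in $NODES; do ssh XXX@"$n" date; done
```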
1. Install the JDK
- Run this on every node; a fan-out sketch for the remaining nodes follows the commands below.
1.1 Create the directory and unpack the JDK:

```
mkdir /usr/java
tar -xzvf jdk-XXXX.gz -C /usr/java/
chown -R root:root /usr/java/jdk-XXXX
```
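Since the JDK must land on every node, the unpacking can be fanned out from bigdata01. A sketch, assuming the tarball sits in the current directory and root SSH access is available (jdk-XXXX is the same placeholder used above):

```bash
# Push the JDK archive to the other nodes and repeat the install steps remotely.
for n in bigdata02 bigdata03; do
  scp jdk-XXXX.gz root@"$n":/tmp/
  ssh root@"$n" 'mkdir -p /usr/java &&
    tar -xzvf /tmp/jdk-XXXX.gz -C /usr/java/ &&
    chown -R root:root /usr/java/jdk-XXXX'
done
```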
1.2 Configure the JAVA environment variables
- Note that $JAVA_HOME/bin must be placed in front of the existing $PATH, otherwise another Java installation on the machine may take precedence. Edit the profile:

```
sudo vi /etc/profile
```

- Add:

```
export JAVA_HOME=/usr/java/jdk-XXXX
export PATH=$JAVA_HOME/bin:$PATH
```

- Reload it:

```
source /etc/profile
```
- Verify:

```
which java
```
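`which java` alone can be fooled by an older JDK earlier in the PATH, so a slightly fuller check is worth running (standard commands; the expected values are illustrative):

```bash
java -version       # should report 1.8.0_xxx, matching the unpacked JDK
echo "$JAVA_HOME"   # should print /usr/java/jdk-XXXX
```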
2. ZooKeeper Deployment
2.1 Unpack ZooKeeper

```
tar -xzvf zookeeper-3.4.5-cdh5.16.2.tar.gz -C ~/app/
```

2.2 Create a symlink
- This makes future version upgrades easier (run inside ~/app/):

```
ln -s zookeeper-3.4.5-cdh5.16.2 zookeeper
```
2.3 Configure environment variables
- Edit the environment file:

```
vi ~/.bashrc
```

- Add:

```
export ZOOKEEPER_HOME=/home/XXX/app/zookeeper
export PATH=${ZOOKEEPER_HOME}/bin:$PATH
```

- Reload it:

```
source ~/.bashrc
```
2.4 Modify the configuration file

```
vi ~/app/zookeeper/conf/zoo.cfg
```

- Change the dataDir path:

```
dataDir=/home/XXX/tmp/zookeeper
```

- Add the quorum and leader-election ports for the three machines:

```
server.1=bigdata01:2888:3888
server.2=bigdata02:2888:3888
server.3=bigdata03:2888:3888
```

- Create the file holding each node's unique identifier in the ensemble:

```
touch /home/XXX/tmp/zookeeper/myid
```
- Then on each node, write its own id, matching the server.N entries above (a remote one-pass version is sketched below):

```
echo 1 > /home/XXX/tmp/zookeeper/myid   # on bigdata01
echo 2 > /home/XXX/tmp/zookeeper/myid   # on bigdata02
echo 3 > /home/XXX/tmp/zookeeper/myid   # on bigdata03
```
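With the SSH trust from the preparation step in place, the three myid files can be written in one pass from bigdata01. A sketch, assuming the node order matches server.1 through server.3 in zoo.cfg:

```bash
# Write each node's ZooKeeper myid; the number must match its server.N entry.
i=1
for n in bigdata01 bigdata02 bigdata03; do
  ssh XXX@"$n" "mkdir -p /home/XXX/tmp/zookeeper && echo $i > /home/XXX/tmp/zookeeper/myid"
  i=$((i + 1))
done
```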
2.5 ZooKeeper operations:
- Start the ZooKeeper service.
- Until the service has been started on a node, its status is reported as not running.

```
sh ~/app/zookeeper/bin/zkServer.sh start
```

- Check the service status:

```
sh ~/app/zookeeper/bin/zkServer.sh status
```

- Stop the service:

```
sh ~/app/zookeeper/bin/zkServer.sh stop
```
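Running zkServer.sh on each node by hand gets tedious; with SSH trust configured, a small wrapper can fan any of the three actions out to the whole ensemble. A sketch using this guide's hosts and paths:

```bash
#!/usr/bin/env bash
# Usage: ./zk-all.sh start|status|stop  — runs zkServer.sh with that action on every node.
ACTION="${1:-status}"
for n in bigdata01 bigdata02 bigdata03; do
  echo "== $n =="
  ssh XXX@"$n" "sh ~/app/zookeeper/bin/zkServer.sh $ACTION"
done
```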
3. Hadoop Deployment
3.1 Unpack Hadoop

```
tar -xzvf hadoop-2.6.0-cdh5.16.2.tar.gz -C ~/app/
```

3.2 Create a symlink (run inside ~/app/)

```
ln -s hadoop-2.6.0-cdh5.16.2 hadoop
```
3.3 Configure environment variables
- Edit the environment file:

```
vi ~/.bashrc
```

- Add:

```
export HADOOP_HOME=/home/XXX/app/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
```

- Reload it:

```
source ~/.bashrc
```
3.4 Customize the configuration files
- The files below (under $HADOOP_HOME/etc/hadoop) are a reference; adjust them to the realities of your production environment.

```
vi hadoop-env.sh
```

```
# Hadoop 2.x requires this to be set explicitly; fixed in 3.x
export JAVA_HOME=<local JDK path>
export HADOOP_PID_DIR=/home/XXX/tmp
```
- hdfs-site.xml:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- HDFS superuser group -->
    <property>
        <name>dfs.permissions.superusergroup</name>
        <value>data</value>
    </property>
    <!-- Enable WebHDFS -->
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/data/data/dfs/name</value>
        <description>Local directory where the NameNode stores the name table (fsimage); change as needed</description>
    </property>
    <property>
        <name>dfs.namenode.edits.dir</name>
        <value>${dfs.namenode.name.dir}</value>
        <description>Local directory where the NameNode stores the transaction files (edits); change as needed</description>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/data/data/dfs/data</value>
        <description>Local directory where the DataNode stores blocks; change as needed</description>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Block size 128M (the default) -->
    <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
    </property>
    <!-- ================== HDFS high availability ================== -->
    <!-- The HDFS nameservice is "bigdata"; it must match core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>bigdata</value>
    </property>
    <property>
        <!-- NameNode IDs; this version supports at most two NameNodes -->
        <name>dfs.ha.namenodes.bigdata</name>
        <value>nn1,nn2</value>
    </property>
    <!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID], RPC addresses -->
    <property>
        <name>dfs.namenode.rpc-address.bigdata.nn1</name>
        <value>bigdata01:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.bigdata.nn2</name>
        <value>bigdata02:8020</value>
    </property>
    <!-- HDFS HA: dfs.namenode.http-address.[nameservice ID], HTTP addresses -->
    <property>
        <name>dfs.namenode.http-address.bigdata.nn1</name>
        <value>bigdata01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.bigdata.nn2</name>
        <value>bigdata02:50070</value>
    </property>
    <!-- ================== NameNode edit-log replication ================== -->
    <!-- Guarantees the edit log can be recovered -->
    <property>
        <name>dfs.journalnode.http-address</name>
        <value>0.0.0.0:8480</value>
    </property>
    <property>
        <name>dfs.journalnode.rpc-address</name>
        <value>0.0.0.0:8485</value>
    </property>
    <property>
        <!-- JournalNode servers; the QuorumJournalManager stores the edit log here -->
        <!-- Format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>, ports as in dfs.journalnode.rpc-address -->
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://bigdata01:8485;bigdata02:8485;bigdata03:8485/bigdata</value>
    </property>
    <property>
        <!-- Where each JournalNode stores its data -->
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/data/data/dfs/jn</value>
    </property>
    <!-- ================== Client failover ================== -->
    <property>
        <!-- Strategy by which DataNodes and clients identify the active NameNode -->
        <!-- Implementation class for automatic failover -->
        <name>dfs.client.failover.proxy.provider.bigdata</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- ================== NameNode fencing ================== -->
    <!-- After a failover, prevents the stopped NameNode from coming back up and creating two active services -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/data/.ssh/id_rsa</value>
    </property>
    <property>
        <!-- Milliseconds after which fencing is considered failed -->
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
    <!-- ================== Automatic NameNode failover via ZKFC and ZooKeeper ================== -->
    <!-- Enable ZooKeeper-based automatic failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- File listing the DataNodes permitted to connect to the NameNode -->
    <property>
        <name>dfs.hosts</name>
        <value>/home/data/app/hadoop/etc/hadoop/slaves</value>
    </property>
    <property>
        <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
        <value>false</value>
    </property>
</configuration>
```
- core-site.xml:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- YARN needs fs.defaultFS to specify the NameNode URI -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://bigdata</value>
    </property>
    <!-- ============================== Trash ============================== -->
    <property>
        <!-- How often the checkpointer running on the NameNode creates a checkpoint from the Current folder; default 0 means the value of fs.trash.interval is used -->
        <name>fs.trash.checkpoint.interval</name>
        <value>0</value>
    </property>
    <property>
        <!-- Minutes before a checkpoint directory under .Trash is deleted; the server-side setting takes precedence over the client's; default 0 means never delete -->
        <name>fs.trash.interval</name>
        <value>10080</value>
    </property>
    <!-- Hadoop temporary directory. hadoop.tmp.dir is the base setting that many other paths depend on; if hdfs-site.xml does not set the NameNode and DataNode storage locations, they default to subdirectories of this path -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/data/tmp/hadoop</value>
    </property>
    <!-- ZooKeeper quorum addresses -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>bigdata01:2181,bigdata02:2181,bigdata03:2181</value>
    </property>
    <!-- ZooKeeper session timeout, in milliseconds -->
    <property>
        <name>ha.zookeeper.session-timeout.ms</name>
        <value>2000</value>
    </property>
    <!-- Proxy hosts for the user running the Hadoop processes; replace ${process_user} with the actual username -->
    <property>
        <name>hadoop.proxyuser.${process_user}.hosts</name>
        <value>*</value>
    </property>
    <!-- Proxy groups for the user running the Hadoop processes -->
    <property>
        <name>hadoop.proxyuser.${process_user}.groups</name>
        <value>*</value>
    </property>
    <!-- Compression codecs -->
    <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,
            org.apache.hadoop.io.compress.DefaultCodec,
            org.apache.hadoop.io.compress.BZip2Codec,
            org.apache.hadoop.io.compress.SnappyCodec
        </value>
    </property>
</configuration>
```
- mapred-site.xml:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Run MapReduce applications on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- ========================= JobHistory Server ========================= -->
    <!-- MapReduce JobHistory Server address, default port 10020 -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>bigdata01:10020</value>
    </property>
    <!-- MapReduce JobHistory Server Web UI address, default port 19888 -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>bigdata01:19888</value>
    </property>
    <!-- Compress map-stage output with Snappy -->
    <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
    </property>
    <property>
        <name>mapreduce.map.output.compress.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>
</configuration>
```
- yarn-site.xml:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- ========================= NodeManager ========================= -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.nodemanager.localizer.address</name>
        <value>0.0.0.0:23344</value>
        <description>Address where the localizer IPC is.</description>
    </property>
    <property>
        <name>yarn.nodemanager.webapp.address</name>
        <value>0.0.0.0:23999</value>
        <description>NM Webapp address.</description>
    </property>
    <!-- ========================= ResourceManager HA ========================= -->
    <property>
        <name>yarn.resourcemanager.connect.retry-interval.ms</name>
        <value>2000</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- Use embedded automatic failover; in an HA setup it works with the ZKRMStateStore to handle fencing -->
    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
        <value>true</value>
    </property>
    <!-- Cluster ID, so HA elections are scoped to this cluster -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn-cluster</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <!-- The RM id can be pinned per node if needed (optional):
    <property>
        <name>yarn.resourcemanager.ha.id</name>
        <value>rm2</value>
    </property>
    -->
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
        <value>5000</value>
    </property>
    <!-- ZKRMStateStore configuration -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>bigdata01:2181,bigdata02:2181,bigdata03:2181</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk.state-store.address</name>
        <value>bigdata01:2181,bigdata02:2181,bigdata03:2181</value>
    </property>
    <!-- RPC address for client access to the RM (applications manager interface) -->
    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>bigdata01:23140</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>bigdata02:23140</value>
    </property>
    <!-- RPC address for ApplicationMaster access to the RM (scheduler interface) -->
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>bigdata01:23130</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>bigdata02:23130</value>
    </property>
    <!-- RM admin interface -->
    <property>
        <name>yarn.resourcemanager.admin.address.rm1</name>
        <value>bigdata01:23141</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm2</name>
        <value>bigdata02:23141</value>
    </property>
    <!-- RPC port for NodeManager access to the RM -->
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>bigdata01:23125</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>bigdata02:23125</value>
    </property>
    <!-- RM web application addresses -->
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>bigdata01:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>bigdata02:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.https.address.rm1</name>
        <value>bigdata01:23189</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.https.address.rm2</name>
        <value>bigdata02:23189</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://bigdata01:19888/jobhistory/logs</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>16</value>
        <description>Minimum memory a single task may request; default 1024 MB</description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>1024</value>
        <description>Maximum memory a single task may request; default 8192 MB</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>2</value>
    </property>
</configuration>
```
- slaves:

```
bigdata01
bigdata02
bigdata03
```

- Note: a slaves file uploaded from elsewhere may not be in Unix format (extra carriage-return characters from DOS line endings) and must be converted.
- Clean up the stray characters:

```
dos2unix slaves
```
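To confirm the conversion actually took, two standard checks:

```bash
file slaves     # must not report "with CRLF line terminators"
cat -A slaves   # every line should end in $, with no ^M before it
```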
3.5 Initialize Hadoop
- Start the JournalNodes (on all three nodes, per the host plan):

```
sh /home/XXX/app/hadoop/sbin/hadoop-daemon.sh start journalnode
```

- Format the NameNode on bigdata01:

```
sh /home/XXX/app/hadoop/bin/hadoop namenode -format
```

- Copy the metadata from bigdata01 to bigdata02:

```
scp -r /home/XXX/data/dfs XXX@bigdata02:/home/XXX/data
```

- Initialize the ZKFC znode in ZooKeeper:

```
sh /home/XXX/app/hadoop/bin/hdfs zkfc -formatZK
```
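After these steps, the JournalNodes should be running on all three machines alongside the ZooKeeper quorum; a quick check over SSH (process names as printed by jps):

```bash
for n in bigdata01 bigdata02 bigdata03; do
  echo "== $n =="
  ssh XXX@"$n" jps | grep -E 'JournalNode|QuorumPeerMain'
done
```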
3.6 Start Hadoop
- Start HDFS:

```
sh /home/XXX/app/hadoop/sbin/start-dfs.sh
```

- Start YARN:
- On bigdata01:

```
sh /home/XXX/app/hadoop/sbin/start-yarn.sh
```

- On bigdata02 (the standby ResourceManager must be started by hand):

```
sh /home/XXX/app/hadoop/sbin/yarn-daemon.sh start resourcemanager
```

- Start the history server, on bigdata01:

```
sh /home/XXX/app/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
```
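At this point the JVMs running on each host should match the process plan in section II. A fan-out check, assuming the SSH trust configured earlier:

```bash
# List running Java processes on every node and compare against the host plan.
for n in bigdata01 bigdata02 bigdata03; do
  echo "== $n =="
  ssh XXX@"$n" jps
done
```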
V. Common Scripts and Commands
1. Start the cluster

```
[XXX@bigdata01 ~]$ $ZOOKEEPER_HOME/bin/zkServer.sh start
[XXX@bigdata02 ~]$ $ZOOKEEPER_HOME/bin/zkServer.sh start
[XXX@bigdata03 ~]$ $ZOOKEEPER_HOME/bin/zkServer.sh start
[XXX@bigdata01 ~]$ $HADOOP_HOME/sbin/start-all.sh
[XXX@bigdata02 ~]$ $HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager
[XXX@bigdata01 ~]$ $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
```
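With SSH trust in place, this start-up sequence can be driven from bigdata01 alone. A minimal sketch, assuming ZOOKEEPER_HOME and HADOOP_HOME are exported in ~/.bashrc on every node (error handling omitted):

```bash
#!/usr/bin/env bash
# start-cluster.sh — ZooKeeper quorum first, then HDFS/YARN, the standby RM, and the history server.
set -e
for n in bigdata01 bigdata02 bigdata03; do
  ssh XXX@"$n" '$ZOOKEEPER_HOME/bin/zkServer.sh start'
done
$HADOOP_HOME/sbin/start-all.sh
ssh XXX@bigdata02 '$HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager'
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
```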
2. Stop the cluster

```
[XXX@bigdata01 ~]$ $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver
[XXX@bigdata02 ~]$ $HADOOP_HOME/sbin/yarn-daemon.sh stop resourcemanager
[XXX@bigdata01 ~]$ $HADOOP_HOME/sbin/stop-all.sh
[XXX@bigdata01 ~]$ $ZOOKEEPER_HOME/bin/zkServer.sh stop
[XXX@bigdata02 ~]$ $ZOOKEEPER_HOME/bin/zkServer.sh stop
[XXX@bigdata03 ~]$ $ZOOKEEPER_HOME/bin/zkServer.sh stop
```
3. Monitor the cluster

```
hdfs dfsadmin -report
```
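Beyond the capacity report, the HA state of both NameNodes and ResourceManagers can be queried directly; nn1/nn2 and rm1/rm2 are the IDs defined in the configuration files above:

```bash
hdfs haadmin -getServiceState nn1   # prints "active" or "standby"
hdfs haadmin -getServiceState nn2
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```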
Web UIs
- HDFS (nn1): http://bigdata01:50070/
- HDFS (nn2): http://bigdata02:50070/
- ResourceManager (Active): http://bigdata01:8088
- ResourceManager (Standby): http://bigdata02:8088/cluster/cluster
- JobHistory: http://bigdata01:19888/jobhistory
4. Start/stop an individual process

```
hadoop-daemon.sh start|stop namenode|datanode|journalnode|zkfc
yarn-daemon.sh start|stop resourcemanager|nodemanager
```
5. Failover test

```
hdfs haadmin -failover nn1 nn2
```
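Note that with automatic failover enabled, haadmin may refuse a manual failover; a common alternative test is to kill the active NameNode and watch the ZKFC promote the standby. A rough sketch (the PID is looked up by hand; <PID> is a placeholder):

```bash
# 1. Note which NameNode is currently active.
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# 2. On the active node, find and kill the NameNode process.
ssh XXX@bigdata01 jps          # read off the NameNode PID
ssh XXX@bigdata01 kill <PID>   # substitute the real PID

# 3. Within a few seconds the former standby should report "active".
hdfs haadmin -getServiceState nn2
```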