Hadoop HA Installation and Configuration on Linux


I. Versions

| Component | Version | Notes |
| --- | --- | --- |
| CentOS | 7.6.1810 64-bit | Check the OS version with cat /etc/redhat-release; check whether the OS is 32- or 64-bit with file /bin/ls |
| JRE | java version "1.8.0_144" | Java 1.8 or Java 11 is required |
| Hadoop | hadoop-2.6.0-cdh5.16.2.tar.gz | HDFS + MapReduce + YARN |
| Zookeeper | zookeeper-3.4.5-cdh5.16.2.tar.gz | Coordination service used by HDFS and YARN |

II. Host Planning

| IP | Host | Installed Software | Processes |
| --- | --- | --- | --- |
| 192.168.188.XXX | bigdata01 | Hadoop, ZooKeeper | NameNode, DFSZKFailoverController, JournalNode, DataNode, ResourceManager, JobHistoryServer, NodeManager, QuorumPeerMain |
| 192.168.188.XXX | bigdata02 | Hadoop, ZooKeeper | NameNode, DFSZKFailoverController, JournalNode, DataNode, ResourceManager, NodeManager, QuorumPeerMain |
| 192.168.188.XXX | bigdata03 | Hadoop, ZooKeeper | JournalNode, DataNode, NodeManager, QuorumPeerMain |

III. Directory Planning

| Name | Path | Notes |
| --- | --- | --- |
| $HADOOP_HOME | /home/XXX/app/hadoop/ | |
| Data | /home/XXX/data/ | |
| Log | /home/XXX/log/ | |
| hadoop.tmp.dir | /home/XXX/tmp/hadoop/ | Must be created manually, permission 777 |
| $ZOOKEEPER_HOME | /home/XXX/app/zookeeper/ | |
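
A minimal sketch, assuming the layout above and that user XXX already owns its home directory, for creating these paths on every node:

    mkdir -p /home/XXX/app /home/XXX/data /home/XXX/log
    mkdir -p /home/XXX/tmp/hadoop
    chmod 777 /home/XXX/tmp/hadoop   # the table above calls for permission 777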

IV. Hadoop HA Cluster Deployment

Preparation

Configure SSH trust relationships

  • Generate the key used for the trust relationship on each node: ssh-keygen
  • Use bigdata01 as the primary node for the trust setup. On bigdata02 and bigdata03, send the id_rsa.pub file under ~/.ssh to bigdata01 and rename it:
    • scp ~/.ssh/id_rsa.pub root@bigdata01:~/.ssh/id_rsa.pub2
      scp ~/.ssh/id_rsa.pub root@bigdata01:~/.ssh/id_rsa.pub3
      
  • On bigdata01, merge the three id_rsa.pub files into authorized_keys and set its permission to 600:
    • cat id_rsa.pub >> authorized_keys
      cat id_rsa.pub2 >> authorized_keys
      cat id_rsa.pub3 >> authorized_keys
      chmod 600 authorized_keys
      
  • Send the authorized_keys file to bigdata02 and bigdata03 (run as root):
    • scp /home/XXX/.ssh/authorized_keys bigdata02:/home/XXX/.ssh/
      scp /home/XXX/.ssh/authorized_keys bigdata03:/home/XXX/.ssh/
      
  • On bigdata02 and bigdata03, as root, hand authorized_keys over to user XXX:
    • chown XXX:XXX /home/XXX/.ssh/authorized_keys
      
  • Initialization: these commands leave records in the known_hosts file and verify the trust relationship:
    • ssh bigdata01 date
      ssh bigdata02 date
      ssh bigdata03 date
      
  • Note:
    • If you are still prompted for a password after typing yes, check the configuration steps above for mistakes.
    • When a machine in the trust relationship changes, the trust must be reconfigured: delete all entries for the affected machines from known_hosts and authorized_keys.
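  • A minimal verification sketch, assuming the hostnames resolve as planned above; it loops over the three nodes and confirms passwordless login from the current machine:
    • for h in bigdata01 bigdata02 bigdata03; do
        # should print the remote date without prompting for a password
        ssh -o BatchMode=yes "$h" date || echo "passwordless login to $h failed"
      done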

1. Install the JDK

  • Run these steps on every node.

1.1 Create the directory and extract the JDK

  • mkdir /usr/java
  • tar -xzvf jdk-XXXX.gz -C /usr/java/
  • chown -R root:root /usr/java/jdk-XXXX

1.2 Configure the JAVA environment variables

  • Note: put it in front of the existing $PATH, otherwise conflicts with other Java installations may occur.
    • sudo vi /etc/profile
    • export JAVA_HOME=/usr/java/jdk-XXXX
      export PATH=$JAVA_HOME/bin:$PATH
      
    • source /etc/profile
    • Verify: which java
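  • A quick sanity check, assuming the JDK path above (jdk-XXXX is the placeholder used in this guide):
    • which java      # should resolve to /usr/java/jdk-XXXX/bin/java
      java -version   # should report version 1.8.0_144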

2. Deploy ZooKeeper

2.1 Extract ZooKeeper

  • tar -xzvf zookeeper-3.4.5-cdh5.16.2.tar.gz -C ~/app/

2.2 Create a symbolic link

  • This makes later version upgrades easier.
  • ln -s zookeeper-3.4.5-cdh5.16.2 zookeeper

2.3 Configure environment variables

  • Environment variables: vi ~/.bashrc
    • export ZOOKEEPER_HOME=/home/XXX/app/zookeeper
      export PATH=${ZOOKEEPER_HOME}/bin:$PATH
      
    • source ~/.bashrc

2.4 Edit the configuration file

  • vi zoo.cfg
    • Change the dataDir path: dataDir=/home/XXX/tmp/zookeeper
    • Add the quorum and leader-election port entries for the three machines:
      • server.1=bigdata01:2888:3888
        server.2=bigdata02:2888:3888
        server.3=bigdata03:2888:3888
        
  • Create the unique ID for each node of the ZooKeeper ensemble:
  • touch /home/XXX/tmp/zookeeper/myid
  • Run the matching command on each node (bigdata01, bigdata02, bigdata03 respectively):
    • echo 1 > /home/XXX/tmp/zookeeper/myid
      echo 2 > /home/XXX/tmp/zookeeper/myid
      echo 3 > /home/XXX/tmp/zookeeper/myid
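  • A minimal sketch, assuming the passwordless SSH set up earlier, that writes each node's myid in one pass from bigdata01:
    • i=1
      for h in bigdata01 bigdata02 bigdata03; do
        ssh "$h" "mkdir -p /home/XXX/tmp/zookeeper && echo $i > /home/XXX/tmp/zookeeper/myid"
        i=$((i+1))
      done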
      

2.5 ZooKeeper operations

  • Start the ZooKeeper service
    • Before the service is started, each node reports a "not running" status
    • sh ~/app/zookeeper/bin/zkServer.sh start
  • Check the service status
    • sh ~/app/zookeeper/bin/zkServer.sh status
  • Stop the service
    • sh ~/app/zookeeper/bin/zkServer.sh stop
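  • A minimal sketch, assuming passwordless SSH, the same install path on every node, and JAVA_HOME visible to non-interactive shells, for starting and checking ZooKeeper on all three nodes from bigdata01:
    • for h in bigdata01 bigdata02 bigdata03; do
        ssh "$h" "~/app/zookeeper/bin/zkServer.sh start"
      done
      for h in bigdata01 bigdata02 bigdata03; do
        ssh "$h" "~/app/zookeeper/bin/zkServer.sh status"   # expect one leader and two followers
      done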

3. Deploy Hadoop

3.1 Extract Hadoop

  • tar -xzvf hadoop-2.6.0-cdh5.16.2.tar.gz -C /home/XXX/app/

3.2 Create a symbolic link

  • ln -s hadoop-2.6.0-cdh5.16.2 hadoop

3.3 Configure environment variables

  • Environment variables: vi ~/.bashrc
    • export HADOOP_HOME=/home/XXX/app/hadoop
      export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
      
    • source ~/.bashrc

3.4 Customize the configuration files

  • The settings below are a reference; adjust them to your actual production environment.
  • vi hadoop-env.sh
    • # In Hadoop 2.x this must be set explicitly; fixed in 3.x
      export JAVA_HOME=<local JDK path>
      export HADOOP_PID_DIR=/home/XXX/tmp
      
  • hdfs-site.xml:
  • <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <!-- HDFS superuser group -->
        <property>
            <name>dfs.permissions.superusergroup</name>
            <value>data</value>
        </property>
    
        <!-- Enable WebHDFS -->
        <property>
            <name>dfs.webhdfs.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/home/data/data/dfs/name</value>
            <description>Local directory where the NameNode stores the name table (fsimage) (change as needed)</description>
        </property>
        <property>
            <name>dfs.namenode.edits.dir</name>
            <value>${dfs.namenode.name.dir}</value>
            <description>Local directory where the NameNode stores the transaction file (edits) (change as needed)</description>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/home/data/data/dfs/data</value>
            <description>Local directory where the DataNode stores blocks (change as needed)</description>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
        <!-- Block size 128 MB (default 128 MB) -->
        <property>
            <name>dfs.blocksize</name>
            <value>134217728</value>
        </property>
        <!--======================================================================= -->
        <!-- HDFS high-availability configuration -->
        <!-- Set the HDFS nameservice to bigdata; it must match core-site.xml -->
        <property>
            <name>dfs.nameservices</name>
            <value>bigdata</value>
        </property>
        <property>
            <!-- NameNode IDs; this version supports at most two NameNodes -->
            <name>dfs.ha.namenodes.bigdata</name>
            <value>nn1,nn2</value>
        </property>
    
        <!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID], RPC address -->
        <property>
            <name>dfs.namenode.rpc-address.bigdata.nn1</name>
            <value>bigdata01:8020</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.bigdata.nn2</name>
            <value>bigdata02:8020</value>
        </property>
    
        <!-- HDFS HA: dfs.namenode.http-address.[nameservice ID], HTTP address -->
        <property>
            <name>dfs.namenode.http-address.bigdata.nn1</name>
            <value>bigdata01:50070</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.bigdata.nn2</name>
            <value>bigdata02:50070</value>
        </property>
    
        <!-- ================== NameNode edit log synchronization ============================================ -->
        <!-- Guarantees data recovery -->
        <property>
            <name>dfs.journalnode.http-address</name>
            <value>0.0.0.0:8480</value>
        </property>
        <property>
            <name>dfs.journalnode.rpc-address</name>
            <value>0.0.0.0:8485</value>
        </property>
        <property>
            <!-- JournalNode server addresses; the QuorumJournalManager stores the edit log here -->
            <!-- Format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>, port matches dfs.journalnode.rpc-address -->
            <name>dfs.namenode.shared.edits.dir</name>
            <value>qjournal://bigdata01:8485;bigdata02:8485;bigdata03:8485/bigdata</value>
        </property>
    
        <property>
            <!-- Directory where the JournalNode stores its data -->
            <name>dfs.journalnode.edits.dir</name>
            <value>/home/data/data/dfs/jn</value>
        </property>
        <!--================== DataNode edit log synchronization ============================================ -->
        <property>
            <!-- Strategy used by DataNodes and clients to identify and choose the active NameNode -->
            <!-- Implementation class for automatic failover on the client side -->
            <name>dfs.client.failover.proxy.provider.bigdata</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <!--==================Namenode fencing:=============================================== -->
        <!-- After a failover, prevents the stopped NameNode from starting again and creating two active services -->
        <property>
            <name>dfs.ha.fencing.methods</name>
            <value>sshfence</value>
        </property>
        <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <value>/home/data/.ssh/id_rsa</value>
        </property>
        <property>
            <!-- Milliseconds after which fencing is considered to have failed -->
            <name>dfs.ha.fencing.ssh.connect-timeout</name>
            <value>30000</value>
        </property>
    
        <!--================== NameNode automatic failover based on ZKFC and ZooKeeper ====================== -->
        <!-- Enable ZooKeeper-based automatic failover -->
        <property>
            <name>dfs.ha.automatic-failover.enabled</name>
            <value>true</value>
        </property>
        <!-- List of DataNodes dynamically permitted to connect to the NameNode -->
         <property>
             <name>dfs.hosts</name>
             <value>/home/data/app/hadoop/etc/hadoop/slaves</value>
         </property>
    
        <property>
            <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
            <value>false</value>
        </property>
    
    </configuration>
    
  • core-site.xml:
  • <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <!-- YARN needs fs.defaultFS to specify the NameNode URI -->
            <property>
                    <name>fs.defaultFS</name>
                    <value>hdfs://bigdata</value>
            </property>
            <!--============================== Trash mechanism ======================================= -->
            <property>
                    <!-- How often the CheckPointer running on the NameNode creates checkpoints from the Current folder; default 0 means the value of fs.trash.interval is used -->
                    <name>fs.trash.checkpoint.interval</name>
                    <value>0</value>
            </property>
            <property>
                    <!-- After how many minutes checkpoint directories under .Trash are deleted; the server-side setting takes precedence over the client; default 0 means never delete -->
                    <name>fs.trash.interval</name>
                    <value>10080</value>
            </property>
    
         <!-- Hadoop temporary directory. hadoop.tmp.dir is a base setting the Hadoop file system depends on and many paths derive from it; if the NameNode and DataNode storage locations are not configured in hdfs-site.xml, they default to this path -->
            <property>
                    <name>hadoop.tmp.dir</name>
                    <value>/home/data/tmp/hadoop</value>
            </property>
    
         <!-- ZooKeeper quorum addresses -->
            <property>
                    <name>ha.zookeeper.quorum</name>
                    <value>bigdata01:2181,bigdata02:2181,bigdata03:2181</value>
            </property>
         <!-- ZooKeeper session timeout, in milliseconds -->
            <property>
                    <name>ha.zookeeper.session-timeout.ms</name>
                    <value>2000</value>
            </property>
            <!-- User that the Hadoop processes run as -->
            <property>
               <name>hadoop.proxyuser.${process user}.hosts</name>
               <value>*</value>
            </property>
            <!-- Group of the user that the Hadoop processes run as -->
            <property>
                <name>hadoop.proxyuser.${process user}.groups</name>
                <value>*</value>
            </property>
    
            <!-- Compression -->
            <property>
              <name>io.compression.codecs</name>
              <value>org.apache.hadoop.io.compress.GzipCodec,
                org.apache.hadoop.io.compress.DefaultCodec,
                org.apache.hadoop.io.compress.BZip2Codec,
                org.apache.hadoop.io.compress.SnappyCodec
              </value>
            </property>
    </configuration>
    
  • mapred-site.xml:
  • <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
    <!-- MapReduce application framework -->
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <!-- JobHistory Server ============================================================== -->
    <!-- MapReduce JobHistory Server address, default port 10020 -->
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>bigdata01:10020</value>
        </property>
    <!-- MapReduce JobHistory Server web UI address, default port 19888 -->
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>bigdata01:19888</value>
        </property>
    
    <!-- Compress map output with Snappy -->
        <property>
            <name>mapreduce.map.output.compress</name>
            <value>true</value>
        </property>
    
        <property>
            <name>mapreduce.map.output.compress.codec</name>
            <value>org.apache.hadoop.io.compress.SnappyCodec</value>
        </property>
    </configuration>
    
  • yarn-site.xml:
  • <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
    <!-- NodeManager configuration ================================================= -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
            <name>yarn.nodemanager.localizer.address</name>
            <value>0.0.0.0:23344</value>
            <description>Address where the localizer IPC is.</description>
        </property>
        <property>
            <name>yarn.nodemanager.webapp.address</name>
            <value>0.0.0.0:23999</value>
            <description>NM Webapp address.</description>
        </property>
    
    <!-- HA configuration =============================================================== -->
        <!-- Resource Manager Configs -->
        <property>
            <name>yarn.resourcemanager.connect.retry-interval.ms</name>
            <value>2000</value>
        </property>
        <property>
            <name>yarn.resourcemanager.ha.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
            <value>true</value>
        </property>
    <!-- Enable embedded automatic failover; in an HA setup it works with the ZKRMStateStore to handle fencing -->
        <property>
            <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
            <value>true</value>
        </property>
    <!-- Cluster name; ensures HA election happens within the intended cluster -->
        <property>
            <name>yarn.resourcemanager.cluster-id</name>
            <value>yarn-cluster</value>
        </property>
        <property>
            <name>yarn.resourcemanager.ha.rm-ids</name>
            <value>rm1,rm2</value>
        </property>
    
    
    <!-- The RM ID of the active/standby node must be specified separately on each ResourceManager (optional)
        <property>
             <name>yarn.resourcemanager.ha.id</name>
             <value>rm2</value>
         </property>
         -->
    
        <property>
            <name>yarn.resourcemanager.scheduler.class</name>
            <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
        </property>
        <property>
            <name>yarn.resourcemanager.recovery.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
            <value>5000</value>
        </property>
    <!-- ZKRMStateStore configuration -->
        <property>
            <name>yarn.resourcemanager.store.class</name>
            <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
        </property>
        <property>
            <name>yarn.resourcemanager.zk-address</name>
            <value>bigdata01:2181,bigdata02:2181,bigdata03:2181</value>
        </property>
        <property>
            <name>yarn.resourcemanager.zk.state-store.address</name>
            <value>bigdata01:2181,bigdata02:2181,bigdata03:2181</value>
        </property>
    <!-- RPC address clients use to reach the RM (applications manager interface) -->
        <property>
            <name>yarn.resourcemanager.address.rm1</name>
            <value>bigdata01:23140</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address.rm2</name>
            <value>bigdata02:23140</value>
        </property>
    <!-- RPC address ApplicationMasters use to reach the RM (scheduler interface) -->
        <property>
            <name>yarn.resourcemanager.scheduler.address.rm1</name>
            <value>bigdata01:23130</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address.rm2</name>
            <value>bigdata02:23130</value>
        </property>
        <!-- RM admin interface -->
        <property>
            <name>yarn.resourcemanager.admin.address.rm1</name>
            <value>bigdata01:23141</value>
        </property>
        <property>
            <name>yarn.resourcemanager.admin.address.rm2</name>
            <value>bigdata02:23141</value>
        </property>
    <!-- RPC port NodeManagers use to reach the RM -->
        <property>
            <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
            <value>bigdata01:23125</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
            <value>bigdata02:23125</value>
        </property>
    <!-- RM web application address -->
        <property>
            <name>yarn.resourcemanager.webapp.address.rm1</name>
            <value>bigdata01:8088</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.address.rm2</name>
            <value>bigdata02:8088</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.https.address.rm1</name>
            <value>bigdata01:23189</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.https.address.rm2</name>
            <value>bigdata02:23189</value>
        </property>
    
    
        <property>
           <name>yarn.log-aggregation-enable</name>
           <value>true</value>
        </property>
        <property>
             <name>yarn.log.server.url</name>
             <value>http://bigdata01:19888/jobhistory/logs</value>
        </property>
    
    
        <property>
            <name>yarn.nodemanager.resource.memory-mb</name>
            <value>2048</value>
        </property>
        <property>
            <name>yarn.scheduler.minimum-allocation-mb</name>
            <value>16</value>
        <description>Minimum memory a single task can request; default 1024 MB</description>
         </property>
    
    
      <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>1024</value>
    <description>Maximum memory a single task can request; default 8192 MB</description>
      </property>
    
       <property>
           <name>yarn.nodemanager.resource.cpu-vcores</name>
           <value>2</value>
        </property>
    
    </configuration>
    
  • slaves:
  • bigdata01
    bigdata02
    bigdata03
    
    • Note: the uploaded slaves file here was not in Unix format (it contained extra line-break characters) and needs to be converted.
    • Clean up the extra characters: dos2unix slaves
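  • The same configuration must be present on every node. A minimal sketch, assuming passwordless SSH and the $HADOOP_HOME layout above, for pushing etc/hadoop from bigdata01 to the other nodes and spot-checking the HA settings:
    • for h in bigdata02 bigdata03; do
        scp -r /home/XXX/app/hadoop/etc/hadoop/* "$h":/home/XXX/app/hadoop/etc/hadoop/
      done
      # on any node, confirm the nameservice and NameNode IDs are picked up
      hdfs getconf -confKey dfs.nameservices           # expect: bigdata
      hdfs getconf -confKey dfs.ha.namenodes.bigdata   # expect: nn1,nn2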

3.5 Initialize Hadoop

  • Start the JournalNode daemons (run on each of the three nodes):
    • sh /home/XXX/app/hadoop/sbin/hadoop-daemon.sh start journalnode
  • Format the NameNode on bigdata01:
    • sh /home/XXX/app/hadoop/bin/hadoop namenode -format
  • Copy bigdata01's metadata to bigdata02:
    • scp -r /home/XXX/data/dfs XXX@bigdata02:/home/XXX/data
  • Initialize the ZKFC state in ZooKeeper:
    • sh /home/XXX/app/hadoop/bin/hdfs zkfc -formatZK
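  • A hedged alternative to copying the metadata with scp: the standby NameNode can usually be initialized with Hadoop's built-in bootstrap command, run after the JournalNodes are up and bigdata01 has been formatted (the scp approach above also works):
    • # on bigdata02
      /home/XXX/app/hadoop/bin/hdfs namenode -bootstrapStandby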

3.6 Start Hadoop

  • Start HDFS:
    • sh /home/XXX/app/hadoop/sbin/start-dfs.sh
  • Start YARN:
    • bigdata01:sh /home/XXX/app/hadoop/sbin/start-yarn.sh
    • bigdata02:sh /home/XXX/app/hadoop/sbin/yarn-daemon.sh start resourcemanager
  • Start the JobHistory Server:
    • sh /home/XXX/app/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
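  • A quick check, assuming jps (shipped with the JDK) is on the PATH of non-interactive shells, that every node runs the processes planned in the host table above:
    • for h in bigdata01 bigdata02 bigdata03; do
        echo "== $h =="; ssh "$h" jps
      done
      # bigdata01: NameNode, DFSZKFailoverController, JournalNode, DataNode,
      #            ResourceManager, NodeManager, JobHistoryServer, QuorumPeerMain
      # bigdata02: same as bigdata01, without JobHistoryServer
      # bigdata03: JournalNode, DataNode, NodeManager, QuorumPeerMain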

V. Common Scripts and Commands

1. Starting the cluster

[XXX@bigdata01 ~]$ $ZOOKEEPER_HOME/bin/zkServer.sh start
[XXX@bigdata02 ~]$ $ZOOKEEPER_HOME/bin/zkServer.sh start
[XXX@bigdata03 ~]$ $ZOOKEEPER_HOME/bin/zkServer.sh start
[XXX@bigdata01 ~]$ $HADOOP_HOME/sbin/start-all.sh
[XXX@bigdata02 ~]$ $HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager
[XXX@bigdata01 ~]$ $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver

2. Stopping the cluster

[XXX@bigdata01 ~]$ $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver
[XXX@bigdata02 ~]$ $HADOOP_HOME/sbin/yarn-daemon.sh stop resourcemanager
[XXX@bigdata01 ~]$ $HADOOP_HOME/sbin/stop-all.sh
[XXX@bigdata01 ~]$ $ZOOKEEPER_HOME/bin/zkServer.sh stop
[XXX@bigdata02 ~]$ $ZOOKEEPER_HOME/bin/zkServer.sh stop
[XXX@bigdata03 ~]$ $ZOOKEEPER_HOME/bin/zkServer.sh stop

3. Monitoring the cluster

  • hdfs dfsadmin -report

Web UIs: NameNode at bigdata01:50070 and bigdata02:50070, ResourceManager at bigdata01:8088 and bigdata02:8088, JobHistory Server at bigdata01:19888 (ports as configured above).
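  • The HA state can also be checked from the command line (nn1/nn2 and rm1/rm2 are the IDs defined in the configuration above):
    • hdfs haadmin -getServiceState nn1
      hdfs haadmin -getServiceState nn2
      yarn rmadmin -getServiceState rm1
      yarn rmadmin -getServiceState rm2
      $ZOOKEEPER_HOME/bin/zkServer.sh status   # run on each node for leader/follower status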

4. Starting/stopping individual processes

  • hadoop-daemon.sh start|stop namenode|datanode|journalnode|zkfc
  • yarn-daemon.sh start|stop resourcemanager|nodemanager

5. Testing

  • hdfs haadmin -failover nn1 nn2
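  • A hedged note: because dfs.ha.automatic-failover.enabled is true in this setup, haadmin may refuse manual HA commands unless forced, so a common way to test failover is to stop the active NameNode and watch the standby take over:
    • hdfs haadmin -getServiceState nn1     # find the currently active NameNode
      hdfs haadmin -getServiceState nn2
      # on the active node, stop its NameNode
      $HADOOP_HOME/sbin/hadoop-daemon.sh stop namenode
      # after a short wait, the former standby should report "active"
      hdfs haadmin -getServiceState nn2     # or nn1, whichever was standby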