[Stumbling into Big Data] Hadoop Deployment

Author Date Weather
元公子 2020-01-25 (Saturday) A cooling-down Dongguan

The less you know, the less you don't know

Without likes from friends, there's no leveling up and monster-grinding

I. Preface

This article took a week of on-and-off writing to finish. With so many machines involved, deploying and testing as I went, productivity was a bit low...

Here is a summary of what each Hadoop service does:

NameNode: serves client read/write requests and stores HDFS metadata, such as the fsimage and edit files.

SecondaryNameNode: speeds up NameNode startup and backs up the fsimage by periodically merging the edit log into the fsimage file. Not needed in an HA setup.

DataNode: stores the actual file data, in units of blocks (the basic storage unit, sized by the dfs.block.size parameter).

JournalNode: shares edit-log data among the NameNodes in an HA setup; the data lives on each JournalNode's local disk. Note: at least 3 nodes are required. You can run more, but keep the count odd (3, 5, 7, 9, and so on). With N nodes (N at least 3), the system tolerates at most (N-1)/2 node failures without affecting normal operation.
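The tolerance rule above is plain majority arithmetic, sketched here in a line of shell:

```shell
# Failure tolerance of a JournalNode quorum: with N nodes (odd, N >= 3),
# up to (N - 1) / 2 nodes may fail while a majority still survives.
for n in 3 5 7 9; do
  echo "$n JournalNodes tolerate $(( (n - 1) / 2 )) failure(s)"
done
```

This is why an even node count buys nothing: 4 nodes tolerate the same single failure as 3.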

ZK Failover Controller: monitors NameNode health and orchestrates active/standby switching, using ZooKeeper for automatic leader election and failover.

ResourceManager: the master node of the Yarn cluster; coordinates and manages the resources of the entire cluster (all NodeManagers).

NodeManager: the per-node agent that manages a single compute node in the Hadoop cluster. It keeps in touch with the ResourceManager, manages Container lifecycles, monitors each Container's resource usage (memory, CPU, etc.), tracks node health, and manages logs and the auxiliary services used by applications.

JobHistoryServer: lets you view records of completed MapReduce jobs.

II. Environment Preparation

  • The examples use a CentOS 7 64-bit operating system.
  • Java JDK 1.8 or later.
  • An existing ZooKeeper cluster (see the deployment post).
  • Software is installed as the hadoop user.
  • Details referenced here are covered in two earlier posts: the common scripts post and the common environment post.
  • Prepare at least 5 servers, with hostnames hadoop-master, hadoop-master-standby1, hadoop-dn1, hadoop-dn2, and hadoop-dn3.

III. Downloading the Package

Official site: go to the download page

Download the latest release: hadoop-3.2.1.tar.gz

IV. The Brain-Burning Part

Common steps

  • Place the package in the /soft directory at the filesystem root and extract it as root.
[root@hadoop-master /]# mkdir /soft
[root@hadoop-master /]# chown -R hadoop:hadoop /soft
[root@hadoop-master /]# cd /soft
[root@hadoop-master /soft]# tar -xvzf hadoop-3.2.1.tar.gz 
  • Set ownership and create a symlink.
[root@hadoop-master /soft]# chown -R hadoop:hadoop hadoop-3.2.1
[root@hadoop-master /soft]# ln -s hadoop-3.2.1 hadoop
  • Set environment variables. Make sure the JDK environment variables are already in place.
[root@hadoop-master /soft]# vi /etc/profile
export HADOOP_HOME=/soft/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
Save and exit: Esc :wq
[root@hadoop-master /soft]# source /etc/profile
  • Enable automatic time synchronization so every server keeps the correct time.
[root@hadoop-master /root]# date
Thu Jan 23 01:08:37 CST 2020
[root@hadoop-master /root]# yum -y install ntpdate
[root@hadoop-master /root]# sudo ntpdate 0.asia.pool.ntp.org
23 Jan 01:09:45 ntpdate[17905]: adjust time server 202.28.93.5 offset -0.093354 sec
# Sync once an hour. Note: /etc/crontab entries need a user field, and the minute
# field must be fixed (0), not *, or the job would run every minute.
[root@hadoop-master /root]# sudo vi /etc/crontab
0 */1 * * * root /usr/sbin/ntpdate 0.asia.pool.ntp.org;/sbin/hwclock -w
  • Switch to the hadoop user and set up passwordless SSH; see this earlier post for more detail.
[root@hadoop-master /soft]# su - hadoop
[hadoop@hadoop-master /home/hadoop]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[hadoop@hadoop-master /home/hadoop]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop-master /home/hadoop]$ chmod 0600 ~/.ssh/authorized_keys

For the three installation modes described on the official site, this guide uses symlinked configuration directories. Under /soft/hadoop/etc, create three folders: local (local mode), pesudo (pseudo-distributed), and full (cluster), then point a symlink named hadoop at one of them to switch seamlessly between modes. In production you don't need this; just edit the config files under /soft/hadoop/etc/hadoop directly.

[hadoop@hadoop-master /home/hadoop]$ mkdir  /soft/hadoop/etc/local
[hadoop@hadoop-master /home/hadoop]$ mkdir  /soft/hadoop/etc/pesudo
[hadoop@hadoop-master /home/hadoop]$ mkdir  /soft/hadoop/etc/full
# Copy the config files from the hadoop directory into the local and pseudo-distributed folders; the cluster config extends the pseudo-distributed one, so no need to copy it yet.
[hadoop@hadoop-master /home/hadoop]$ cp -fr /soft/hadoop/etc/hadoop/* /soft/hadoop/etc/local/
[hadoop@hadoop-master /home/hadoop]$ cp -fr /soft/hadoop/etc/hadoop/* /soft/hadoop/etc/pesudo/
# Remove the original config directory
[hadoop@hadoop-master /home/hadoop]$ rm -fr /soft/hadoop/etc/hadoop/

(1) Local (Standalone) Mode

This mode is simple; per the official docs, nothing needs to be changed. It runs everything in a single Java process and is useful for debugging programs.

# Use the default local config
[hadoop@hadoop-master /home/hadoop]$ ln -fsT /soft/hadoop/etc/local /soft/hadoop/etc/hadoop
[hadoop@hadoop-master /home/hadoop]$ cd /soft/hadoop
[hadoop@hadoop-master /soft/hadoop]$ mkdir input
[hadoop@hadoop-master /soft/hadoop]$ cp etc/hadoop/*.xml input
[hadoop@hadoop-master /soft/hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'
[hadoop@hadoop-master /soft/hadoop]$ cat output/*
1	dfsadmin
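The example job above greps the XML inputs for strings matching `dfs[a-z.]+` and counts them. As a sanity check, the same computation can be reproduced with plain grep; the temp directory and sample file below are purely for the demo:

```shell
# Mimic the hadoop-mapreduce-examples grep job with coreutils:
# extract every match of dfs[a-z.]+ and count the occurrences.
demo=$(mktemp -d)
printf '<name>dfsadmin</name>\n' > "$demo/sample.xml"
grep -ohE 'dfs[a-z.]+' "$demo"/*.xml | sort | uniq -c
rm -rf "$demo"
```

If the MapReduce output ever looks wrong, running this against the real input directory gives a quick ground truth to compare against.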

(2) Pseudo-Distributed Mode

This setup is commonly used for local development: NameNode, DataNode, and Yarn all run on a single machine.

Switch to the pesudo configuration

[hadoop@hadoop-master /home/hadoop]$ ln -fsT /soft/hadoop/etc/pesudo /soft/hadoop/etc/hadoop

Edit core-site.xml

[hadoop@hadoop-master /home/hadoop]$ vi /soft/hadoop/etc/hadoop/core-site.xml
# Add the following inside the <configuration> tag
    <property>
        <name>fs.defaultFS</name>
        <!-- Adjust if the port conflicts; default is 9000 -->
        <value>hdfs://hadoop-master:9000</value>
    </property>
    <property>
        <!-- Base temporary directory. Yarn temp data, for example, goes under yarn/yarn_data here -->
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hadoop-pesudo/tmp</value>
    </property>
    <!-- allow hive beeline logins -->
    <property>
       <name>hadoop.proxyuser.hadoop.hosts</name>
       <value>*</value>
    </property>
    <property>
       <name>hadoop.proxyuser.hadoop.groups</name>
       <value>*</value>
    </property>
Save and exit: Esc :wq
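To confirm the value actually landed in the file, a small sed one-liner can pull it back out. The snippet below runs against a throwaway copy of the property (the temp path is hypothetical), but the same command can be pointed at /soft/hadoop/etc/hadoop/core-site.xml:

```shell
# Extract the <value> that follows the fs.defaultFS <name> element.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-master:9000</value>
    </property>
</configuration>
EOF
# /name-line/{n; substitute-and-print the following value line}
sed -n '/<name>fs.defaultFS<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}' "$conf"
rm -f "$conf"
```

This relies on the Hadoop convention of keeping `<name>` and `<value>` on adjacent lines, which holds for the configs written in this article.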

Edit hdfs-site.xml

[hadoop@hadoop-master /home/hadoop]$ vi /soft/hadoop/etc/hadoop/hdfs-site.xml
# Add the following inside the <configuration> tag
    <property>
    	<!-- Replication factor: do not exceed the number of DataNodes (default 3) -->
        <name>dfs.replication</name>
        <value>1</value>
    </property>
Save and exit: Esc :wq

Edit mapred-site.xml

[hadoop@hadoop-master /home/hadoop]$ vi /soft/hadoop/etc/hadoop/mapred-site.xml
# Add the following inside the <configuration> tag
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
Save and exit: Esc :wq

Edit yarn-site.xml

[hadoop@hadoop-master /home/hadoop]$ vi /soft/hadoop/etc/hadoop/yarn-site.xml
# Add the following inside the <configuration> tag
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name> 
         <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
Save and exit: Esc :wq

Set environment variables: JAVA_HOME (startup fails under the hadoop user without it), HADOOP_LOG_DIR, and HADOOP_PID_DIR.

[hadoop@hadoop-master /home/hadoop]$ vi /soft/hadoop/etc/hadoop/hadoop-env.sh
# JAVA_HOME: find the JAVA_HOME variable and add a line below it
# export JAVA_HOME=
export JAVA_HOME=/soft/jdk
# Log directory: find the HADOOP_LOG_DIR variable and add a line below it
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
export HADOOP_LOG_DIR=/home/hadoop/hadoop-pesudo/logs
# PID directory: find the HADOOP_PID_DIR variable and add a line below it
# export HADOOP_PID_DIR=/tmp
export HADOOP_PID_DIR=/home/hadoop/hadoop-pesudo/pids
Save and exit: Esc :wq

Configuration is basically done; time to start it up and test.

[hadoop@dev /home/hadoop]$ hadoop version
Hadoop 3.2.1
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
Compiled by rohithsharmaks on 2019-09-10T15:56Z
Compiled with protoc 2.5.0
From source with checksum 776eaf9eee9c0ffc370bcbc1888737
This command was run using /soft/hadoop-3.2.1/share/hadoop/common/hadoop-common-3.2.1.jar
# Format: generates NameNode metadata under /home/hadoop/hadoop-pesudo/tmp.
[hadoop@hadoop-master /soft]$ hdfs namenode -format
# Start HDFS
[hadoop@hadoop-master /soft]$ start-dfs.sh
[hadoop@hadoop-master /soft]$ jps
16773 NameNode
16918 DataNode
17099 SecondaryNameNode
# Start the Yarn framework
[hadoop@hadoop-master /soft]$ start-yarn.sh
[hadoop@hadoop-master /soft]$ jps
17471 NodeManager
17349 ResourceManager
# Check the ports
[hadoop@hadoop-master /soft]$ netstat -ntlvp | grep 8088
tcp  0     0 0.0.0.0:8088           0.0.0.0:*     LISTEN      17349/java
[hadoop@hadoop-master /soft]$ netstat -ntlvp | grep 9870
tcp  0     0 0.0.0.0:9870          	0.0.0.0:*     LISTEN      16773/java
# If the pages are unreachable, check the firewall first
# HDFS web UI, default port 9870
http://ip:9870
# Yarn web UI, default port 8088
http://ip:8088
# Stop scripts
[hadoop@dev /soft]$ stop-yarn.sh
[hadoop@dev /soft]$ stop-dfs.sh

With this stage deployed, you can point your framework at hdfs://hadoop-master:9000 and start developing big-data programs.

(3) High-Availability Mode (HA + Federation + Yarn HA)

This setup is for production; its main goals are better compute performance and avoiding single points of failure. In a federation, the NameNode camps (hadoop-master's and hadoop-master2's) do not communicate with each other, but the different namespaces (nns1 and nns2) manage the same DataNodes.

For a real deployment, it's advisable to start with a single HA (NameNode) pair and skip Federation at first; that is, use 5 machines and drop hadoop-master2 and hadoop-master2-standby1.

The 7 machines involved in the deployment:

IP Hostname Roles
192.168.146.3 hadoop-master NameNode(Active),ZKFailoverController,ResourceManager
192.168.146.4 hadoop-master-standby1 NameNode(Standby),ZKFailoverController,ResourceManager,JobHistoryServer
192.168.146.5 hadoop-master2 NameNode(Active),ZKFailoverController,ResourceManager
192.168.146.6 hadoop-master2-standby1 NameNode(Standby),ZKFailoverController,ResourceManager,JobHistoryServer
192.168.146.7 hadoop-dn1 DataNode, JournalNode, NodeManager,QuorumPeerMain(zookeeper)
192.168.146.8 hadoop-dn2 DataNode, JournalNode,NodeManager, QuorumPeerMain
192.168.146.9 hadoop-dn3 DataNode, JournalNode, NodeManager,QuorumPeerMain

The plan: edit the config files on hadoop-master first, then copy them to the other 6 machines. Most settings are identical, but watch for the changes needed on hadoop-master2 and hadoop-master2-standby1, since they form a different NameNode pair.

After copying from the pesudo directory into full, switch the hadoop symlink to the full folder

[hadoop@hadoop-master /soft]$ cp -fr /soft/hadoop/etc/pesudo/* /soft/hadoop/etc/full/
[hadoop@hadoop-master /soft]$ ln -fsT /soft/hadoop/etc/full /soft/hadoop/etc/hadoop

Delete the previous contents and edit core-site.xml

[hadoop@hadoop-master /home/hadoop]$ vi /soft/hadoop/etc/hadoop/core-site.xml
    <property>
        <name>fs.defaultFS</name>
        <!-- One of the Federation NameNodes; note this differs per NameNode pair -->
        <value>hdfs://nns1</value>
	<description>Default filesystem scheme and nameservice logical name; matches dfs.nameservices in hdfs-site.xml</description>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hadoop-full/tmp</value>
    </property>
   <!--  ZooKeeper configuration  -->
    <property>
	<name>ha.zookeeper.quorum</name>
        <value>hadoop-dn1:2181,hadoop-dn2:2181,hadoop-dn3:2181</value>
        <description>ZooKeeper quorum machines used for HA</description>
    </property>
    <property>
	<name>ha.zookeeper.session-timeout.ms</name>
        <value>5000</value>
        <description>ZooKeeper session timeout, in milliseconds</description>
    </property>
    <property>
  	<name>ipc.client.connect.max.retries</name>
  	<value>100</value>
  	<description>Indicates the number of retries a client will make to establish a server connection.</description>
    </property>
    <property>
  	<name>ipc.client.connect.retry.interval</name>
  	<value>10000</value>
  	<description>Indicates the number of milliseconds a client will wait for before retrying to establish a server connection.</description>
    </property>
    <!-- allow hive beeline logins -->
    <property>
       <name>hadoop.proxyuser.hadoop.hosts</name>
       <value>*</value>
    </property>
    <property>
       <name>hadoop.proxyuser.hadoop.groups</name>
       <value>*</value>
    </property>
Save and exit: Esc :wq

Delete the previous contents and edit hdfs-site.xml

[hadoop@hadoop-master /home/hadoop]$ vi /soft/hadoop/etc/hadoop/hdfs-site.xml
    <property>
    	<!-- 3 DataNodes will be used -->
        <name>dfs.replication</name>
        <value>3</value>
    </property>
   <!--  ##############################  -->
   <!--  Nameservices and NameNode IDs  -->
    <property>
  	<name>dfs.nameservices</name>
  	<!--  2 nameservices for Federation  -->
  	<value>nns1,nns2</value>
	<description>Logical names of the nameservices provided; matches core-site.xml</description>
    </property>
    <property>
  	<name>dfs.ha.namenodes.nns1</name>
  	<!-- From the docs: the minimum number of NameNodes for HA is 2, but you can configure more; due to communication overhead, no more than 5 is advised (3 recommended) -->
  	<value>nn1,nn2</value>
	<description>NameNode logical names under this nameservice; at least 2 nodes</description>
    </property>
    <property>
  	<name>dfs.ha.namenodes.nns2</name>
  	<value>nn1,nn2</value>
	<description>NameNode logical names under this nameservice; at least 2 nodes</description>
    </property>
   <!--  ##############################  -->
    <property>
  	<name>dfs.namenode.rpc-address.nns1.nn1</name>
  	<value>hadoop-master:9000</value>
	<description>RPC address of the first NameNode</description>
    </property>
    <property>
  	<name>dfs.namenode.http-address.nns1.nn1</name>
  	<value>hadoop-master:50070</value>
	<description>Web server address of the first NameNode</description>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.nns1.nn2</name>
        <value>hadoop-master-standby1:9000</value>
        <description>RPC address of the second NameNode</description>
    </property>
    <property>
        <name>dfs.namenode.http-address.nns1.nn2</name>
        <value>hadoop-master-standby1:50070</value>
        <description>Web server address of the second NameNode</description>
    </property>
 
     <property>
  	<name>dfs.namenode.rpc-address.nns2.nn1</name>
  	<value>hadoop-master2:9000</value>
	<description>RPC address of the first NameNode</description>
    </property>
    <property>
  	<name>dfs.namenode.http-address.nns2.nn1</name>
  	<value>hadoop-master2:50070</value>
	<description>Web server address of the first NameNode</description>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.nns2.nn2</name>
        <value>hadoop-master2-standby1:9000</value>
        <description>RPC address of the second NameNode</description>
    </property>
    <property>
        <name>dfs.namenode.http-address.nns2.nn2</name>
        <value>hadoop-master2-standby1:50070</value>
        <description>Web server address of the second NameNode</description>
    </property>
    <!--  ##############################  -->

    <!--  Shared edit-log directory; note this differs per NameNode pair  -->
    <property>
  	<name>dfs.namenode.shared.edits.dir</name>
  	<value>qjournal://hadoop-dn1:8485;hadoop-dn2:8485;hadoop-dn3:8485/nns1-ha-data</value>
    </property>

    <!--  ZooKeeper configuration  -->
    <property>
    	<name>ha.zookeeper.quorum</name>
   	    <value>hadoop-dn1:2181,hadoop-dn2:2181,hadoop-dn3:2181</value>
	    <description>ZooKeeper quorum machines used for HA</description>
    </property>
    <property>
    	<name>ha.zookeeper.session-timeout.ms</name>
    	<value>5000</value>
    	<description>ZooKeeper session timeout, in milliseconds</description>
    </property>
    <!-- Use SSH (passwordless) fencing -->
    <property>
    	<name>dfs.ha.fencing.methods</name>
    	<value>sshfence</value>
    	<description>SSH into the other NameNode and kill the old active process. Default is sshfence; can also be set to shell.</description>
    </property>
    <property>
    	<name>dfs.ha.fencing.ssh.private-key-files</name>
    	<value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <!--  JournalNode configuration  -->
    <property>
    	<name>dfs.journalnode.rpc-address</name>
    	<value>0.0.0.0:8485</value>
    </property>
    <property>
    	<name>dfs.journalnode.http-address</name>
    	<value>0.0.0.0:8480</value>
    </property>
    <property>
    	<name>dfs.journalnode.edits.dir</name>
    	<value>/home/hadoop/hadoop-full/tmp/dfs/journal</value>
        <description>Only needed on the JournalNode machines</description>
    </property>

    <!-- Client-side settings  -->
    <property>
    	<name>dfs.ha.automatic-failover.enabled</name>
    	<value>true</value>
    	<description>or false to disable automatic failover</description>
    </property>
    <property>
    	<name>dfs.client.failover.proxy.provider.nns1</name>
    	<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    	<description>Proxy class that clients use for HA failover; different nameservices may use different classes.
        The example above is the default class shipped with Hadoop</description>
    </property>
    <property>
    	<name>dfs.client.failover.proxy.provider.nns2</name>
    	<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    	<description>Proxy class that clients use for HA failover; different nameservices may use different classes.
        The example above is the default class shipped with Hadoop</description>
    </property>
Save and exit: Esc :wq

Delete the previous contents and edit yarn-site.xml

[hadoop@hadoop-master /home/hadoop]$ vi /soft/hadoop/etc/hadoop/yarn-site.xml
    <property>
	    <name>yarn.resourcemanager.ha.enabled</name>
	    <value>true</value>
    </property>
    <property>
  	    <name>yarn.resourcemanager.cluster-id</name>
  	    <value>yarn1-ha</value>
    </property>
    <property>
  	    <name>yarn.resourcemanager.ha.rm-ids</name>
  	    <value>rm1,rm2</value>
    </property>
    <property>
  	    <name>yarn.resourcemanager.hostname.rm1</name>
  	    <value>hadoop-master</value>
    </property>
    <property>
  	    <name>yarn.resourcemanager.hostname.rm2</name>
  	    <value>hadoop-master-standby1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>hadoop-master:8050</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address.rm2</name>
	    <value>hadoop-master-standby1:8050</value>
    </property>
    <property>
  	    <name>yarn.resourcemanager.webapp.address.rm1</name>
  	    <value>hadoop-master:8088</value>
    </property>
    <property>
  	    <name>yarn.resourcemanager.webapp.address.rm2</name>
  	    <value>hadoop-master-standby1:8088</value>
    </property>
    <property>
	    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>hadoop-master:8025</value>
    </property>
    <property>
	    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>hadoop-master-standby1:8025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>hadoop-master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>hadoop-master-standby1:8030</value>
    </property>
    <property>
  	    <name>yarn.resourcemanager.zk-address</name>
  	    <value>hadoop-dn1:2181,hadoop-dn2:2181,hadoop-dn3:2181</value>
    </property>
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property>
	    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
	    <value>true</value>
	    <description>Enable automatic failover; By default, it is enabled only when HA is enabled.</description>
    </property>
    <property>
  	    <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
  	    <value>/yarn-leader-election</value>
	    <description>Optional setting. The default value is /yarn-leader-election</description>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
   <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>8192</value>
        <description>Maximum memory allocation for a single container, in MB; default 8192</description>
   </property>
   <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <property> 
        <name>yarn.log-aggregation-enable</name>  
        <value>true</value>  
    </property>
	<!-- Keep aggregated logs for 7 days -->
    <property>
    	<name>yarn.log-aggregation.retain-seconds</name>
    	<value>604800</value>
    </property>
Save and exit: Esc :wq
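The 7-day retention value above is just seconds arithmetic, easy to check:

```shell
# 7 days expressed in seconds, matching yarn.log-aggregation.retain-seconds
echo $(( 7 * 24 * 60 * 60 ))
```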

Delete the previous contents and edit mapred-site.xml

[hadoop@hadoop-master /home/hadoop]$ vi /soft/hadoop/etc/hadoop/mapred-site.xml
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
    <property>  
        <name>mapreduce.jobhistory.address</name>  
        <value>hadoop-master-standby1:10020</value>  
        <description>MapReduce JobHistory Server host:port.Default port is 10020.</description>  
    </property>
    <property>  
        <name>mapreduce.jobhistory.webapp.address</name> 
        <value>hadoop-master-standby1:19888</value>  
        <description>MapReduce JobHistory Server Web UI host:port.Default port is 19888.</description>  
    </property>
    <property>  
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/home/hadoop/hadoop-full/mr_history/mapred/tmp</value>  
        <description>Where in-progress MapReduce job logs are stored; default /mr-history/tmp</description>
    </property>  
    <property>  
        <name>mapreduce.jobhistory.done-dir</name>  
        <value>/home/hadoop/hadoop-full/mr_history/done</value>  
        <description>Where logs managed by the MR JobHistory Server are stored; default /mr-history/done</description>
    </property>
Save and exit: Esc :wq

Add the DataNode nodes by editing the workers file

[hadoop@hadoop-master /soft]$ vi /soft/hadoop/etc/hadoop/workers
hadoop-dn1
hadoop-dn2
hadoop-dn3
Save and exit: Esc :wq

Set environment variables: JAVA_HOME (startup fails under the hadoop user without it), HADOOP_LOG_DIR, and HADOOP_PID_DIR.

[hadoop@hadoop-master /home/hadoop]$ vi /soft/hadoop/etc/hadoop/hadoop-env.sh
# JAVA_HOME: find the JAVA_HOME variable and add a line below it
# export JAVA_HOME=
export JAVA_HOME=/soft/jdk
# Log directory: find the HADOOP_LOG_DIR variable and add a line below it
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
export HADOOP_LOG_DIR=/home/hadoop/hadoop-full/logs
# PID directory: find the HADOOP_PID_DIR variable and add a line below it
# export HADOOP_PID_DIR=/tmp
export HADOOP_PID_DIR=/home/hadoop/hadoop-full/pids
Save and exit: Esc :wq
  • Per the official docs, ssh and pdsh need to be installed (pdsh is a parallel remote-shell tool for running commands across cluster nodes in batch)

From the pdsh download page, get pdsh-2.29.tar.bz2. Side note: the newest version seems to be rpm-only, and its dependencies were too much trouble; the differences shouldn't matter much, so I gave up on it... (download link)

# Check whether they are installed; no output means you need to install
[root@hadoop-master /soft]# ssh -V
OpenSSH_7.4p1, OpenSSL 1.0.2k-fips  26 Jan 2017
[root@hadoop-master /soft]# pdsh -V
pdsh-2.29
rcmd modules: ssh,rsh,exec (default: rsh)
misc modules: machines,dshgroup
# Installation; ssh is usually already present
[root@hadoop-master /soft]# sudo yum -y install ssh
[root@hadoop-master /root/download]# tar jxvf pdsh-2.29.tar.bz2
[root@hadoop-master /root/download]# cd pdsh-2.29
[root@hadoop-master /root/download/pdsh-2.29]# ./configure --with-ssh --with-rsh --with-mrsh --with-mqshell --with-qshell --with-dshgroups --with-machines=/etc/pdsh/machines --without-pam
[root@hadoop-master /root/download/pdsh-2.29]# make;make install
# Write the hostnames to be batch-targeted into a file; pdsh will run commands on all of them
[root@hadoop-master /root/download/pdsh-2.29]# mkdir /etc/pdsh/
[root@hadoop-master /root/download/pdsh-2.29]# vi /etc/pdsh/machines
hadoop-master-standby1
hadoop-dn1
hadoop-dn2
hadoop-dn3
Save and exit: Esc :wq
[root@hadoop-master /root/download/pdsh-2.29]# chmod 777 /etc/pdsh/machines
[root@hadoop-master /root/download/pdsh-2.29]# pdsh -R ssh -a "uptime"
# Optionally remove the installation files
# [root@hadoop-master /root/download]# rm -f pdsh-2.29.tar.bz2
# [root@hadoop-master /root/download]# rm -fr pdsh-2.29

Confirm the hostname mappings in /etc/hosts; all 7 machines should have the same entries. Run as root.

[root@hadoop-master /root]# cat /etc/hosts
192.168.146.3 hadoop-master
192.168.146.4 hadoop-master-standby1
192.168.146.5 hadoop-master2
192.168.146.6 hadoop-master2-standby1
192.168.146.7 hadoop-dn1
192.168.146.8 hadoop-dn2
192.168.146.9 hadoop-dn3
# Copy the hosts file to the other 6 machines
[root@hadoop-master /root]# scp /etc/hosts root@hadoop-master-standby1:/etc
[root@hadoop-master /root]# scp /etc/hosts root@hadoop-master2:/etc
[root@hadoop-master /root]# scp /etc/hosts root@hadoop-master2-standby1:/etc
[root@hadoop-master /root]# scp /etc/hosts root@hadoop-dn1:/etc
[root@hadoop-master /root]# scp /etc/hosts root@hadoop-dn2:/etc
[root@hadoop-master /root]# scp /etc/hosts root@hadoop-dn3:/etc
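The six scp commands above can also be driven by one loop over the host list from the table earlier. The echo prefix makes this a dry run that only prints the commands; drop it to actually copy:

```shell
# Dry run: print one scp command per target host (remove echo to execute).
hosts="hadoop-master-standby1 hadoop-master2 hadoop-master2-standby1 hadoop-dn1 hadoop-dn2 hadoop-dn3"
for h in $hosts; do
  echo scp /etc/hosts "root@$h:/etc"
done
```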

Generate id_rsa.pub on hadoop-master-standby1, hadoop-master2, and hadoop-master2-standby1 (hadoop-master already has one), copy them to hadoop-master, and merge them into authorized_keys. Then distribute that file to all 7 machines so the NameNodes can log in to the DataNodes without a password.

# On hadoop-master2
[hadoop@hadoop-master2 /home/hadoop]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[hadoop@hadoop-master2 /home/hadoop]$ scp /home/hadoop/.ssh/id_rsa.pub hadoop@hadoop-master:/home/hadoop
# Merge on hadoop-master
[hadoop@hadoop-master /home/hadoop]$ cat /home/hadoop/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
# On hadoop-master-standby1
[hadoop@hadoop-master-standby1 /home/hadoop]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[hadoop@hadoop-master-standby1 /home/hadoop]$ scp /home/hadoop/.ssh/id_rsa.pub hadoop@hadoop-master:/home/hadoop
# Merge on hadoop-master
[hadoop@hadoop-master /home/hadoop]$ cat /home/hadoop/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
# On hadoop-master2-standby1
[hadoop@hadoop-master2-standby1 /home/hadoop]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[hadoop@hadoop-master2-standby1 /home/hadoop]$ scp /home/hadoop/.ssh/id_rsa.pub hadoop@hadoop-master:/home/hadoop
# Merge on hadoop-master
[hadoop@hadoop-master /home/hadoop]$ cat /home/hadoop/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys

# Inspect the final contents
[hadoop@hadoop-master /home/hadoop]$ cat /home/hadoop/.ssh/authorized_keys 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2wNkYLbNt/hxoPK4P1WuPt94bF69dY4qM0/dFsOzZdIiMErxB0I7Q7O54qwKCQJRDV2bVZoG9IahSUYddPHbsIiaViZpKs3REiPQUvr7DFro5S2/4fHo5zI69CNaLe8j1a/lgrt+Kad5t2ekEZpbpmWbxrcfKIZfkfQFrVhkdZp+LW8coOpiK2w1tmGbnvrsLr4PvLDPDiVl4RLVaG47wBnHVuWa43IrFoEJUL6S1e7Hb/0xNdwawUXGAQVPTvKqVE6OD/eD1sFO4R1jZCx0arM+9CNoUPxV4ecG3WU6hsbVupE/B1wugxDnZT6tVn7DpGSOwV9PFwu3kaOaG4D1x hadoop@hadoop-master
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDAM4Yxi5lpfsCCBP846sFRyD6jRk2x6wrhK2xa4xhTmk0LzgveFtMQ5u3HWORq4HPt/dLtvlEhYg1NvykTyom170OC4w+BX1OG57z1p+QuAhF6yywpmwHrP5/kZXTGpfrIZjs07HVF0jkjNOEzap/b6busWgaBHAwiSUZIWPkIlF+CU2wmq+fpvdxT3yGM4vCEocjKWobDWC0j9phaC7jDml/icOv2HM9Yf/uYv4qpY7StUl75xFi44qzGReHboBSTaaLN+ixX5riDocHeGfSpLe4K1QUHneypGdDF0JjC89RZJCzesngggz/mKVu/LaKQDAgVPjZXveR/TvsC9jVd hadoop@hadoop-master2
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCttADevEtHVDfKpmbikWbgjCytOy63NSCHhNqguDXWp2XhQ+ORYOYGRx0jUaO5JRpayYAEQ+eaDcn+HG+I7HqH9try/ySTTokFIXttwkfOF6229QmMtQzRhHp3F4JAcwgaeaLa/zmRScACwZjPhMpY9jIfZF55uscFr9g7Svxdbh1DQAWHmas5MqXOPaGLk6phq+Utwg4ySPUJuzdKwNfBEOuPPYIu7d75jNePKfdOSjK2c+7JWgi/M3NDrJjG8ACi3pBNiTs1h27A3su0o04xONrJ1shc8ztNSlQZTAPvQFJCAwcjXsS1TpBIvNyY/ZpCb8ToyGi/2+svlBgf92Y7 hadoop@hadoop-master-standby1
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCvt5yeK6hnQR4WAnbZCB2M/nRdoD00Ew+hExQqsAeBU2IKUo0/ZLlhkMpxl2R5aIVfe+FyTPyiv45dRhSRnSHgngdjN5GHCaALhu1Av4InRPGraQvIcPaT5A9QUtul1bQqIM7wT7FTnnCI66oUdlN2uXSlwEX4e+MvO8FLfr+937aql8rBQfrVmAgjO5SWIVvzgfDNofnF5KhfLcO9Mj76dK3p+OxbDCcr9O1f2ehc7i775kKo0/QyRI5PWs49+BlkJqxRahWPQX/WpX/klaRWMWy+1vQwjaqsdJIwVTT0bOecpQgXcrtZxZzLIUeX8GX0DzNmJEVpw5ltgBjBxJGB hadoop@hadoop-master2-standby1
# Copy to the other 6 machines
[hadoop@hadoop-master /home/hadoop]$ scp /home/hadoop/.ssh/authorized_keys hadoop@hadoop-master2:/home/hadoop/.ssh
[hadoop@hadoop-master /home/hadoop]$ scp /home/hadoop/.ssh/authorized_keys hadoop@hadoop-master-standby1:/home/hadoop/.ssh
[hadoop@hadoop-master /home/hadoop]$ scp /home/hadoop/.ssh/authorized_keys hadoop@hadoop-master2-standby1:/home/hadoop/.ssh
[hadoop@hadoop-master /home/hadoop]$ scp /home/hadoop/.ssh/authorized_keys hadoop@hadoop-dn1:/home/hadoop/.ssh
[hadoop@hadoop-master /home/hadoop]$ scp /home/hadoop/.ssh/authorized_keys hadoop@hadoop-dn2:/home/hadoop/.ssh
[hadoop@hadoop-master /home/hadoop]$ scp /home/hadoop/.ssh/authorized_keys hadoop@hadoop-dn3:/home/hadoop/.ssh

The following two scripts come from teacher Xu Peicheng (徐培成)

  • Run a command on every remote machine
[hadoop@hadoop-master /home/hadoop]$ sudo vi /usr/local/bin/xcall.sh
#!/bin/bash
if [[ $# -lt 1 ]] ; then echo no params ; exit ; fi

params=$@

echo ================ master-standby1  ==================
ssh hadoop-master-standby1 "$params"
echo ================ master2  ==================
ssh hadoop-master2 "$params"
echo ================ master2-standby1  ==================
ssh hadoop-master2-standby1 "$params"

for (( i = 1; i <= 3; i++ )) ; do
    echo ================ dn$i ==================
    ssh hadoop-dn$i "$params"
done
Save and exit: Esc :wq
[hadoop@hadoop-master /home/hadoop]$ sudo chmod a+x /usr/local/bin/xcall.sh
  • Copy a file to the remote machines
[hadoop@hadoop-master /home/hadoop]$ sudo vi /usr/local/bin/xrsync.sh
#!/bin/bash
if [[ $# -lt 1 ]] ; then echo no params ; exit ; fi

p=$1

#echo p=$p

dir=$(dirname "$p")
#echo dir=$dir

filename=$(basename "$p")
#echo filename=$filename

cd -P $dir
fullpath=`pwd -P .`

user=`whoami`

echo ================ hadoop-master-standby1  ==================
rsync -lr $p $user@hadoop-master-standby1:$fullpath
echo ================ hadoop-master2  ==================
rsync -lr $p $user@hadoop-master2:$fullpath
echo ================ hadoop-master2-standby1  ==================
rsync -lr $p $user@hadoop-master2-standby1:$fullpath
for (( i = 1; i <= 3; i++ )) ; do
    echo ================ dn$i  ==================
    rsync -lr $p $user@hadoop-dn$i:$fullpath
done
Save and exit: Esc :wq
[hadoop@hadoop-master /home/hadoop]$ sudo chmod a+x /usr/local/bin/xrsync.sh

Add a global jps command on every server. Precondition: Java is installed under /soft/jdk on all of them.

[hadoop@hadoop-master /home/hadoop]$ xcall.sh "sudo ln -sfT /soft/jdk/bin/jps /usr/local/bin/jps"
# Usage
[hadoop@hadoop-master /home/hadoop]$ xcall.sh jps
================ master-standby1 ==================
2121 Jps
================ master2 ==================
26868 Jps
================ master2-standby1 ==================
2125 Jps
================ dn1 ==================
2263 Jps
================ dn2 ==================
2024 Jps
================ dn3 ==================
2103 Jps

Copy /etc/profile to the other 6 servers, as root. (Note: sourcing over ssh only affects that remote session; new login shells pick the profile up automatically.)

[root@hadoop-master /root]# xrsync.sh /etc/profile
[root@hadoop-master /root]# xcall.sh source /etc/profile

Copy the hadoop folders to the other 6 servers. If you reuse this step later, adapt it so you don't overwrite config files that have since been modified.

[hadoop@hadoop-master /home/hadoop]$ xcall.sh sudo mkdir /soft
[hadoop@hadoop-master /home/hadoop]$ xcall.sh sudo chown -R hadoop:hadoop /soft
[hadoop@hadoop-master /home/hadoop]$ xrsync.sh /soft/hadoop-3.2.1
[hadoop@hadoop-master /home/hadoop]$ xrsync.sh /soft/hadoop

Next, modify the config files on the hadoop-master2 and hadoop-master2-standby1 machines: core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml.

core-site.xml

[hadoop@hadoop-master2 /home/hadoop]$ sed -i 's/nns1/nns2/' /soft/hadoop/etc/hadoop/core-site.xml
[hadoop@hadoop-master2-standby1 /home/hadoop]$ sed -i 's/nns1/nns2/' /soft/hadoop/etc/hadoop/core-site.xml

hdfs-site.xml

[hadoop@hadoop-master2 /home/hadoop]$ sed -i 's/nns1-ha-data/nns2-ha-data/' /soft/hadoop/etc/hadoop/hdfs-site.xml
[hadoop@hadoop-master2-standby1 /home/hadoop]$ sed -i 's/nns1-ha-data/nns2-ha-data/' /soft/hadoop/etc/hadoop/hdfs-site.xml

yarn-site.xml

[hadoop@hadoop-master2 /home/hadoop]$ sed -i 's/yarn1-ha/yarn2-ha/' /soft/hadoop/etc/hadoop/yarn-site.xml 
[hadoop@hadoop-master2 /home/hadoop]$ sed -i 's/hadoop-master/hadoop-master2/' /soft/hadoop/etc/hadoop/yarn-site.xml 
[hadoop@hadoop-master2-standby1 /home/hadoop]$ sed -i 's/yarn1-ha/yarn2-ha/' /soft/hadoop/etc/hadoop/yarn-site.xml 
[hadoop@hadoop-master2-standby1 /home/hadoop]$ sed -i 's/hadoop-master/hadoop-master2/' /soft/hadoop/etc/hadoop/yarn-site.xml 
# Point only one DataNode's Yarn config at hadoop-master2; the other two stay on hadoop-master
[hadoop@hadoop-dn3 /home/hadoop]$ sed -i 's/yarn1-ha/yarn2-ha/' /soft/hadoop/etc/hadoop/yarn-site.xml 
[hadoop@hadoop-dn3 /home/hadoop]$ sed -i 's/hadoop-master/hadoop-master2/' /soft/hadoop/etc/hadoop/yarn-site.xml 
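The hostname substitution above works because sed without the /g flag replaces only the first match on each line, and every hostname sits on its own line. Note it also rewrites the standby hostname, which is exactly what's wanted here:

```shell
# First-match-per-line behavior of sed: both hostnames get the "2" inserted.
printf 'hadoop-master\nhadoop-master-standby1\n' | sed 's/hadoop-master/hadoop-master2/'
# → hadoop-master2
# → hadoop-master2-standby1
```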

mapred-site.xml

[hadoop@hadoop-master2 /home/hadoop]$ sed -i 's/hadoop-master/hadoop-master2/' /soft/hadoop/etc/hadoop/mapred-site.xml
[hadoop@hadoop-master2-standby1 /home/hadoop]$ sed -i 's/hadoop-master/hadoop-master2/' /soft/hadoop/etc/hadoop/mapred-site.xml

OK, that's it for configuration!! If you've read this far, you're really putting in the work. Time to start things up and test...

On two of the NameNode servers (one per pair), initialize the HA state in ZooKeeper. This creates /hadoop-ha/nns1 and /hadoop-ha/nns2 under the ZooKeeper root.

[hadoop@hadoop-master /home/hadoop]$ hdfs zkfc -formatZK
[hadoop@hadoop-master2 /home/hadoop]$ hdfs zkfc -formatZK
# Connect to zookeeper and take a look; it's installed on hadoop-dn1
[hadoop@hadoop-dn1 /soft]$ zkCli.sh
[zk: localhost:2181(CONNECTED) 1] ls /
[hadoop-ha]
# The two cluster IDs
[zk: localhost:2181(CONNECTED) 3] ls /hadoop-ha
[nns1, nns2]
[zk: localhost:2181(CONNECTED) 3] ls /hadoop-ha/nns1
[]
[zk: localhost:2181(CONNECTED) 3] quit

For the first start, launch the Quorum Journal Node sync service on the 3 dn machines first (per the dfs.namenode.shared.edits.dir setting)

For subsequent starts the order is: NameNode, DataNode, JournalNode, ZK Failover Controllers (see the sbin/start-dfs.sh script)

[hadoop@hadoop-dn1 /home/hadoop]$ hdfs --daemon start journalnode
[hadoop@hadoop-dn2 /home/hadoop]$ hdfs --daemon start journalnode
[hadoop@hadoop-dn3 /home/hadoop]$ hdfs --daemon start journalnode

# Stop command
# hdfs --daemon stop journalnode

Format the NameNodes; this writes through the JournalNode service started above.

# Important: use the same cluster ID on both so they can share the DataNodes
[hadoop@hadoop-master /home/hadoop]$ hdfs namenode -format -clusterId federation
[hadoop@hadoop-master2 /home/hadoop]$ hdfs namenode -format -clusterId federation
# You can check the created files on any data node
[hadoop@hadoop-dn1 /home/hadoop/hadoop-full/tmp/dfs/journal]$ ll
total 0
drwxrwxr-x. 4 hadoop hadoop 58 Jan 24 16:13 nns1-ha-data
drwxrwxr-x. 3 hadoop hadoop 40 Jan 24 16:40 nns2-ha-data

Start each NameNode pair, Active and Standby

[hadoop@hadoop-master /home/hadoop]$ hdfs --daemon start namenode
# Sync the metadata from the active NameNode
[hadoop@hadoop-master-standby1 /home/hadoop]$ hdfs namenode -bootstrapStandby
[hadoop@hadoop-master-standby1 /home/hadoop]$ hdfs --daemon start namenode

[hadoop@hadoop-master2 /home/hadoop]$ hdfs --daemon start namenode
[hadoop@hadoop-master2-standby1 /home/hadoop]$ hdfs namenode -bootstrapStandby
[hadoop@hadoop-master2-standby1 /home/hadoop]$ hdfs --daemon start namenode

# To stop:
# hdfs --daemon stop namenode

On the 3 dn data-node machines, start the DataNode (per the workers file)

[hadoop@hadoop-dn1 /home/hadoop]$ hdfs --daemon start datanode
[hadoop@hadoop-dn2 /home/hadoop]$ hdfs --daemon start datanode
[hadoop@hadoop-dn3 /home/hadoop]$ hdfs --daemon start datanode

# To stop:
# hdfs --daemon stop datanode
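To confirm that all three DataNodes registered with the active NameNode, `hdfs dfsadmin -report` is handy. The small parsing helper below is a sketch; the `live_datanodes` name is made up.

```shell
# Print the number of live DataNodes by parsing the dfsadmin report.
live_datanodes() {
  hdfs dfsadmin -report 2>/dev/null | sed -n 's/^Live datanodes (\([0-9][0-9]*\)).*/\1/p'
}
# live_datanodes   # should print 3 once all data nodes are up
```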

Start the ZK Failover Controllers, the processes that detect NameNode failures and trigger failover

[hadoop@hadoop-master /home/hadoop]$ hdfs --daemon start zkfc
[hadoop@hadoop-master-standby1 /home/hadoop]$ hdfs --daemon start zkfc
[hadoop@hadoop-master2 /home/hadoop]$ hdfs --daemon start zkfc
[hadoop@hadoop-master2-standby1 /home/hadoop]$ hdfs --daemon start zkfc

# To stop:
# hdfs --daemon stop zkfc

View the web UIs

http://hadoop-master:50070
http://hadoop-master-standby1:50070
http://hadoop-master2:50070
http://hadoop-master2-standby1:50070

Check NameNode state from the command line

[hadoop@hadoop-master /home/hadoop]$ hdfs haadmin -ns nns1 -getServiceState nna1
active
[hadoop@hadoop-master /home/hadoop]$ hdfs haadmin -ns nns1 -getServiceState nna2
standby
[hadoop@hadoop-master /home/hadoop]$ hdfs haadmin -ns nns2 -getServiceState nnb1
active
[hadoop@hadoop-master /home/hadoop]$ hdfs haadmin -ns nns2 -getServiceState nnb2
standby
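Checking all four NameNodes one by one can be wrapped in a loop. A sketch, using the nameservice and NameNode IDs configured above (`check_nn_states` is a made-up helper name):

```shell
# Print the HA state of every NameNode in both federated nameservices.
check_nn_states() {
  for pair in 'nns1 nna1' 'nns1 nna2' 'nns2 nnb1' 'nns2 nnb2'; do
    set -- $pair
    printf '%s/%s: %s\n' "$1" "$2" "$(hdfs haadmin -ns "$1" -getServiceState "$2")"
  done
}
# check_nn_states
```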

Manually switch the Active/Standby roles

[hadoop@hadoop-master /home/hadoop]$ hdfs haadmin -ns nns1 -failover  nna1 nna2
# Reversing nna1 and nna2 fails over in the opposite direction

Batch start command

[hadoop@hadoop-master /home/hadoop]$ start-dfs.sh 
Starting namenodes on [hadoop-master hadoop-master-standby1 hadoop-master2 hadoop-master2-standby1]
Starting datanodes
Starting journal nodes [hadoop-dn3 hadoop-dn2 hadoop-dn1]
Starting ZK Failover Controllers on NN hosts [hadoop-master hadoop-master-standby1 hadoop-master2 hadoop-master2-standby1]
hadoop-master: zkfc is running as process 3488.  Stop it first.
hadoop-master-standby1: zkfc is running as process 2817.  Stop it first.
hadoop-master2-standby1: zkfc is running as process 2983.  Stop it first.
hadoop-master2: zkfc is running as process 88144.  Stop it first.
# To stop:
# stop-dfs.sh 

Start the YARN framework. YARN Federation is not set up here for now; YARN HA managing thousands of NM nodes is still a long way off, so that is left as a later exercise once the basics are familiar.

The ResourceManagers start according to the yarn.resourcemanager.hostname.* settings; the NodeManagers start according to the workers file.

# resourcemanager
[hadoop@hadoop-master /home/hadoop]$ yarn --daemon start resourcemanager
[hadoop@hadoop-master-standby1 /home/hadoop]$ yarn --daemon start resourcemanager
[hadoop@hadoop-master2 /home/hadoop]$ yarn --daemon start resourcemanager
[hadoop@hadoop-master2-standby1 /home/hadoop]$ yarn --daemon start resourcemanager
# nodemanager
[hadoop@hadoop-dn1 /home/hadoop]$ yarn --daemon start nodemanager
[hadoop@hadoop-dn2 /home/hadoop]$ yarn --daemon start nodemanager
[hadoop@hadoop-dn3 /home/hadoop]$ yarn --daemon start nodemanager

# To stop:
# yarn --daemon stop resourcemanager
# yarn --daemon stop nodemanager

Check ResourceManager state

[hadoop@hadoop-master /home/hadoop]$ yarn rmadmin -getServiceState rm1
[hadoop@hadoop-master /home/hadoop]$ yarn rmadmin -getServiceState rm2

Batch start command

[hadoop@hadoop-master /home/hadoop]$ start-yarn.sh 
Starting resourcemanagers on [ hadoop-master hadoop-master-standby1]
Starting nodemanagers
[hadoop@hadoop-master2 /home/hadoop]$ start-yarn.sh
Starting resourcemanagers on [ hadoop-master2 hadoop-master2-standby1]
Starting nodemanagers
# To stop:
# stop-yarn.sh

View the web UIs:

http://hadoop-master:8088
http://hadoop-master2:8088
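With YARN up, a quick smoke test is to submit the bundled pi example job. The jar path below matches the hadoop-3.2.1 layout used in this setup; `run_pi_test` is a made-up wrapper name.

```shell
# Submit the bundled MapReduce pi example to YARN as a smoke test.
run_pi_test() {
  yarn jar /soft/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar pi 2 10
}
# run_pi_test   # the finished job should later appear in the JobHistory web UI
```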

Start the MapReduce JobHistory server

[hadoop@hadoop-master-standby1 /home/hadoop]$ mapred --daemon start historyserver
[hadoop@hadoop-master2-standby1 /home/hadoop]$ mapred --daemon start historyserver

View the web UIs:

http://hadoop-master-standby1:19888
http://hadoop-master2-standby1:19888

五. Auto-starting the services

Use the root user here. Since every Hadoop service already ships with its own start/stop commands, one NameNode server is enough to illustrate the pattern.

[root@hadoop-master /root]# vi /etc/systemd/system/hdfs-namenode.service
[Unit]
Description=hdfs-namenode
After=syslog.target network.target

[Service]
Type=forking
User=hadoop
Group=hadoop

ExecStart=/soft/hadoop/bin/hdfs --daemon start namenode
ExecStop=/soft/hadoop/bin/hdfs --daemon stop namenode

[Install]
WantedBy=multi-user.target
Save and exit: Esc :wq
[root@hadoop-master /root]# chmod 755 /etc/systemd/system/hdfs-namenode.service
[root@hadoop-master /root]# systemctl enable hdfs-namenode
[root@hadoop-master /root]# systemctl start hdfs-namenode
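The same unit pattern applies to the other daemons (zkfc, datanode, journalnode, and so on). The generator below is a sketch: `make_hdfs_unit` is a made-up helper name, and the optional directory argument (defaulting to /etc/systemd/system) exists only so the function can be exercised outside /etc.

```shell
# Generate a systemd unit for any hdfs daemon, mirroring the namenode unit above.
make_hdfs_unit() {
  svc=$1; dir=${2:-/etc/systemd/system}
  cat > "$dir/hdfs-$svc.service" <<EOF
[Unit]
Description=hdfs-$svc
After=syslog.target network.target

[Service]
Type=forking
User=hadoop
Group=hadoop
ExecStart=/soft/hadoop/bin/hdfs --daemon start $svc
ExecStop=/soft/hadoop/bin/hdfs --daemon stop $svc

[Install]
WantedBy=multi-user.target
EOF
}
# Example (as root): make_hdfs_unit zkfc && systemctl daemon-reload && systemctl enable hdfs-zkfc
```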

Appendix:

Official documentation:

Reference material: