Setting Up a Hadoop 3.1.3 Cluster


  • Official download page

Index of /dist/hadoop/common/hadoop-3.1.3

I downloaded hadoop-3.1.3.tar.gz from there.

Node roles:

First, for HDFS, the NameNode and the SecondaryNameNode must be placed on different nodes. Second, the YARN ResourceManager is also fairly resource-hungry, so it should likewise be kept on a node separate from the NameNode and SecondaryNameNode.

Hostname     IP                 Role
hadoop301    192.168.126.134    NameNode, DataNode, NodeManager
hadoop302    192.168.126.135    ResourceManager, DataNode, NodeManager
hadoop303    192.168.126.136    SecondaryNameNode, DataNode, NodeManager

Start with hadoop301 and configure it as follows:

  • Install the JDK
[root@hadoop301 etc]# cd /usr/local/wyh/software/java/;ll
total 143360
-rw-r--r--. 1 root root 146799982 Nov 30  2021 jdk-8u311-linux-x64.tar.gz
[root@hadoop301 java]# tar -zxvf jdk-8u311-linux-x64.tar.gz

Configure the environment variables:

Previously we added environment variables by editing /etc/profile directly, but this time let's try a different approach. If you read /etc/profile carefully, you will find a snippet that loops over every .sh file under /etc/profile.d and sources it, so that the environment variables defined in those files take effect globally. We can therefore simply create our own .sh file under /etc/profile.d to hold our environment variables. This is the approach more commonly used in production environments today.
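
On a stock CentOS 7 install, that snippet in /etc/profile looks roughly like the following (quoted from a default install; other distributions may differ slightly):

for i in /etc/profile.d/*.sh /etc/profile.d/sh.local ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then
            . "$i"                 # interactive shell: source the script normally
        else
            . "$i" >/dev/null      # non-interactive shell: source it quietly
        fi
    fi
done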

[root@hadoop301 java]# cd /etc/profile.d/;ll
total 56
-rw-r--r--. 1 root root  771 Oct 13  2020 256term.csh
-rw-r--r--. 1 root root  841 Oct 13  2020 256term.sh
-rw-r--r--. 1 root root  196 Mar 24  2017 colorgrep.csh
-rw-r--r--. 1 root root  201 Mar 24  2017 colorgrep.sh
-rw-r--r--. 1 root root 1741 Aug  6  2019 colorls.csh
-rw-r--r--. 1 root root 1606 Aug  6  2019 colorls.sh
-rw-r--r--. 1 root root   80 Apr  1  2020 csh.local
-rw-r--r--. 1 root root 1706 Oct 13  2020 lang.csh
-rw-r--r--. 1 root root 2703 Oct 13  2020 lang.sh
-rw-r--r--. 1 root root  123 Jul 30  2015 less.csh
-rw-r--r--. 1 root root  121 Jul 30  2015 less.sh
-rw-r--r--. 1 root root   81 Apr  1  2020 sh.local
-rw-r--r--. 1 root root  164 Jan 27  2014 which2.csh
-rw-r--r--. 1 root root  169 Jan 27  2014 which2.sh
[root@hadoop301 profile.d]# vi wyh_env.sh
[root@hadoop301 profile.d]# cat wyh_env.sh
#JAVA_HOME
export JAVA_HOME=/usr/local/wyh/software/java/jdk1.8.0_311
export PATH=$PATH:$JAVA_HOME/bin

Reload the profile and verify Java:

[root@hadoop301 profile.d]# source /etc/profile
[root@hadoop301 profile.d]# java -version
java version "1.8.0_311"
Java(TM) SE Runtime Environment (build 1.8.0_311-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.311-b11, mixed mode)
  • Install Hadoop
[root@hadoop301 profile.d]# cd /usr/local/wyh/software/;ll
total 330156
-rw-r--r--. 1 root root 338075860 Jul  1 07:09 hadoop-3.1.3.tar.gz
drwxr-xr-x. 3 root root        26 Jul  1 06:37 java
[root@hadoop301 software]# tar -zxvf hadoop-3.1.3.tar.gz

Add the following to the wyh_env.sh file we just created:

#HADOOP_HOME
export HADOOP_HOME=/usr/local/wyh/software/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

Reload the profile:

[root@hadoop301 profile.d]# source /etc/profile

Verify that Hadoop is installed correctly:
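
For example, run the hadoop command (this only checks that it is on the PATH; the cluster does not need to be configured yet):

[root@hadoop301 software]# hadoop version
# should report Hadoop 3.1.3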

Then replicate the setup from hadoop301 to hadoop302 and hadoop303, as follows:

  1. Create the same directory structure on hadoop302 and hadoop303:
[root@hadoop302 ~]# mkdir -p /usr/local/wyh/software
[root@hadoop302 ~]# cd /usr/local/wyh/software;ll
total 0
[root@hadoop302 software]# pwd
/usr/local/wyh/software
[root@hadoop303 ~]# mkdir -p /usr/local/wyh/software
[root@hadoop303 ~]# cd /usr/local/wyh/software;ll
total 0
[root@hadoop303 software]# pwd
/usr/local/wyh/software

2. On hadoop301, run scp to copy the directories to hadoop302 and hadoop303:

[root@hadoop301 software]# scp -r ./java root@hadoop302:/usr/local/wyh/software/
[root@hadoop301 software]# scp -r ./java root@hadoop303:/usr/local/wyh/software/
[root@hadoop301 software]# scp -r ./hadoop-3.1.3/ root@hadoop302:/usr/local/wyh/software/
[root@hadoop301 software]# scp -r ./hadoop-3.1.3/ root@hadoop303:/usr/local/wyh/software/
[root@hadoop301 profile.d]# pwd
/etc/profile.d
[root@hadoop301 profile.d]# scp ./wyh_env.sh root@hadoop302:/etc/profile.d/
[root@hadoop301 profile.d]# scp ./wyh_env.sh root@hadoop303:/etc/profile.d/

3. On hadoop302 and hadoop303, reload the environment variables:

[root@hadoop302 profile.d]# source /etc/profile
[root@hadoop303 profile.d]# source /etc/profile
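
As an optional check, confirm on both nodes that Java and Hadoop are now on the PATH (the same commands used on hadoop301):

[root@hadoop302 profile.d]# java -version
[root@hadoop302 profile.d]# hadoop version
[root@hadoop303 profile.d]# java -version
[root@hadoop303 profile.d]# hadoop version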
  • Configure passwordless SSH between the three nodes
# On hadoop301, generate a public/private key pair and copy the public key to all three machines, so that hadoop301 can log in to hadoop301/hadoop302/hadoop303 without a password

[root@hadoop301 .ssh]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:N3vWyIILjPlUcTrhzGvfCdMDCstHW1fu2G9+feNim2k root@hadoop301
The key's randomart image is:
+---[RSA 2048]----+
|                 |
|                 |
|        o .   .  |
|       + =   o   |
|      . S = . .  |
|     = = O B *   |
|    o * B = O + .|
|     o + o B Eoo=|
|      . . . =+++*|
+----[SHA256]-----+
# At each prompt during key generation, just press Enter
[root@hadoop301 .ssh]# ll
total 12
-rw-------. 1 root root 1679 Jul  3 08:37 id_rsa
-rw-r--r--. 1 root root  396 Jul  3 08:37 id_rsa.pub
-rw-r--r--. 1 root root  374 Jul  3 07:51 known_hosts

[root@hadoop301 .ssh]# ssh-copy-id hadoop302
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@hadoop302's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop302'"
and check to make sure that only the key(s) you wanted were added.

[root@hadoop301 .ssh]# ssh-copy-id hadoop303
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@hadoop303's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop303'"
and check to make sure that only the key(s) you wanted were added.

[root@hadoop301 .ssh]# ssh-copy-id hadoop301
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'hadoop301 (192.168.126.134)' can't be established.
ECDSA key fingerprint is SHA256:LupxuZ6Xu4rPMGTK+00tbktYzP9ERgRcFDEI1fWYkfE.
ECDSA key fingerprint is MD5:82:e4:c6:a8:2d:ae:f0:a2:2e:d5:9e:c4:67:4c:b3:f7.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@hadoop301's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop301'"
and check to make sure that only the key(s) you wanted were added.
# On hadoop302, generate a key pair and copy the public key to all three machines
[root@hadoop302 .ssh]# ssh-keygen -t rsa
[root@hadoop302 .ssh]# ssh-copy-id hadoop301
[root@hadoop302 .ssh]# ssh-copy-id hadoop302
[root@hadoop302 .ssh]# ssh-copy-id hadoop303
# On hadoop303, generate a key pair and copy the public key to all three machines
[root@hadoop303 ~]# ssh-keygen -t rsa
[root@hadoop303 ~]# ssh-copy-id hadoop301
[root@hadoop303 ~]# ssh-copy-id hadoop302
[root@hadoop303 ~]# ssh-copy-id hadoop303
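
To confirm that the passwordless setup works, a quick test such as the following should succeed without prompting for a password:

[root@hadoop301 .ssh]# ssh hadoop302 hostname
# prints hadoop302 without asking for a password
[root@hadoop302 .ssh]# ssh hadoop303 hostname
# prints hadoop303 without asking for a password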
  • Cluster configuration

First, edit the following four configuration files on hadoop301:

[root@hadoop301 hadoop]# pwd
/usr/local/wyh/software/hadoop-3.1.3/etc/hadoop

Edit core-site.xml:

<configuration>
        <!-- NameNode internal (RPC) address -->
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hadoop301:8020</value>
        </property>
        <!-- Hadoop data storage directory; created automatically if it does not exist -->
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/usr/local/wyh/software/hadoop-3.1.3/data</value>
        </property>
</configuration>

Edit hdfs-site.xml:

<configuration>
        <!-- Address the NameNode exposes to users, i.e. the NameNode web UI address -->
        <property>
                <name>dfs.namenode.http-address</name>
                <value>hadoop301:9870</value>
        </property>
        <!-- SecondaryNameNode web UI address -->
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>hadoop303:9868</value>
        </property>
</configuration>

Edit yarn-site.xml:

<configuration>
        <!-- Enable the MapReduce shuffle auxiliary service -->
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <!-- Hostname of the ResourceManager -->
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>hadoop302</value>
        </property>
        <!-- Override environment-variable inheritance for containers; works around a bug in Hadoop 3.1.3 that was fixed in 3.2.x -->
        <property>
                <name>yarn.nodemanager.env-whitelist</name>
                <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
        </property>
<!-- Site specific YARN configuration properties -->

</configuration>

Edit mapred-site.xml:

<configuration>
        <!-- Run MapReduce jobs on YARN -->
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

Since the four configuration files above were modified, we can simply distribute the whole configuration directory to the other two nodes to keep them in sync:

[root@hadoop301 etc]# scp -r hadoop/ root@hadoop302:$PWD
[root@hadoop301 etc]# scp -r hadoop/ root@hadoop303:$PWD
  • Configure the workers file (called slaves in Hadoop 2.x) and distribute it to the other two nodes
# Remove the default localhost entry and add the hostnames of the three cluster nodes
[root@hadoop301 hadoop]# cat workers
hadoop301
hadoop302
hadoop303

# Distribute the configuration
[root@hadoop301 hadoop]# cd ../
[root@hadoop301 etc]# scp -r hadoop/ root@hadoop302:$PWD
[root@hadoop301 etc]# scp -r hadoop/ root@hadoop303:$PWD
  • Configure the start/stop scripts
[root@hadoop301 sbin]# pwd
/usr/local/wyh/software/hadoop-3.1.3/sbin
[root@hadoop301 sbin]#
[root@hadoop301 sbin]#
[root@hadoop301 sbin]# vi start-yarn.sh
[root@hadoop301 sbin]# vi stop-yarn.sh
# Add the following lines at the top of both files above
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

[root@hadoop301 sbin]# vi start-dfs.sh
[root@hadoop301 sbin]# vi stop-dfs.sh
# Add the following lines at the top of both files above
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

I made these edits manually on each of the three nodes, but they could also be distributed with scp, for example as shown below.
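
A minimal sketch of the scp approach, run from the sbin directory on hadoop301 (the target path is the same because the Hadoop directory was copied to the same location on all nodes):

[root@hadoop301 sbin]# scp start-dfs.sh stop-dfs.sh start-yarn.sh stop-yarn.sh root@hadoop302:$PWD
[root@hadoop301 sbin]# scp start-dfs.sh stop-dfs.sh start-yarn.sh stop-yarn.sh root@hadoop303:$PWD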
  • Configure the JobHistory server

Add the following to mapred-site.xml and distribute it to the other two nodes:

        <!-- JobHistory server internal (RPC) address -->
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>hadoop301:10020</value>
        </property>
        <!-- JobHistory server web UI address -->
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>hadoop301:19888</value>
        </property>
[root@hadoop301 hadoop]# scp  mapred-site.xml root@hadoop302:$PWD
[root@hadoop301 hadoop]# scp  mapred-site.xml root@hadoop303:$PWD
  • Enable log aggregation

Log aggregation collects the logs from all servers onto a single machine and exposes a web endpoint so users can view them.

Add the following to yarn-site.xml and distribute it to the other two nodes:

        <!-- Enable log aggregation -->
        <property>
                <name>yarn.log-aggregation-enable</name>
                <value>true</value>
        </property>
        <!-- URL of the log server (the JobHistory web UI) -->
        <property>
                <name>yarn.log.server.url</name>
                <value>http://hadoop301:19888/jobhistory/logs</value>
        </property>
        <!-- Retain aggregated logs for 7 days (604800 seconds) -->
        <property>
                <name>yarn.log-aggregation.retain-seconds</name>
                <value>604800</value>
        </property>
[root@hadoop301 hadoop]# scp yarn-site.xml root@hadoop302:$PWD
[root@hadoop301 hadoop]# scp yarn-site.xml root@hadoop303:$PWD
  • Format the cluster

Note that formatting is only needed the very first time the cluster is started.

[root@hadoop301 hadoop-3.1.3]# hdfs namenode -format

After formatting, you can see that the data and logs directories have been created:
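A quick way to confirm:

[root@hadoop301 hadoop-3.1.3]# ls
# data/ and logs/ should now be listed alongside the directories shipped with the distribution (bin, etc, sbin, share, ...)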

  • Start the cluster

Start HDFS on the node where the NameNode runs:

[root@hadoop301 sbin]# ./start-dfs.sh

Check the processes after startup:
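
One way is to run jps on each node. After start-dfs.sh alone (YARN has not been started yet), roughly the following daemons should be present:

[root@hadoop301 ~]# jps    # expect: NameNode, DataNode
[root@hadoop302 ~]# jps    # expect: DataNode
[root@hadoop303 ~]# jps    # expect: SecondaryNameNode, DataNode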


After HDFS starts successfully, you can open the NameNode web UI (the address is the one we configured in hdfs-site.xml):

http://hadoop301:9870/

Start YARN on the node where the ResourceManager runs:

[root@hadoop302 sbin]# ./start-yarn.sh

Check the processes:
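
After start-yarn.sh, jps on each node should show the full daemon set from the role table at the top:

[root@hadoop301 ~]# jps    # expect: NameNode, DataNode, NodeManager
[root@hadoop302 ~]# jps    # expect: ResourceManager, DataNode, NodeManager
[root@hadoop303 ~]# jps    # expect: SecondaryNameNode, DataNode, NodeManager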


Open the ResourceManager web UI:

http://hadoop302:8088/

  • Start the JobHistory server

Since mapred-site.xml configures hadoop301 as the JobHistory server, start it on hadoop301:

[root@hadoop301 bin]# pwd
/usr/local/wyh/software/hadoop-3.1.3/bin
[root@hadoop301 bin]# ./mapred --daemon start historyserver


# To stop the JobHistory server, run the following command
[root@hadoop301 bin]# ./mapred --daemon stop historyserver

  • Test
# Create a test data file
[root@hadoop301 hadoop-3.1.3]# cat test0703.txt
hello,hadoop3
# Create a directory on HDFS and upload the data file to it
[root@hadoop301 hadoop-3.1.3]# hadoop fs -mkdir /test_input
[root@hadoop301 hadoop-3.1.3]# hadoop fs -put test0703.txt /test_input

Run the word count example program:

[root@hadoop301 hadoop-3.1.3]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /test_input /test_output

# /test_output must not already exist; it is created automatically

After the job succeeds, check the results in the following places: the output directory on HDFS, the application list in the ResourceManager web UI, the job details in the JobHistory web UI, and the aggregated logs.
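
For example, a quick check from the shell and a browser (part-r-00000 is the default reducer output file name, and the UI addresses come from the configuration above):

[root@hadoop301 hadoop-3.1.3]# hadoop fs -ls /test_output
[root@hadoop301 hadoop-3.1.3]# hadoop fs -cat /test_output/part-r-00000
# expected output: hello,hadoop3   1

ResourceManager UI: http://hadoop302:8088/
JobHistory UI: http://hadoop301:19888/jobhistory (aggregated task logs can be opened from a job's page)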

At this point, the basic Hadoop 3.1.3 cluster setup is complete.