As in the 3.0 installation earlier, first create the user and set up passwordless SSH login; here we pick up from unpacking and installing.
1. Upload the Hadoop installation package and unpack it
#1.Create the Hadoop installation directory
[root@node1 ~]# mkdir -p /opt/bigdata
#2.Unpack hadoop-2.7.3.tar.gz
[root@node1 ~]# tar -xzvf hadoop-2.7.3.tar.gz -C /opt/bigdata/
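To confirm the archive unpacked correctly, list the target directory; a hadoop-2.7.3 subdirectory should be there:
#3.Verify the extraction
[root@node1 ~]# ls /opt/bigdata/
hadoop-2.7.3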
2. Configure the Hadoop environment variables
[hadoop@node1 ~]$ vi .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
JAVA_HOME=/usr/java/jdk1.8.0_211-amd64
HADOOP_HOME=/opt/bigdata/hadoop-2.7.3
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME
export HADOOP_HOME
export PATH
~
:wq!
#1.Apply the environment variables
[hadoop@node1 ~]$ source .bash_profile
#2.Show the Hadoop version information
[hadoop@node1 ~]$ hadoop version
#3.If the version information below is printed, the installation and environment variables are working.
Hadoop 2.7.3
Source code repository https://github.com/apache/hadoop.git -r 1019dde65bcf12e05ef48ac71e84550d589e5d9a
Compiled by sunilg on 2019-01-29T01:39Z
Compiled with protoc 2.5.0
From source with checksum 64b8bdd4ca6e77cce75a93eb09ab2a9
This command was run using /opt/bigdata/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar
[hadoop@node1 ~]$
3. Configure hadoop-env.sh and yarn-env.sh
In hadoop-env.sh and yarn-env.sh the only value that needs to be set is JAVA_HOME. Find the line containing export JAVA_HOME, delete the leading #, and set it to:
export JAVA_HOME=/usr/java/jdk1.8.0_211-amd64
#hadoop-env.sh configuration
[root@node1 ~]# cd /opt/bigdata/hadoop-2.7.3/etc/hadoop/
[root@node1 hadoop]# pwd
/opt/bigdata/hadoop-2.7.3/etc/hadoop
[root@node1 hadoop]# vi hadoop-env.sh
#yarn-env.sh configuration
[root@node1 ~]# cd /opt/bigdata/hadoop-2.7.3/etc/hadoop/
[root@node1 hadoop]# pwd
/opt/bigdata/hadoop-2.7.3/etc/hadoop
[root@node1 hadoop]# vi yarn-env.sh
Note: if you open a file for editing and cannot save your changes, check whether the user lacks write permission; see step 10 for changing the permissions on the Hadoop installation directory.
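If you prefer not to edit these files interactively, a sed one-liner can set JAVA_HOME in both files at once (a minimal sketch, assuming GNU sed and the stock export JAVA_HOME lines):
#Set JAVA_HOME in hadoop-env.sh and yarn-env.sh non-interactively
[root@node1 hadoop]# sed -i 's|^#\?\s*export JAVA_HOME=.*|export JAVA_HOME=/usr/java/jdk1.8.0_211-amd64|' hadoop-env.sh yarn-env.sh
#Confirm the change took effect
[root@node1 hadoop]# grep '^export JAVA_HOME' hadoop-env.sh yarn-env.sh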
4. Configure core-site.xml
[root@node1 ~]# cd /opt/bigdata/hadoop-2.7.3/etc/hadoop/
<configuration>
    <!-- hostname of the HDFS NameNode host -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:9000</value>
    </property>
    <!-- temporary data directory for the Hadoop cluster -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/bigdata/hadoop-2.7.3/tmpdata</value>
    </property>
</configuration>
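hadoop.tmp.dir above points at a directory that does not exist yet; Hadoop normally creates it on first use, but creating it up front avoids permission surprises later (an optional step):
#Pre-create the temporary data directory referenced by hadoop.tmp.dir
[root@node1 hadoop]# mkdir -p /opt/bigdata/hadoop-2.7.3/tmpdata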
5. Configure hdfs-site.xml
<configuration>
    <!-- NameNode metadata storage directory -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/bigdata/hadoop-2.7.3/hadoop/dfs/name/</value>
    </property>
    <!-- block storage directory on the worker nodes -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/bigdata/hadoop-2.7.3/hadoop/hdfs/data/</value>
    </property>
    <!-- number of block replicas -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>
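The NameNode and DataNode directories above can likewise be created ahead of time; the scp in step 9 will carry them to node2 and node3. Note also that dfs.replication is 3 while this cluster has only two data nodes (node2, node3), so HDFS will flag blocks as under-replicated; a value of 2 would match the cluster size.
#Optionally pre-create the metadata and block directories on node1
[root@node1 hadoop]# mkdir -p /opt/bigdata/hadoop-2.7.3/hadoop/dfs/name/
[root@node1 hadoop]# mkdir -p /opt/bigdata/hadoop-2.7.3/hadoop/hdfs/data/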
6. Configure mapred-site.xml
Under /opt/bigdata/hadoop-2.7.3/etc/hadoop, mapred-site.xml does not exist by default; there is only the template file mapred-site.xml.template. Copy the template to create the actual configuration file rather than editing the template directly: if an edit goes wrong you would corrupt the template, whereas with a copy you can always make a fresh one from the template. (Note that this differs from Hadoop 3, where mapred-site.xml already exists.)
#Copy the template file to create the configuration file
[root@node1 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@node1 hadoop]#
<configuration>
    <!-- run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
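A missing closing tag in any of these files will make the daemons die at startup with a parse error. If xmllint (from libxml2) is installed, all four configuration files can be validated in one pass (an optional sanity check):
#Validate the XML; no output means all files are well-formed
[root@node1 hadoop]# xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml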
7. Configure yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>node1:18040</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>node1:18030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>node1:18025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>node1:18141</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>node1:18088</value>
    </property>
</configuration>
8. Edit slaves
This file lists the cluster's data nodes. We use node2 and node3 as data nodes and node1 as the cluster management node. (This is the same as configuring workers in 3.0.)
[root@node1 hadoop]# vi slaves
#Delete the localhost line
node2
node3
~
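Equivalently, the file can be written without opening vi (a small sketch using a here-document):
#Overwrite slaves with the two data nodes
[root@node1 hadoop]# cat > slaves <<EOF
node2
node3
EOF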
9. Copy Hadoop to the cluster machines
#1.Go to the root user's home directory
[root@node1 hadoop]# cd ~
#2.Use scp to copy the root user's environment variable file to node2
[root@node1 ~]# scp .bash_profile root@node2:~
.bash_profile 100% 338 566.5KB/s 00:00
#3.Use scp to copy the root user's environment variable file to node3
[root@node1 ~]# scp .bash_profile root@node3:~
.bash_profile 100% 338 212.6KB/s 00:00
[root@node1 ~]#
#4.Go to the Hadoop share directory
[root@node1 ~]# cd /opt/bigdata/hadoop-2.7.3/share/
[root@node1 share]# ll
total 0
drwxr-xr-x 3 1001 1002 20 Jan 29 12:05 doc
drwxr-xr-x 8 1001 1002 88 Jan 29 11:36 hadoop
#5.Delete the doc directory; it holds the user manual and is large, so removing it shortens the remote copy below
[root@node1 share]# rm -rf doc/
[root@node1 share]# cd ~
[root@node1 ~]# scp -r /opt root@node2:/
[root@node1 ~]# scp -r /opt root@node3:/
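For a tree this large, rsync is a handy alternative to scp -r: it preserves ownership and permissions and can resume an interrupted transfer (assuming rsync is installed on all three machines):
#Alternative to scp -r
[root@node1 ~]# rsync -a /opt/ root@node2:/opt/
[root@node1 ~]# rsync -a /opt/ root@node3:/opt/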
#Apply the environment variables on every machine in the cluster
[hadoop@node2 hadoop-2.7.3]$ cd ~
[hadoop@node2 ~]$ source .bash_profile
[hadoop@node2 ~]$ hadoop version
Hadoop 2.7.3
Source code repository https://github.com/apache/hadoop.git -r 1019dde65bcf12e05ef48ac71e84550d589e5d9a
Compiled by sunilg on 2019-01-29T01:39Z
Compiled with protoc 2.5.0
From source with checksum 64b8bdd4ca6e77cce75a93eb09ab2a9
This command was run using /opt/bigdata/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar
[hadoop@node2 ~]$
10. Change the permissions on the Hadoop installation directory; node2 and node3 need the same changes
#1.Change the owner and group of the directory to hadoop:hadoop
[root@node1 ~]# chown -R hadoop:hadoop /opt/
#2.Set the permissions to 755, then add write access for group and others
[root@node1 ~]# chmod -R 755 /opt/
[root@node1 ~]# chmod -R g+w /opt/
[root@node1 ~]# chmod -R o+w /opt/
[root@node1 ~]#
11. Format HDFS
#Switch to the hadoop user
[root@node1 ~]# su - hadoop
[hadoop@node1 ~]$ cd /opt/bigdata/hadoop-2.7.3/etc/hadoop/
[hadoop@node1 hadoop]$ hdfs namenode -format
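The format command prints a long log; the line that matters near the end reads "successfully formatted". A quick way to confirm the result is to check that the metadata directory from hdfs-site.xml was initialized:
#A successful format creates the current/ directory with a VERSION file and an initial fsimage
[hadoop@node1 hadoop]$ ls /opt/bigdata/hadoop-2.7.3/hadoop/dfs/name/current/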
12. Start the cluster
[hadoop@node1 ~]$ start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [node1]
Starting datanodes
Starting secondary namenodes [node1]
Starting resourcemanager
Starting nodemanagers
#Use jps to list the Java processes
[hadoop@node1 ~]$ jps
40852 ResourceManager
40294 NameNode
40615 SecondaryNameNode
41164 Jps
[hadoop@node1 ~]$
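node1 runs only the master daemons; jps on node2 and node3 should additionally show DataNode and NodeManager. From node1, a quick smoke test confirms that both data nodes registered and that YARN can run a job (the pi example ships with Hadoop):
#List the data nodes known to the NameNode
[hadoop@node1 ~]$ hdfs dfsadmin -report | grep -E 'Live datanodes|^Name:'
#Run a sample MapReduce job on YARN
[hadoop@node1 ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 10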
Enter http://192.168.200.11:50070 in the browser address bar to view the NameNode web UI.
In Hadoop 2.x the NameNode web UI port is 50070.
In Hadoop 3.x the NameNode web UI port is 9870.
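Without a browser, curl from any cluster machine shows whether the web UIs answer (50070 for the NameNode; 18088 for the ResourceManager, as set in yarn-site.xml above):
#Print the HTTP status line of each web UI; a 200 or a redirect means the daemon is up
[hadoop@node1 ~]$ curl -sI http://node1:50070 | head -1
[hadoop@node1 ~]$ curl -sI http://node1:18088 | head -1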

13. Stop the cluster
[hadoop@node1 ~]$ stop-all.sh