Setting Up a Fully Distributed Hadoop Cluster


I. Operating System: Red Hat 7.4

1. Three hosts

Hostname    IP           Role
master      10.2.2.14    master node
slave1      10.2.2.15    slave node
slave2      10.2.2.16    slave node

2. On all three hosts, edit the configuration file /etc/sysconfig/network-scripts/ifcfg-ens33 (IPADDR shown for master; use 10.2.2.15 on slave1 and 10.2.2.16 on slave2):

BOOTPROTO="static"
ONBOOT="yes"
IPADDR=10.2.2.14
NETMASK=255.255.255.0
GATEWAY=10.2.2.2
DNS1=8.8.8.8
DNS2=114.114.114.114
  • After editing, restart the network service with systemctl restart network and make sure the IP comes up correctly.
  • Check the firewall state with systemctl status firewalld.
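
Passwordless SSH and every master/slave1/slave2 reference below depend on hostname resolution. A minimal sketch of the /etc/hosts entries to append on every node, assuming the IPs from the table above:

[root@master ~]# cat >> /etc/hosts <<'EOF'
10.2.2.14 master
10.2.2.15 slave1
10.2.2.16 slave2
EOF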

3. Configure passwordless SSH login so the nodes can communicate.

(1). On every node, generate a key pair and set the permissions on ~/.ssh:

[root@master ~]# ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

[root@master ~]#   chmod  700  ~/.ssh

(2). First set up one-way passwordless login from master to the slaves:

# Run on master

[root@master ~]#  ssh-copy-id root@slave1

[root@master ~]#  ssh-copy-id root@slave2


(3). Collect the public keys and assemble the complete authorized_keys:

# Run on master

[root@master .ssh]# scp -r root@slave1:~/.ssh/id_rsa.pub    ~/.ssh/slave1.pub
[root@master .ssh]# scp -r root@slave2:~/.ssh/id_rsa.pub    ~/.ssh/slave2.pub
[root@master .ssh]# cat ~/.ssh/id_rsa.pub  ~/.ssh/slave1.pub  ~/.ssh/slave2.pub > ~/.ssh/authorized_keys
[root@master .ssh]# chmod 600 ~/.ssh/authorized_keys 

(4). Distribute it to slave1 and slave2:

# Run on master
[root@master .ssh]# scp -r  ~/.ssh/authorized_keys   root@slave1:~/.ssh/
[root@master .ssh]# scp -r  ~/.ssh/authorized_keys   root@slave2:~/.ssh/

(5). On every node, set the permissions on ~/.ssh, ~/.ssh/id_rsa, and ~/.ssh/authorized_keys (optional):

[root@master .ssh]# chmod 700 ~/.ssh  
[root@master .ssh]# chmod 600 ~/.ssh/id_rsa  
[root@master .ssh]# chmod 600 ~/.ssh/authorized_keys 

(6). Verify from a slave node:

[root@slave1 ~]# scp  -r  /root/student2.txt   root@master:/root
student2.txt                         100%    0     0.0KB/s   00:00  

  • If no Enter Password prompt appears, passwordless login is working; see the quick check below.
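
As a second quick check (assuming the hostnames configured earlier), run a command over ssh; it should return without any password prompt:

[root@slave1 ~]# ssh root@master hostname
master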

II. Fully Distributed Hadoop Setup

(1). Unpack the JDK under /usr/local/src and rename the directory to /usr/local/src/java:

[root@master ~]# tar -zxvf /root/jdk-8u144-linux-x64.tar.gz  -C /usr/local/src
[root@master ~]# mv /usr/local/src/jdk1.8.0_144 /usr/local/src/java

(2). Set the environment variables in /root/.bash_profile:

# set java environment
export JAVA_HOME=/usr/local/src/java
export PATH=$PATH:$JAVA_HOME/bin

(3). Reload the environment and check the Java version, then distribute the Java directory and the profile from master to slave1 and slave2:

[root@master ~] # source  ~/.bash_profile

[root@master ~] #  java  -version

[root@master ~] #  scp -r  /usr/local/src/java    root@slave1:/usr/local/src/

[root@master ~] #  scp -r  /usr/local/src/java    root@slave2:/usr/local/src/

[root@master ~] #  scp -r   ~/.bash_profile     root@slave1:/root/

[root@master ~] #  scp -r   ~/.bash_profile     root@slave2:/root/

(4). On slave1 and slave2, reload the environment and check the Java version (shown for slave1; repeat on slave2):

[root@slave1 ~]# source /root/.bash_profile 
[root@slave1 ~]# java -version
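
For the 8u144 JDK used here, the version output should look roughly like this (exact build strings may differ):

java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)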

(5). Unpack the Hadoop package under /usr/local/src and rename the directory to /usr/local/src/hadoop:

[root@master ~]# tar -zxvf  /root/hadoop-2.7.1.tar.gz   -C /usr/local/src
[root@master ~]# mv /usr/local/src/hadoop-2.7.1 /usr/local/src/hadoop

(6). Configure the Hadoop environment variables, again in /root/.bash_profile (append below the existing content), then reload the environment:

# set hadoop environment
export HADOOP_HOME=/usr/local/src/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

[root@master  ~]#   source  /root/.bash_profile

(7). Edit the configuration file /usr/local/src/hadoop/etc/hadoop/hadoop-env.sh:

export JAVA_HOME=/usr/local/src/java

(8). Edit the configuration file /usr/local/src/hadoop/etc/hadoop/core-site.xml:

# Locate the <configuration> element

<configuration>
<property>
       <name>fs.defaultFS</name>
       <value>hdfs://master:9000</value>
</property>
<property>
       <name>hadoop.tmp.dir</name>
       <value>/usr/local/src/hadoop/tmp</value>
</property>
</configuration>

(9). Edit the configuration file /usr/local/src/hadoop/etc/hadoop/hdfs-site.xml:

# Locate the <configuration> element

<configuration>
<property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/src/hadoop/dfs/name</value>
</property>
<property>
       <name>dfs.datanode.data.dir</name>
       <value>file:/usr/local/src/hadoop/dfs/data</value>
</property>
<property>
       <name>dfs.replication</name>
       <!-- this cluster has two DataNodes, so replicate twice -->
       <value>2</value>
</property>

</configuration>

(10). Edit the configuration file /usr/local/src/hadoop/etc/hadoop/yarn-site.xml:

# Locate the <configuration> element

<configuration>

<!-- Site specific YARN configuration properties -->
<property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
</property>
<property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
</property>
<property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
</property>
<property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
</property>
<property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
</property>
<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
</property>
<property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

(11). Copy /usr/local/src/hadoop/etc/hadoop/mapred-site.xml.template to /usr/local/src/hadoop/etc/hadoop/mapred-site.xml, then edit mapred-site.xml:

[root@master hadoop]# cp -r /usr/local/src/hadoop/etc/hadoop/mapred-site.xml.template  /usr/local/src/hadoop/etc/hadoop/mapred-site.xml

# Locate the <configuration> element
<configuration>
<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
</property>
<property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
</property>
<property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
</property>
</configuration>

(12). cd /usr/local/src/hadoop/etc/hadoop, then edit the masters and slaves files (create masters if it does not exist); remove the default localhost entry:

#masters

master

#slaves

slave1

slave2

(13). Create the directories referenced in core-site.xml and hdfs-site.xml:

[root@master hadoop]# mkdir -p /usr/local/src/hadoop/tmp

[root@master hadoop]# mkdir -p /usr/local/src/hadoop/dfs/name

[root@master hadoop]# mkdir -p /usr/local/src/hadoop/dfs/data

(14). Distribute the Hadoop tree and the profile from master to slave1 and slave2:

[root@master ~] #  scp -r  /usr/local/src/hadoop   root@slave1:/usr/local/src/

[root@master ~] #  scp -r  /usr/local/src/hadoop   root@slave2:/usr/local/src/

[root@master ~] #  scp -r   ~/.bash_profile     root@slave1:/root/

[root@master ~] #  scp -r   ~/.bash_profile     root@slave2:/root/

(15). On slave1 and slave2, reload the environment and check the Hadoop version (shown for slave1; repeat on slave2):

[root@slave1 ~]# source  /root/.bash_profile 
[root@slave1 ~]# hadoop version
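
The first line of the output should name the release; the remaining build-metadata lines vary:

Hadoop 2.7.1
...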

(16). Format the NameNode on master:

[root@master hadoop]# hdfs namenode -format

Note: if you ever need to re-format, first delete the HDFS data directories on every node, as sketched below.
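
A minimal sketch of that cleanup, assuming the directory layout from steps (8), (9), and (13); note that this destroys all HDFS data:

# Run the rm on every node, then format again on master
[root@master ~]# rm -rf /usr/local/src/hadoop/dfs /usr/local/src/hadoop/tmp
[root@master ~]# hdfs namenode -format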

(17). On master, start HDFS (NameNode, SecondaryNameNode) and YARN:

[root@master sbin]# start-dfs.sh
[root@master sbin]# start-yarn.sh

  • Run jps on master; if the NameNode, SecondaryNameNode, and ResourceManager processes are present, HDFS and YARN started successfully (sample output below).

  • Run jps on slave1 and slave2; if the DataNode and NodeManager processes are present, the workers started successfully.
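
For reference, jps output on master looks roughly like this (PIDs are illustrative and will differ):

[root@master sbin]# jps
2215 NameNode
2410 SecondaryNameNode
2561 ResourceManager
2874 Jps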

(18). On master, inspect the HDFS report; it should list the two slave nodes slave1 and slave2:

[root@master sbin]# hdfs dfsadmin -report
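
The report should contain a section like the following (capacity figures omitted here and will differ):

Live datanodes (2):
Name: 10.2.2.15:50010 (slave1)
...
Name: 10.2.2.16:50010 (slave2)
...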

(19). Test that Hadoop works end to end:

[root@master ~]# vim student.txt
[root@master ~]# cat student.txt
zhangsan  zhangsan
    lisi
    wangwu         wangwu
1
2
3
[root@master ~]# hdfs dfs -mkdir -p  /input

[root@master ~]# hdfs  dfs -ls -d  /input
[root@master ~]# hdfs dfs  -put  ~/student.txt   /input
[root@master ~]# hdfs dfs  -ls     /input/student.txt

[root@master ~]# hadoop jar /usr/local/src/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar  wordcount  /input/student.txt   /output

[root@master ~]# hdfs dfs  -ls     /output

[root@master ~]# hdfs dfs  -cat     /output/part-r-00000

1    1
2    1
3    1
lisi    1
wangwu    2
zhangsan    2
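
One caveat: MapReduce refuses to write to an existing output directory, so remove /output before re-running the job:

[root@master ~]# hdfs dfs -rm -r /output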

III. Building the Hive Data Warehouse

(1). Install the MySQL dependency packages with rpm:

[root@master mysql-5.7.18]# rpm -ivh mysql-community-common-5.7.18-1.el7.x86_64.rpm 
[root@master mysql-5.7.18]# rpm -ivh mysql-community-libs-5.7.18-1.el7.x86_64.rpm 
[root@master mysql-5.7.18]# rpm -ivh mysql-community-client-5.7.18-1.el7.x86_64.rpm
[root@master mysql-5.7.18]# rpm -ivh mysql-community-server-5.7.18-1.el7.x86_64.rpm --force --nodeps

(2). Edit the configuration file /etc/my.cnf (on master):

[root@master mysql-5.7.18]# vim /etc/my.cnf

symbolic-links=0                  # add the following lines below this entry:
default-storage-engine=innodb
innodb_file_per_table=1
character-set-server=utf8         # may be changed to utf8mb4
collation-server=utf8_general_ci  # may be changed to utf8mb4_unicode_ci
init-connect='set NAMES utf8'     # may be changed to set NAMES utf8mb4

(3). Start the mysqld service and check its status:

[root@master mysql-5.7.18]# systemctl start mysqld
[root@master mysql-5.7.18]# systemctl status mysqld

(4). Look up the default MySQL root password in the log, then change it with mysql_secure_installation:

[root@master mysql-5.7.18]#  cat /var/log/mysqld.log | grep password
[root@master mysql-5.7.18]#  mysql_secure_installation
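
The temporary password shows up in a log line of roughly this form (the password itself is random):

... [Note] A temporary password is generated for root@localhost: <random password>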

(5). Unpack the Hive package under /usr/local/src and rename the directory to /usr/local/src/hive:

[root@master ~]# tar -zxvf  apache-hive-2.0.0-bin.tar.gz   -C /usr/local/src
[root@master ~]#  mv  /usr/local/src/apache-hive-2.0.0-bin   /usr/local/src/hive

(6). Configure the Hive environment in /root/.bash_profile, then reload it (source /root/.bash_profile):

#set hive environment
export HIVE_HOME=/usr/local/src/hive
export PATH=$PATH:$HIVE_HOME/bin

(7). Copy /usr/local/src/hive/conf/hive-default.xml.template to /usr/local/src/hive/conf/hive-site.xml:

[root@master ~] # cp -rf /usr/local/src/hive/conf/hive-default.xml.template  /usr/local/src/hive/conf/hive-site.xml

(8). Edit the configuration file /usr/local/src/hive/conf/hive-site.xml. For each property below, find the <name> and replace its <value>:

  # Set the MySQL connection URL.
  # Find:  <name>javax.jdo.option.ConnectionURL</name>
  # Set:   <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;serverTimezone=UTC</value>
  # Set the MySQL root password.
  # Find:  <name>javax.jdo.option.ConnectionPassword</name>
  # Set:   <value>Password123$</value>
  # Disable metastore schema version verification.
  # Find:  <name>hive.metastore.schema.verification</name>
  # Set:   <value>false</value>
  # Configure the JDBC driver.
  # Find:  <name>javax.jdo.option.ConnectionDriverName</name>
  # Set:   <value>com.mysql.jdbc.Driver</value>
  # Configure the database user name.
  # Find:  <name>javax.jdo.option.ConnectionUserName</name>
  # Set:   <value>root</value>
  # Query log location.
  # Find:  <name>hive.querylog.location</name>
  # Set:   <value>/usr/local/src/hive/tmp</value>
  # Local scratch directory.
  # Find:  <name>hive.exec.local.scratchdir</name>
  # Set:   <value>/usr/local/src/hive/tmp</value>
  # Downloaded resources directory.
  # Find:  <name>hive.downloaded.resources.dir</name>
  # Set:   <value>/usr/local/src/hive/tmp/resources</value>
  # Operation log location.
  # Find:  <name>hive.server2.logging.operation.log.location</name>
  # Set:   <value>/usr/local/src/hive/tmp/operation_logs</value>
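
As a concrete illustration, the first entry from the list above ends up in hive-site.xml like this (the other properties follow the same <name>/<value> pattern):

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;serverTimezone=UTC</value>
</property>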

(9). Create the temporary directory:

[root@master mysql-5.7.18]#  mkdir -p   /usr/local/src/hive/tmp

(10). Copy in the MySQL JDBC driver jar and delete the conflicting jline jar:

[root@master ~]# cp  -rf /root/mysql-connector-java-5.1.46.jar /usr/local/src/hive/lib/
[root@master ~]# rm -f /usr/local/src/hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar

(11). Start the Hadoop cluster:

[root@master sbin]# start-dfs.sh
[root@master sbin]# start-yarn.sh

(12). Initialize the metastore database:

[root@master sbin] # schematool -dbType mysql -initSchema -verbose

A. If the error is:

  [root@master sbin] # schematool -dbType mysql -initSchema -verbose

  ... SLF4J: Class path contains multiple SLF4J bindings ...

  Delete the following two jars, then re-run schematool -dbType mysql -initSchema -verbose:

 [root@master sbin] # rm /usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar
 [root@master sbin] # rm /usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar
 [root@master sbin] # schematool -dbType mysql -initSchema -verbose

B. If the error is:

[root@master sbin] # schematool -dbType mysql -initSchema -verbose

Metastore connection URL:        jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useSSL=false
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       root
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.

Fix: grant the user the required privileges, then re-run schematool -dbType mysql -initSchema -verbose:

[root@master ~]# mysql -uroot -p
 mysql> create user 'root'@'master' identified with mysql_native_password by '<password>';
 mysql> create user 'root'@'%' identified with mysql_native_password by '<password>';
 mysql> grant all privileges on *.* to 'root'@'localhost' with grant option;
 mysql> grant all privileges on *.* to 'root'@'master' with grant option;
 mysql> grant all privileges on *.* to 'root'@'%' with grant option;
 mysql> flush privileges;
 mysql> exit;
 [root@master ~]# schematool -dbType mysql -initSchema -verbose

C. If the error is:

[root@master sbin] # schematool -dbType mysql -initSchema -verbose

Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !

Fix: drop the partially initialized hive database, then re-run schematool:

[root@master ~]# mysql -uroot -p
mysql> show databases;
mysql> drop database if exists hive;
mysql> exit;
[root@master ~]# schematool -dbType mysql -initSchema -verbose

(13). Verify the Hive data warehouse:

# Run on master

[root@master ~] # hdfs dfsadmin  -safemode  leave

[root@master ~] #  cd  /usr/local/src/hive/bin

[root@master bin]# ./hive

hive> create database good_hive;

hive> use good_hive;

hive> create table sh_goods(id int, category_id int, name string, keyword string) row format delimited fields terminated by ',';

hive> load data local inpath "/root/sh_goods.csv" into table sh_goods;

hive> desc sh_goods;

hive> select * from sh_goods;

hive> exit;

IV. ZooKeeper Configuration

(1). Unpack ZooKeeper under /usr/local/src and rename the directory to /usr/local/src/zookeeper:

[root@master ~]# tar -zxvf /root/zookeeper-3.4.8.tar.gz  -C /usr/local/src
[root@master ~]#  mv /usr/local/src/zookeeper-3.4.8  /usr/local/src/zookeeper

(2). Create the data and log directories for later use:

[root@master ~] # mkdir -p /usr/local/src/zookeeper/data  /usr/local/src/zookeeper/logs

(3). Write the node ID; master is 1:

[root@master ~] # echo  1 >  /usr/local/src/zookeeper/data/myid

(4). Copy /usr/local/src/zookeeper/conf/zoo_sample.cfg to /usr/local/src/zookeeper/conf/zoo.cfg:

[root@master conf]# cp -rf /usr/local/src/zookeeper/conf/zoo_sample.cfg   /usr/local/src/zookeeper/conf/zoo.cfg 

(5). Edit the configuration file /usr/local/src/zookeeper/conf/zoo.cfg:

# Change:
dataDir=/usr/local/src/zookeeper/data
dataLogDir=/usr/local/src/zookeeper/logs

clientPort=2181  # below this line, add the quorum entries for the three nodes
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888

(6). Configure the ZooKeeper environment in /root/.bash_profile, then reload it (source /root/.bash_profile):

# set zookeeper environment
export ZOOKEEPER_HOME=/usr/local/src/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin

(7). Distribute from master to slave1 and slave2:

[root@master ~]# scp -r /usr/local/src/zookeeper  root@slave1:/usr/local/src
[root@master ~]# scp -r /usr/local/src/zookeeper  root@slave2:/usr/local/src
[root@master ~]# scp -r /root/.bash_profile   root@slave1:/root
[root@master ~]# scp -r /root/.bash_profile   root@slave2:/root
[root@master ~]# source /root/.bash_profile

(8). Write the node IDs on slave1 and slave2 (slave1 is 2, slave2 is 3), and run source /root/.bash_profile on each:

#slave1

[root@slave1 ~] # echo  2  >  /usr/local/src/zookeeper/data/myid

[root@slave1 ~] # source /root/.bash_profile

#slave2

[root@slave2  ~] # echo  3  >  /usr/local/src/zookeeper/data/myid

[root@slave2 ~] # source /root/.bash_profile

(9). Start the ZooKeeper process on all three nodes. Each node reports its mode as leader or follower; across the three nodes there should be one leader and two followers:

A.master

[root@master conf]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@master conf]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: follower

B.slave1

[root@slave1 zookeeper]# cd bin/
[root@slave1 bin]# ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@slave1 bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: leader

C.slave2

[root@slave2 ~]# cd /usr/local/src/zookeeper/bin/
[root@slave2 bin]# ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@slave2 bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: follower
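
As a final sanity check, one option is to connect with the ZooKeeper command-line client; on a fresh ensemble the root listing contains only the zookeeper znode:

[root@master ~]# zkCli.sh -server master:2181
[zk: master:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: master:2181(CONNECTED) 1] quit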