I. Operating system: RedHat 7.4
1. Three hosts:
| Hostname | IP | Role |
|---|---|---|
| master | 10.2.2.14 | master node |
| slave1 | 10.2.2.15 | slave node |
| slave2 | 10.2.2.16 | slave node |
2. On all three hosts, edit /etc/sysconfig/network-scripts/ifcfg-ens33 (IPADDR differs per host; use the address from the table above):
BOOTPROTO="static"
ONBOOT="yes"
IPADDR=10.2.2.14
NETMASK=255.255.255.0
GATEWAY=10.2.2.2
DNS1=8.8.8.8
DNS2=114.114.114.114
- After editing, restart the network service with systemctl restart network so the IP configuration takes effect.
- Check the firewall status with systemctl status firewalld.
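- The steps below address the nodes by hostname (master, slave1, slave2), so name resolution must work on every node. A minimal sketch, assuming the addresses from the table above (run on each host, after setting its name with hostnamectl set-hostname):
[root@master ~]# cat >> /etc/hosts << EOF
10.2.2.14 master
10.2.2.15 slave1
10.2.2.16 slave2
EOF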
3. Configure passwordless SSH login for communication between the nodes.
(1). Generate a key pair on every node and set the permissions on ~/.ssh:
[root@master ~]# ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
[root@master ~]# chmod 700 ~/.ssh
(2). First set up one-way passwordless login from the master to the slaves:
# run on master
[root@master ~]# ssh-copy-id root@slave1
[root@master ~]# ssh-copy-id root@slave2
(3). Collect the public keys and assemble the complete authorized_keys:
# run on master
[root@master .ssh]# scp -r root@slave1:~/.ssh/id_rsa.pub ~/.ssh/slave1.pub
[root@master .ssh]# scp -r root@slave2:~/.ssh/id_rsa.pub ~/.ssh/slave2.pub
[root@master .ssh]# cat ~/.ssh/id_rsa.pub ~/.ssh/slave1.pub ~/.ssh/slave2.pub > ~/.ssh/authorized_keys
[root@master .ssh]# chmod 600 ~/.ssh/authorized_keys
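At this point authorized_keys should hold three public keys, one line per node; a quick sanity check:
[root@master .ssh]# wc -l ~/.ssh/authorized_keys
3 /root/.ssh/authorized_keys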
(4). Distribute it to the slave nodes slave1 and slave2:
# run on master
[root@master .ssh]# scp -r ~/.ssh/authorized_keys root@slave1:~/.ssh/
[root@master .ssh]# scp -r ~/.ssh/authorized_keys root@slave2:~/.ssh/
(5). On every node, set the permissions on ~/.ssh, ~/.ssh/id_rsa, and ~/.ssh/authorized_keys (optional but recommended):
[root@master .ssh]# chmod 700 ~/.ssh
[root@master .ssh]# chmod 600 ~/.ssh/id_rsa
[root@master .ssh]# chmod 600 ~/.ssh/authorized_keys
(6). Verify from a slave node:
[root@slave1 ~]# scp -r /root/student2.txt root@master:/root
student2.txt 100% 0 0.0KB/s 00:00
- If no password prompt appears, passwordless login is working.
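To confirm it works in every direction, a quick loop should print the three hostnames without prompting (shown from master; repeat from slave1 and slave2):
[root@master ~]# for h in master slave1 slave2; do ssh $h hostname; done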
II. Hadoop fully distributed cluster setup
(1). Unpack the JDK tarball into /usr/local/src/ and rename the directory to /usr/local/src/java (the tarball extracts to jdk1.8.0_144):
[root@master ~]# tar -zxvf /root/jdk-8u144-linux-x64.tar.gz -C /usr/local/src
[root@master ~]# mv /usr/local/src/jdk1.8.0_144 /usr/local/src/java
(2). Set the environment variables by editing /root/.bash_profile:
# set java environment
export JAVA_HOME=/usr/local/src/java
export PATH=$PATH:$JAVA_HOME/bin
(3). Apply the environment, verify the Java version, then from master distribute the Java directory and the profile to slave1 and slave2:
[root@master ~]# source ~/.bash_profile
[root@master ~]# java -version
[root@master ~]# scp -r /usr/local/src/java root@slave1:/usr/local/src/
[root@master ~]# scp -r /usr/local/src/java root@slave2:/usr/local/src/
[root@master ~]# scp -r ~/.bash_profile root@slave1:/root/
[root@master ~]# scp -r ~/.bash_profile root@slave2:/root/
(4). On slave1 and slave2, apply the environment and verify the Java version:
[root@slave1 ~]# source /root/.bash_profile
[root@slave1 ~]# java -version
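If the environment took effect, java -version should report the unpacked JDK rather than any system Java; for jdk-8u144 the output should look roughly like this (build details may differ):
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)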
(5). Unpack the Hadoop tarball into /usr/local/src/ and rename the directory to /usr/local/src/hadoop:
[root@master ~]# tar -zxvf /root/hadoop-2.7.1.tar.gz -C /usr/local/src
[root@master ~]# mv /usr/local/src/hadoop-2.7.1 /usr/local/src/hadoop
(6). Configure the Hadoop environment variables, again in /root/.bash_profile (append to the existing content), then apply them:
# set hadoop environment
export HADOOP_HOME=/usr/local/src/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
[root@master ~]# source /root/.bash_profile
(7). Edit /usr/local/src/hadoop/etc/hadoop/hadoop-env.sh and set:
export JAVA_HOME=/usr/local/src/java
(8). Edit /usr/local/src/hadoop/etc/hadoop/core-site.xml:
# locate the <configuration> element
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/src/hadoop/tmp</value>
</property>
</configuration>
(9). Edit /usr/local/src/hadoop/etc/hadoop/hdfs-site.xml:
# locate the <configuration> element
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/src/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/src/hadoop/tmp/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<!-- with two DataNodes (slave1, slave2), a factor above 2 would leave blocks under-replicated -->
<value>2</value>
</property>
</configuration>
(10). Edit /usr/local/src/hadoop/etc/hadoop/yarn-site.xml:
# locate the <configuration> element
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
(11). Copy /usr/local/src/hadoop/etc/hadoop/mapred-site.xml.template to /usr/local/src/hadoop/etc/hadoop/mapred-site.xml, then edit mapred-site.xml:
[root@master hadoop]# cp -r /usr/local/src/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/src/hadoop/etc/hadoop/mapred-site.xml
# locate the <configuration> element
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
(12). cd /usr/local/src/hadoop/etc/hadoop, then edit masters and slaves; delete any existing localhost line (one way to write both files is shown after the contents below):
#masters
master
#slaves
slave1
slave2
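A minimal way to write both files, assuming the current directory is /usr/local/src/hadoop/etc/hadoop:
[root@master hadoop]# echo master > masters
[root@master hadoop]# printf "slave1\nslave2\n" > slaves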
(13). Create the directories (these match the dfs paths configured in step (9)):
[root@master hadoop]# mkdir -p /usr/local/src/hadoop/tmp
[root@master hadoop]# mkdir -p /usr/local/src/hadoop/tmp/dfs/name
[root@master hadoop]# mkdir -p /usr/local/src/hadoop/tmp/dfs/data
(14). From master, distribute the Hadoop installation and the profile to slave1 and slave2:
[root@master ~]# scp -r /usr/local/src/hadoop root@slave1:/usr/local/src/
[root@master ~]# scp -r /usr/local/src/hadoop root@slave2:/usr/local/src/
[root@master ~]# scp -r ~/.bash_profile root@slave1:/root/
[root@master ~]# scp -r ~/.bash_profile root@slave2:/root/
(15). On slave1 and slave2, apply the environment and check the Hadoop version:
[root@slave1 ~]# source /root/.bash_profile
[root@slave1 ~]# hadoop version
(16). Format the NameNode on master:
[root@master hadoop]# hdfs namenode -format
Note: if you ever need to reformat, first delete the directory /usr/local/src/hadoop/tmp/dfs on every node.
(17). Start HDFS and YARN from master:
[root@master sbin]# start-dfs.sh
[root@master sbin]# start-yarn.sh
- Then run jps on master; if the NameNode, SecondaryNameNode, and ResourceManager processes are present, the startup succeeded.
- Then run jps on slave1 and slave2; if the DataNode and NodeManager processes are present, they started successfully.
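For reference, the jps output should look roughly like this (PIDs will differ):
#master
[root@master sbin]# jps
xxxx NameNode
xxxx SecondaryNameNode
xxxx ResourceManager
xxxx Jps
#slave1 and slave2
[root@slave1 ~]# jps
xxxx DataNode
xxxx NodeManager
xxxx Jps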
(18). On master, inspect the HDFS report; it should list the two slave nodes slave1 and slave2:
[root@master sbin]# hdfs dfsadmin -report
(19). Test that Hadoop works end to end with the wordcount example:
[root@master ~]# vim student.txt
[root@master ~]# cat student.txt
zhangsan zhangsan
lisi
wangwu wangwu
1
2
3
[root@master ~]# hdfs dfs -mkdir -p /input
[root@master ~]# hdfs dfs -ls -d /input
[root@master ~]# hdfs dfs -put ~/student.txt /input
[root@master ~]# hdfs dfs -ls /input/student.txt
[root@master ~]# hadoop jar /usr/local/src/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input/student.txt /output
[root@master ~]# hdfs dfs -ls /output
[root@master ~]# hdfs dfs -cat /output/part-r-00000
1 1
2 1
3 1
lisi 1
wangwu 2
zhangsan 2
III. Hive data warehouse setup
(1). Install the MySQL packages and their dependencies with rpm:
[root@master mysql-5.7.18]# rpm -ivh mysql-community-common-5.7.18-1.el7.x86_64.rpm
[root@master mysql-5.7.18]# rpm -ivh mysql-community-libs-5.7.18-1.el7.x86_64.rpm
[root@master mysql-5.7.18]# rpm -ivh mysql-community-client-5.7.18-1.el7.x86_64.rpm
[root@master mysql-5.7.18]# rpm -ivh mysql-community-server-5.7.18-1.el7.x86_64.rpm --force --nodeps
(2). Edit /etc/my.cnf (on master):
[root@master mysql-5.7.18]# vim /etc/my.cnf
symbolic-links=0 # add the following lines below this one:
default-storage-engine=innodb
innodb_file_per_table=1
character-set-server=utf8 # or utf8mb4
collation-server=utf8_general_ci # or utf8mb4_unicode_ci
init-connect='set NAMES utf8' # or set NAMES utf8mb4
(3). Start the mysqld service and check its status:
[root@master mysql-5.7.18]# systemctl start mysqld
[root@master mysql-5.7.18]# systemctl status mysqld
(4). Look up the default MySQL root password, then change it with mysql_secure_installation:
[root@master mysql-5.7.18]# cat /var/log/mysqld.log | grep password
[root@master mysql-5.7.18]# mysql_secure_installation
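On MySQL 5.7 the temporary root password shows up in a log line of roughly this form (the password itself is random):
[Note] A temporary password is generated for root@localhost: xxxxxxxx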
(5). Unpack the Hive tarball into /usr/local/src/ and rename the directory to /usr/local/src/hive:
[root@master ~]# tar -zxvf apache-hive-2.0.0-bin.tar.gz -C /usr/local/src
[root@master ~]# mv /usr/local/src/apache-hive-2.0.0-bin /usr/local/src/hive
(6). Configure the Hive environment in /root/.bash_profile, then apply it (source /root/.bash_profile):
#set hive environment
export HIVE_HOME=/usr/local/src/hive
export PATH=$PATH:$HIVE_HOME/bin
(7). Copy /usr/local/src/hive/conf/hive-default.xml.template to /usr/local/src/hive/conf/hive-site.xml:
[root@master ~] # cp -rf /usr/local/src/hive/conf/hive-default.xml.template /usr/local/src/hive/conf/hive-site.xml
(8). Edit /usr/local/src/hive/conf/hive-site.xml (a consolidated view of the result follows this list):
# set the MySQL connection URL (note the &amp; escaping, required because this is an XML file)
# find <name>javax.jdo.option.ConnectionURL</name>
# change the value to <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;serverTimezone=UTC</value>
# set the MySQL root password
# find <name>javax.jdo.option.ConnectionPassword</name>
# change the value to <value>Password123$</value>
# disable metastore schema version verification
# find <name>hive.metastore.schema.verification</name>
# change the value to <value>false</value>
# set the JDBC driver
# find <name>javax.jdo.option.ConnectionDriverName</name>
# change the value to <value>com.mysql.jdbc.Driver</value>
# set the database user name
# find <name>javax.jdo.option.ConnectionUserName</name>
# change the value to <value>root</value>
# query log location
# find <name>hive.querylog.location</name>
# change the value to <value>/usr/local/src/hive/tmp</value>
# local scratch directory
# find <name>hive.exec.local.scratchdir</name>
# change the value to <value>/usr/local/src/hive/tmp</value>
# downloaded resources directory
# find <name>hive.downloaded.resources.dir</name>
# change the value to <value>/usr/local/src/hive/tmp/resources</value>
# operation log location
# find <name>hive.server2.logging.operation.log.location</name>
# change the value to <value>/usr/local/src/hive/tmp/operation_logs</value>
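Put together, the edited properties end up looking like this in hive-site.xml (only these entries change; the rest of the template stays as-is):
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;serverTimezone=UTC</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>Password123$</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.querylog.location</name>
<value>/usr/local/src/hive/tmp</value>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/usr/local/src/hive/tmp</value>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/usr/local/src/hive/tmp/resources</value>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/usr/local/src/hive/tmp/operation_logs</value>
</property>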
(9). Create the temporary directory:
[root@master mysql-5.7.18]# mkdir -p /usr/local/src/hive/tmp
(10). Copy in the MySQL JDBC driver jar and remove the conflicting jline jar:
[root@master ~]# cp -rf /root/mysql-connector-java-5.1.46.jar /usr/local/src/hive/lib/
[root@master ~]# rm -f /usr/local/src/hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar
(11). Start the Hadoop cluster:
[root@master sbin]# start-dfs.sh
[root@master sbin]# start-yarn.sh
(12). Initialize the metastore database:
[root@master sbin]# schematool -dbType mysql -initSchema -verbose
A. If the error is:
[root@master sbin]# schematool -dbType mysql -initSchema -verbose
xxxxxSLF4J: Class path contains multiple SLF4J bindings xxxxxxxxxx
then delete the following two jars and rerun schematool -dbType mysql -initSchema -verbose:
[root@master sbin]# rm /usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar
[root@master sbin]# rm /usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar
[root@master sbin]# schematool -dbType mysql -initSchema -verbose
B. If the error is:
[root@master sbin]# schematool -dbType mysql -initSchema -verbose
Metastore connection URL: jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useSSL=false
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Fix: grant the user the needed privileges, then rerun schematool -dbType mysql -initSchema -verbose ('password' stands for your actual root password; note there is no space after -p):
[root@master ~]# mysql -uroot -p'password'
mysql > create user 'root'@'master' identified with mysql_native_password by 'password';
mysql > create user 'root'@'%' identified with mysql_native_password by 'password';
mysql > grant all privileges on *.* to 'root'@'localhost' with grant option;
mysql > grant all privileges on *.* to 'root'@'master' with grant option;
mysql > grant all privileges on *.* to 'root'@'%' with grant option;
mysql > flush privileges;
mysql > exit;
[root@master ~]# schematool -dbType mysql -initSchema -verbose
C. If the error is:
[root@master sbin]# schematool -dbType mysql -initSchema -verbose
Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !
Fix: drop the half-initialized hive database, then rerun schematool -dbType mysql -initSchema -verbose:
[root@master ~]# mysql -uroot -p'password'
mysql > drop database if exists hive;
mysql > exit;
[root@master ~]# schematool -dbType mysql -initSchema -verbose
(13). Verify the Hive data warehouse:
# run on master
[root@master ~]# hdfs dfsadmin -safemode leave
[root@master ~]# cd /usr/local/src/hive/bin
[root@master bin]# ./hive
hive > create database good_hive;
hive > use good_hive;
hive > create table sh_goods(id int,category_id int, name string, keyword string) row format delimited fields terminated by ',';
hive > load data local inpath "/root/sh_goods.csv" into table sh_goods;
hive > desc sh_goods;
hive > select * from sh_goods;
hive > exit;
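The load step above assumes /root/sh_goods.csv already exists on master, with comma-separated fields matching the four table columns; hypothetical sample rows would look like:
1,100,apple,fresh fruit
2,100,banana,fresh fruit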
IV. ZooKeeper configuration
(1). Unpack ZooKeeper into /usr/local/src/ and rename the directory to /usr/local/src/zookeeper:
[root@master ~]# tar -zxvf /root/zookeeper-3.4.8.tar.gz -C /usr/local/src
[root@master ~]# mv /usr/local/src/zookeeper-3.4.8 /usr/local/src/zookeeper
(2). Create data and log directories for later use:
[root@master ~]# mkdir -p /usr/local/src/zookeeper/data /usr/local/src/zookeeper/logs
(3). Write the node id; on master it is 1:
[root@master ~]# echo 1 > /usr/local/src/zookeeper/data/myid
(4). Copy /usr/local/src/zookeeper/conf/zoo_sample.cfg to /usr/local/src/zookeeper/conf/zoo.cfg:
[root@master conf]# cp -rf /usr/local/src/zookeeper/conf/zoo_sample.cfg /usr/local/src/zookeeper/conf/zoo.cfg
(5). Edit /usr/local/src/zookeeper/conf/zoo.cfg:
# change
dataDir=/usr/local/src/zookeeper/data
dataLogDir=/usr/local/src/zookeeper/logs
clientPort=2181
# then append the three server entries (2888 is the quorum port, 3888 the leader-election port):
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
(6). Configure the ZooKeeper environment variables in /root/.bash_profile, then apply them (source /root/.bash_profile):
# set zookeeper environment
export ZOOKEEPER_HOME=/usr/local/src/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
(7). Distribute from master to slave1 and slave2:
[root@master ~]# scp -r /usr/local/src/zookeeper root@slave1:/usr/local/src
[root@master ~]# scp -r /usr/local/src/zookeeper root@slave2:/usr/local/src
[root@master ~]# scp -r /root/.bash_profile root@slave1:/root
[root@master ~]# scp -r /root/.bash_profile root@slave2:/root
[root@master ~]# source /root/.bash_profile
(8). Write the ids on the slave nodes (slave1 is 2, slave2 is 3; these must match the server.N entries in zoo.cfg), and run source /root/.bash_profile on each:
#slave1
[root@slave1 ~]# echo 2 > /usr/local/src/zookeeper/data/myid
[root@slave1 ~]# source /root/.bash_profile
#slave2
[root@slave2 ~]# echo 3 > /usr/local/src/zookeeper/data/myid
[root@slave2 ~]# source /root/.bash_profile
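A quick sanity check that each node's id matches its server.N line in zoo.cfg:
[root@master ~]# cat /usr/local/src/zookeeper/data/myid
1
[root@slave1 ~]# cat /usr/local/src/zookeeper/data/myid
2
[root@slave2 ~]# cat /usr/local/src/zookeeper/data/myid
3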
(9). Start the ZooKeeper process on all three nodes and check the status; across the ensemble there should be exactly one leader and two followers (which node becomes leader depends on the election, as the output below shows):
A. master
[root@master conf]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@master conf]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: follower
B. slave1
[root@slave1 zookeeper]# cd bin/
[root@slave1 bin]# ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@slave1 bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: leader
C. slave2
[root@slave2 ~]# cd /usr/local/src/zookeeper/bin/
[root@slave2 bin]# ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@slave2 bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: follower
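As a final check, the bundled client should connect to any node of the ensemble; on a fresh install the root znode only contains zookeeper:
[root@master ~]# zkCli.sh -server master:2181
[zk: master:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: master:2181(CONNECTED) 1] quit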