1. Environment preparation and versions
1. Linux version
CentOS release 6.8 (Final)
Linux version 2.6.32-642.el6.x86_64 (mockbuild@worker1.bsys.centos.org)(gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Tue May 10 17:27:01 UTC 2016
Image: CentOS-6.8-x86_64-minimal.iso
2. JDK version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
Environment variables:
Edit /etc/profile (vim /etc/profile), add the following lines, then run source /etc/profile:
export JAVA_HOME=/home/software/jdk/jdk1.8.0_131
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/rt.jar
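The same settings can also be appended without opening an editor. The following is a minimal sketch, not part of the original setup: the helper name append_once and the PROFILE variable are illustrative assumptions, and PROFILE defaults to a scratch file so the sketch can be tried before pointing it at /etc/profile.

```shell
#!/bin/sh
# Sketch: append the JDK settings to a profile file only if they are missing.
# PROFILE is an assumption; run with PROFILE=/etc/profile (as root) for the real file.
PROFILE="${PROFILE:-${TMPDIR:-/tmp}/profile.sketch}"

# append_once FILE LINE: add LINE to FILE unless an identical line already exists.
append_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Single quotes keep $PATH and $JAVA_HOME unexpanded, as in the lines above.
append_once "$PROFILE" 'export JAVA_HOME=/home/software/jdk/jdk1.8.0_131'
append_once "$PROFILE" 'export PATH=$PATH:$JAVA_HOME/bin'
append_once "$PROFILE" 'export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/rt.jar'
```

Running the script twice leaves the file unchanged the second time, which avoids duplicate export lines. After updating the real file, source /etc/profile is still required.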
3. Hadoop version
hadoop-2.7.7
2. Node layout
Note: /etc/hosts entries:
192.168.1.211 z1
192.168.1.212 z2
192.168.1.213 z3
192.168.1.214 z4
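Before continuing, it can help to confirm that every host has all four entries. A small sketch follows; missing_hosts is a hypothetical helper name and HOSTS_FILE is an assumed variable that defaults to /etc/hosts.

```shell
#!/bin/sh
# Sketch: report which cluster host names are missing from a hosts file.
# missing_hosts is a hypothetical helper; HOSTS_FILE defaults to /etc/hosts.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

# missing_hosts FILE NAME...: print each NAME that has no entry in FILE.
missing_hosts() {
    file="$1"; shift
    for name in "$@"; do
        # Skip comment lines, then look for NAME in any column after the IP.
        awk -v n="$name" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$file" || printf '%s\n' "$name"
    done
}

missing_hosts "$HOSTS_FILE" z1 z2 z3 z4
```

The script prints nothing when all four names are present; any name it prints still needs a line in /etc/hosts on that machine.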
3. Passwordless SSH between hosts
1. Generate a key pair
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
2. Append the public key to ~/.ssh/authorized_keys
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
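For passwordless login to another host, copy the public key to the target host and append it to that host's ~/.ssh/authorized_keys. The target host must first run ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa as well.
Use the scp command to transfer the key.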
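The transfer and append can be sketched as a loop over the peer hosts. This is an illustration under assumptions: the root user (suggested by the /root/.ssh/id_dsa path used in the Hadoop configuration), the temporary file name on the target, and the DRY_RUN switch are not part of the original notes. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch: copy this host's public key to each peer and append it to
# authorized_keys there. The root user, the temporary path and the DRY_RUN
# switch are assumptions for illustration.
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print the commands
PUBKEY="$HOME/.ssh/id_dsa.pub"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# push_key HOST: transfer the key with scp, then append it on the target.
push_key() {
    run scp "$PUBKEY" "root@$1:/tmp/id_dsa.pub.from-peer"
    run ssh "root@$1" 'cat /tmp/id_dsa.pub.from-peer >> ~/.ssh/authorized_keys'
}

# Peers as seen from z1; adjust the list on other hosts.
for host in z2 z3 z4; do
    push_key "$host"
done
```

With DRY_RUN=0 each scp and ssh call prompts for the target's password, since passwordless login is not yet in place. Afterwards, ssh z2 from z1 should log in without a prompt.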
4. Install ZooKeeper
1. zoo.cfg configuration
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=z1:2888:3888
server.2=z2:2888:3888
server.3=z3:2888:3888
2. Create the myid file under the dataDir path
dataDir=/opt/zookeeper
On each machine, edit myid so that it contains that machine's own ID, matching the N of its server.N entry:
server.1=z1:2888:3888
server.2=z2:2888:3888
server.3=z3:2888:3888
For example, on z1 create a myid file in /opt/zookeeper whose content is: 1
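The ID can also be derived from zoo.cfg instead of being typed by hand. The following sketch assumes each machine's hostname is exactly the name used in its server.N line; myid_for_host is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: look up the ZooKeeper id for a host from the server.N lines in zoo.cfg.
# myid_for_host is a hypothetical helper name.

# myid_for_host CFG HOST: print N for the line "server.N=HOST:port:port".
myid_for_host() {
    sed -n "s/^server\.\([0-9][0-9]*\)=$2:.*/\1/p" "$1"
}

# Example use on a node (conf/zoo.cfg is assumed to be relative to the
# ZooKeeper installation directory):
# myid_for_host conf/zoo.cfg "$(hostname)" > /opt/zookeeper/myid
```

The function prints nothing for a host that has no server.N line (z4 in this layout), so an empty result means the machine is not part of the ZooKeeper ensemble.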
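3. Start ZooKeeper
Go to the bin directory of the ZooKeeper installation and run ./zkServer.sh start on every machine where ZooKeeper is installed.
Run ./zkServer.sh status to check whether it started successfully.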
5. Install Hadoop
1. hadoop-env.sh configuration
export JAVA_HOME=/home/software/jdk/jdk1.8.0_131
2. hdfs-site.xml configuration
<configuration>
<property>
<name>dfs.nameservices</name>
<value>bigdata</value>
</property>
<property>
<name>dfs.ha.namenodes.bigdata</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.bigdata.nn1</name>
<value>z1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.bigdata.nn2</name>
<value>z2:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.bigdata.nn1</name>
<value>z1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.bigdata.nn2</name>
<value>z2:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://z2:8485;z3:8485;z4:8485/bigdata</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.bigdata</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_dsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/journaldata</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
3. core-site.xml configuration
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://bigdata</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>z1:2181,z2:2181,z3:2181</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/workspace/hadoop-2.7.7</value>
</property>
</configuration>
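The value of fs.defaultFS has to name the nameservice declared in dfs.nameservices (bigdata here). A quick consistency check can be sketched as follows. It assumes the one-tag-per-line layout used in these files, with <value> directly after <name>; xml_prop is a hypothetical helper, not a Hadoop tool.

```shell
#!/bin/sh
# Sketch: read a property value from a Hadoop *-site.xml file.
# Assumes one tag per line with <value> on the line right after <name>;
# xml_prop is a hypothetical helper, not a Hadoop tool.

# xml_prop FILE NAME: print the <value> on the line following <name>NAME</name>.
xml_prop() {
    awk -v n="<name>$2</name>" '
        want { gsub(/^[ \t]*<value>|<\/value>[ \t]*$/, ""); print; exit }
        index($0, n) { want = 1 }
    ' "$1"
}

# Example use (file locations relative to the Hadoop directory are assumptions):
# ns=$(xml_prop etc/hadoop/hdfs-site.xml dfs.nameservices)
# fs=$(xml_prop etc/hadoop/core-site.xml fs.defaultFS)
# [ "$fs" = "hdfs://$ns" ] || echo "fs.defaultFS does not match dfs.nameservices"
```

On a node where Hadoop is already installed, hdfs getconf -confKey dfs.nameservices reports the value Hadoop itself resolves and is the more reliable check.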
4. slaves configuration
z2
z3
z4
6. Start the cluster
1. Step one
Start the JournalNode on each host where it is installed (z2, z3, z4).
From the sbin directory, run: ./hadoop-daemon.sh start journalnode
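2. Step two
Choose one of the NameNode hosts and work on it.
From the bin directory, run ./hdfs namenode -format to format this NameNode (either of the two may be chosen).
From the sbin directory, run ./hadoop-daemon.sh start namenode to start the NameNode that was just formatted.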
3. Step three
On the other NameNode host, do the following.
From the bin directory, run: ./hdfs namenode -bootstrapStandby
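4. Step four
From the sbin directory, run ./stop-dfs.sh to stop all services.
5. Step five
From the bin directory, run ./hdfs zkfc -formatZK
6. Step six
From the sbin directory, run ./start-dfs.sh to start the cluster.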
7. Step seven: access the cluster
Open http://192.168.1.211:50070 in a browser to access the cluster.
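Besides the web UI, the HA state of each NameNode can be queried with hdfs haadmin -getServiceState. The wrapper below is a sketch: ha_states is a hypothetical helper, and HDFS_BIN is an assumed variable for the case where hdfs is not on PATH.

```shell
#!/bin/sh
# Sketch: print the HA state of both NameNodes with "hdfs haadmin".
# HDFS_BIN is an assumption; point it at bin/hdfs of the installation if
# hdfs is not on PATH.
HDFS_BIN="${HDFS_BIN:-hdfs}"

# ha_states ID...: print "ID: state" for each NameNode id.
ha_states() {
    for nn in "$@"; do
        # A failed call (NameNode down, command missing) is reported as unreachable.
        state=$("$HDFS_BIN" haadmin -getServiceState "$nn" 2>/dev/null) || state="unreachable"
        printf '%s: %s\n' "$nn" "$state"
    done
}

# Example use, with the ids from dfs.ha.namenodes.bigdata:
# ha_states nn1 nn2
```

With automatic failover enabled, one id is expected to report active and the other standby once start-dfs.sh has finished.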
7. Install MapReduce
1. mapred-site.xml configuration
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- Log configuration -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>z1:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>z1:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/home/mr-history/tmp</value>
<description>Directory where history files are written by MapReduce jobs</description>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/home/mr-history/done</value>
<description>Directory where history files are managed by the MR JobHistory Server</description>
</property>
</configuration>
2. yarn-site.xml configuration
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>bigdata</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>z1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>z2</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>z1:2181,z2:2181,z3:2181</value>
</property>
<!-- Works around Spark on YARN errors that may be caused by memory exceeding the configured limits: disable the virtual memory check -->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
<description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>
<!-- Log configuration -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://z1:19888/jobhistory/logs</value>
</property>
</configuration>
3. Start MapReduce
Go to the sbin directory and run ./start-yarn.sh to start it.
Note: in Hadoop 2.x, start-yarn.sh starts a ResourceManager only on the host where it is run. With ResourceManager HA, the second ResourceManager (z2 here) typically has to be started separately on that host with ./yarn-daemon.sh start resourcemanager.
4. Start the JobHistoryServer
Go to the sbin directory and run ./mr-jobhistory-daemon.sh start historyserver to start it.
Open http://z1:19888 in a browser to access it.