There is no perfect program in this world, but that is no reason to be discouraged: writing programs is a continual pursuit of perfection.
One note before starting: errors may come up during the setup; solutions are collected on the WeChat public account.
docker-ssh
Dockerfile
FROM centos:base
MAINTAINER hbw
RUN yum -y install passwd openssl openssh-server \
&& ssh-keygen -q -t rsa -b 2048 -f /etc/ssh/ssh_host_rsa_key -N '' \
&& ssh-keygen -q -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key -N '' \
&& ssh-keygen -q -t ed25519 -f /etc/ssh/ssh_host_ed25519_key -N '' \
&& sed -i "s/#UsePrivilegeSeparation.*/UsePrivilegeSeparation no/g" /etc/ssh/sshd_config \
&& sed -i "s/UsePAM.*/UsePAM no/g" /etc/ssh/sshd_config
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
Build:
docker build -t centos-ssh-1:base .
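As a quick sanity check (not part of the original steps), sshd's test mode can be run in a throwaway container; exit code 0 means the generated host keys and sshd_config are valid:
docker run --rm centos-ssh-1:base /usr/sbin/sshd -t && echo "sshd config ok"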
docker-ssh-jdk (jdk-8u111-linux-x64.tar.gz must be in the build context)
Dockerfile
FROM centos-ssh-1:base
ADD jdk-8u111-linux-x64.tar.gz /home
RUN mv /home/jdk1.8.0_111 /home/jdk8
ENV JAVA_HOME /home/jdk8
ENV PATH $JAVA_HOME/bin:$PATH
Build:
docker build -t centos-ssh-jdk:base .
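Optionally verify that the JDK landed on the image's PATH (a minimal check, not in the original steps):
docker run --rm centos-ssh-jdk:base java -version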
docker-ssh-jdk-hadoop (hadoop-3.2.1.tar.gz must be in the build context)
Dockerfile
FROM centos-ssh-jdk:base
ADD hadoop-3.2.1.tar.gz /home/
RUN mv /home/hadoop-3.2.1 /home/hadoop
ENV HADOOP_HOME /home/hadoop
ENV PATH $HADOOP_HOME/bin:$PATH
Build:
docker build -t docker-ssh-jdk-hadoop:base .
Next, create a container from the docker-ssh-jdk-hadoop image and enter it (one way is sketched below).
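A sketch, with an assumed container name hadoop-single; the keep-alive loop mirrors the run commands used later for the cluster:
docker run --name hadoop-single -d docker-ssh-jdk-hadoop:base /bin/sh -c "while true; do sleep 3600; done"
docker exec -it hadoop-single /bin/bash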
Hadoop configuration (the files below are in /home/hadoop/etc/hadoop):
vi hadoop-env.sh
export JAVA_HOME=/home/jdk8
vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
</configuration>
vi core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
vi hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
vi yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Startup:
Start ssh (passwordless login needs to be set up first; a minimal sketch follows the command below):
/usr/sbin/sshd
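A minimal sketch of the passwordless-login setup inside the container, assuming root and the default key locations:
mkdir -p ~/.ssh && chmod 700 ~/.ssh
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh -o StrictHostKeyChecking=no localhost "echo ssh ok"   # should print "ssh ok" without a password prompt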
Start Hadoop:
First format HDFS.
In the bin directory:
./hdfs namenode -format
In the sbin directory:
./start-all.sh
Some errors may be reported at startup; see ex-docker-hadoop for fixes.
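One error that often appears when start-all.sh is run as root on Hadoop 3.x is "Attempting to operate on hdfs namenode as root ... but there is no HDFS_NAMENODE_USER defined". Assuming everything here runs as root, the usual workaround is to declare the daemon users in hadoop-env.sh (a sketch; adjust the users to your setup):
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root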
Run containers from the configured image (tagged hadoop-local:1.0.0 here) to build the cluster
docker run --name hadoop-1 --hostname hnode-1 -d -P -p 9870:9870 -p 8088:8088 hadoop-local:1.0.0 /bin/sh -c "while true; do echo hello world; sleep 1; done"
docker run --name hadoop-2 --hostname hnode-2 -d -P hadoop-local:1.0.0 /bin/sh -c "while true; do echo hello world; sleep 1; done"
docker run --name hadoop-3 --hostname hnode-3 -d -P hadoop-local:1.0.0 /bin/sh -c "while true; do echo hello world; sleep 1; done"
Then look up each container's IP and add the mappings to /etc/hosts inside every container (the addresses below are from this run; a sketch follows the list):
172.17.0.6 hnode-1
172.17.0.11 hnode-2
172.17.0.12 hnode-3
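The actual addresses depend on your Docker network; one way to read them from the host and append the mappings inside a container (container names as in the run commands above; repeat for each container with the other nodes' entries):
docker inspect -f '{{.NetworkSettings.IPAddress}}' hadoop-1 hadoop-2 hadoop-3
docker exec hadoop-1 /bin/sh -c "printf '172.17.0.11 hnode-2\n172.17.0.12 hnode-3\n' >> /etc/hosts"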
Then modify the following configuration on hadoop-1:
hadoop-env.sh
export JAVA_HOME=/home/jdk8
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hnode-1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>1440</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>hnode-1</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
</configuration>
Then list the worker nodes in hadoop/etc/hadoop/workers (the file was called slaves in Hadoop 2.x):
hnode-2
hnode-3
Then copy the configured hadoop directory to the other two containers; note that sshd must be running in them (see the sketch below):
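From the host, sshd can be started in the target containers with docker exec before copying (container names as above):
docker exec hadoop-2 /usr/sbin/sshd
docker exec hadoop-3 /usr/sbin/sshd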
scp -rq /home/hadoop hnode-2:/home
scp -rq /home/hadoop hnode-3:/home
Then format the NameNode on hnode-1 and start the cluster from there (start-all.sh starts the daemons on all nodes over ssh):
hdfs namenode -format
./start-all.sh
查看启动的进程
Master node (jps):
3344 NodeManager
2726 SecondaryNameNode
3014 ResourceManager
2503 DataNode
4967 Jps
2329 NameNode
Worker nodes (jps):
2936 NodeManager
2509 SecondaryNameNode
2286 DataNode
4255 Jps
Check the overall state of the Hadoop cluster:
hdfs fsck /
You should see 3 datanodes reported.
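If the fsck output is hard to scan, the dfsadmin report lists the live datanodes directly (an alternative check, not in the original steps):
hdfs dfsadmin -report | grep "Live datanodes"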
Run a test (from /home/hadoop/share/hadoop/mapreduce):
hadoop jar hadoop-mapreduce-examples-3.2.1.jar pi 3 5
Wordcount test (also from /home/hadoop/share/hadoop/mapreduce):
vi test.txt
hello hello good good good
hdfs dfs -mkdir -p /input
hdfs dfs -put test.txt /input/test.txt
hdfs dfs -ls /input
hadoop jar hadoop-mapreduce-examples-3.2.1.jar wordcount /input/test.txt /output
hdfs dfs -ls /output
hdfs dfs -text /output/part-r-00000
Output (counts match the single line in test.txt above):
good 3
hello 2
Then commit the container to an image:
docker commit hadoop-1 hadoop:node
Then save it to a file:
docker save hadoop:node -o hadoop-node.tar
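To reuse the saved image on another machine, load it back with the counterpart command:
docker load -i hadoop-node.tar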
For errors that come up during these steps, see the WeChat public account.