Containerized Big Data: Building a Hadoop Cluster with Docker from Base Images


There is no perfect program in this world, but that does not discourage us, because writing programs is an endless pursuit of perfection.

Note up front that exceptions may come up during the setup; solutions to these exceptions are covered on the WeChat public account.

docker-ssh
Dockerfile

FROM centos:base
MAINTAINER hbw
# openssh-clients provides ssh/scp, which are used later to copy files between cluster nodes
RUN yum -y install passwd openssl openssh-server openssh-clients \
    && ssh-keygen -q -t rsa -b 2048 -f /etc/ssh/ssh_host_rsa_key -N '' \
    && ssh-keygen -q -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key -N '' \
    && ssh-keygen -q -t ed25519 -f /etc/ssh/ssh_host_ed25519_key -N '' \
    && sed -i "s/#UsePrivilegeSeparation.*/UsePrivilegeSeparation no/g" /etc/ssh/sshd_config \
    && sed -i "s/UsePAM.*/UsePAM no/g" /etc/ssh/sshd_config
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]    

Build

docker build -t centos-ssh-1:base .
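The image installs passwd but never sets a root password or injects a key, so a quick login test needs one extra step. A minimal sketch, where the host port 2222 and the password 123456 are arbitrary test values:

docker run -d --name ssh-test -p 2222:22 centos-ssh-1:base
docker exec ssh-test sh -c 'echo "root:123456" | chpasswd'   # arbitrary test password
ssh -p 2222 root@localhost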

docker-ssh-jdk (requires jdk-8u111-linux-x64.tar.gz in the build context)
Dockerfile

FROM centos-ssh-1:base
ADD jdk-8u111-linux-x64.tar.gz /home
RUN mv /home/jdk1.8.0_111 /home/jdk8
ENV JAVA_HOME /home/jdk8
ENV PATH $JAVA_HOME/bin:$PATH

Build

docker build -t centos-ssh-jdk:base .

docker-ssh-jdk-hadoop (requires hadoop-3.2.1.tar.gz in the build context)
Dockerfile

FROM centos-ssh-jdk:base
ADD hadoop-3.2.1.tar.gz /home/
RUN mv /home/hadoop-3.2.1 /home/hadoop
ENV HADOOP_HOME /home/hadoop
ENV PATH $HADOOP_HOME/bin:$PATH

Build

docker build -t docker-ssh-jdk-hadoop:base .
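A quick sanity check that the JAVA_HOME and HADOOP_HOME settings from the Dockerfiles actually landed on the PATH (throwaway containers, removed once the command exits):

docker run --rm docker-ssh-jdk-hadoop:base java -version
docker run --rm docker-ssh-jdk-hadoop:base hadoop version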

Next, create a container from the docker-ssh-jdk-hadoop image and enter it.
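The create-and-enter step itself is not spelled out above; a minimal sketch, where the container name hadoop-single is an arbitrary choice:

docker run -itd --name hadoop-single docker-ssh-jdk-hadoop:base /bin/bash
docker exec -it hadoop-single /bin/bash
cd /home/hadoop/etc/hadoop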
Hadoop configuration (the files below live under /home/hadoop/etc/hadoop):

vi hadoop-env.sh
    export JAVA_HOME=/home/jdk8

vi mapred-site.xml
    <configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
        <property>
          <name>yarn.app.mapreduce.am.env</name>
          <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
        </property>
        <property>
          <name>mapreduce.map.env</name>
          <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
        </property>
        <property>
          <name>mapreduce.reduce.env</name>
          <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
        </property>
    </configuration>
    
vi core-site.xml
    <configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
        </property>
    </configuration>
    
vi hdfs-site.xml
    <configuration>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
    </configuration>

vi yarn-site.xml
    <configuration>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
    </configuration>

Startup:

Start ssh (passwordless SSH login must be configured first; see the sketch after this block):
    /usr/sbin/sshd
Start hadoop:
First format HDFS.
In /home/hadoop/bin:
    ./hdfs namenode -format
Then in /home/hadoop/sbin:
    ./start-all.sh
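Passwordless login matters because start-all.sh ssh-es into every node listed in the workers file, including localhost. A minimal sketch of the key setup for root inside the container:

mkdir -p ~/.ssh && chmod 700 ~/.ssh
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost date    # should run without a password prompt (accept the host key once)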

Some exceptions will be reported at startup; see ex-docker-hadoop for how to resolve them.
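One error that commonly shows up when everything runs as root, as it does in these containers, is Hadoop 3 refusing to start because the *_USER variables are undefined. A minimal sketch of the lines that address it, appended to hadoop-env.sh (running all daemons as root is assumed here, which is acceptable only for a test setup like this one):

export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root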

Running hadoop-local containers to build the cluster

docker run --name hadoop-1 --hostname hnode-1 -d -P -p 9870:9870 -p 8088:8088 hadoop-local:1.0.0 /bin/sh -c "while true; do echo hello world; sleep 1; done"
docker run --name hadoop-2 --hostname hnode-2 -d -P hadoop-local:1.0.0 /bin/sh -c "while true; do echo hello world; sleep 1; done"
docker run --name hadoop-3 --hostname hnode-3 -d -P hadoop-local:1.0.0 /bin/sh -c "while true; do echo hello world; sleep 1; done"

Then look up each container's IP address and add all three entries to /etc/hosts in every container (a sketch follows the list):

172.17.0.6 hnode-1
172.17.0.11 hnode-2
172.17.0.12 hnode-3
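A minimal sketch of collecting the addresses and writing them into a container (the IPs depend on your Docker network, so the values above are only examples):

docker inspect -f '{{.NetworkSettings.IPAddress}}' hadoop-1 hadoop-2 hadoop-3
# then, inside each container (docker exec -it hadoop-N /bin/bash):
echo "172.17.0.6  hnode-1" >> /etc/hosts
echo "172.17.0.11 hnode-2" >> /etc/hosts
echo "172.17.0.12 hnode-3" >> /etc/hosts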

Then modify the following configuration files on hadoop-1:

hadoop-env.sh
export JAVA_HOME=/home/jdk8


core-site.xml
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hnode-1:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/hadoop/tmp</value>
        </property>
         <property>
                 <name>fs.trash.interval</name>
                 <value>1440</value>
        </property>
</configuration>


hdfs-site.xml
<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
                <name>dfs.permissions.enabled</name>
                <value>false</value>
        </property>
</configuration>


yarn-site.xml
<configuration>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.log-aggregation-enable</name>
                <value>true</value>
        </property>
        <property>
                <description>The hostname of the RM.</description>
                <name>yarn.resourcemanager.hostname</name>
                <value>hnode-1</value>
        </property>
</configuration>


mapred-site.xml
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
        <property>
          <name>yarn.app.mapreduce.am.env</name>
          <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
        </property>
        <property>
          <name>mapreduce.map.env</name>
          <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
        </property>
        <property>
          <name>mapreduce.reduce.env</name>
          <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
        </property>
</configuration>

Then list the worker nodes in hadoop/etc/hadoop/workers (Hadoop 3.x renamed the slaves file to workers):

hnode-2
hnode-3
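A minimal sketch of writing the file, with the path following the HADOOP_HOME set in the Dockerfile:

cat > /home/hadoop/etc/hadoop/workers <<EOF
hnode-2
hnode-3
EOF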

Then copy the configured hadoop directory to the other two containers (sshd must be running in them; see the sketch after the commands):

scp -rq /home/hadoop hnode-2:/home
scp -rq /home/hadoop hnode-3:/home
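The keep-alive command used in docker run overrides the image's sshd CMD, so sshd is not running in any of the three containers until started by hand; the scp also assumes passwordless ssh from hnode-1 to the other two nodes (which already works if the image carries the key pair and authorized_keys from the single-node setup). A minimal sketch, run on the Docker host:

docker exec hadoop-1 /usr/sbin/sshd
docker exec hadoop-2 /usr/sbin/sshd
docker exec hadoop-3 /usr/sbin/sshd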

Then format the NameNode and start the whole cluster; both commands are run on hnode-1, and start-all.sh brings up the daemons on the worker nodes over ssh.

hdfs namenode -format
./start-all.sh

Check the running processes.
Master node: jps

3344 NodeManager
2726 SecondaryNameNode
3014 ResourceManager
2503 DataNode
4967 Jps
2329 NameNode

Worker node: jps

2936 NodeManager
2509 SecondaryNameNode
2286 DataNode
4255 Jps

Check the overall state of the Hadoop cluster:

hdfs fsck /

The fsck report shows that the number of data-nodes is 3.

Run the pi example (in /home/hadoop/share/hadoop/mapreduce):

hadoop jar hadoop-mapreduce-examples-3.2.1.jar pi 3 5

Wordcount test (in /home/hadoop/share/hadoop/mapreduce):

vi test.txt
    hello hello good good good
hdfs dfs -mkdir -p /input
hdfs dfs -put test.txt /input/test.txt
hdfs dfs -ls /input
hadoop jar hadoop-mapreduce-examples-3.2.1.jar wordcount /input/test.txt /output
hdfs dfs -ls /output
hdfs dfs -text /output/part-r-00000
Output:
 good	5
hello	3
name	2

Then commit the container as an image:

docker commit hadoop-1 hadoop:node

Then save the image to a file:

docker save hadoop:node -o hadoop-node.tar
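The tar file can then be copied to another machine and imported with docker load; a minimal usage example:

docker load -i hadoop-node.tar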

For how the exceptions encountered along the way were handled, see the WeChat public account.