Deploying Hadoop 3.1.1 on DiDi Cloud

Original article: blog.didiyun.com

1. The cluster architecture used in this example is as follows:

Here we use the DiDi Cloud instances' private IPs. If Hadoop needs to be reachable from outside, bind a public IP (EIP) to the instance. For details on using DiDi Cloud EIPs, see the following link:
https://help.didiyun.com/hc/kb/section/1035272/

  • The master node holds the distributed file system's metadata, such as the inode table, as well as the resource scheduler and its records. It runs two daemons:
    NameNode: manages the distributed file system and records where each data block is stored in the cluster.
    ResourceManager: schedules resources on the data nodes (node1 and node2 in this example); each data node runs a NodeManager that performs the actual work.
  • node1 and node2 store the actual data and provide compute resources, each running two daemons:
    DataNode: manages the physical storage of the actual data.
    NodeManager: manages the execution of compute tasks on its node.

2. System Configuration

The DiDi Cloud virtual machines used in this example have the following configuration:
2-core CPU, 4 GB RAM, 40 GB HDD, 3 Mbps bandwidth, CentOS 7.4

  • For security reasons, DiDi Cloud instances do not allow direct root login by default; log in as dc2-user first, then switch to root with sudo su. In this example all commands are run as dc2-user, and dc2-user is also the Hadoop user.
  • Write the IPs and hostnames of all three nodes into /etc/hosts on each node, and comment out the first three lines:

sudo vi /etc/hosts
#127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
#127.0.0.1 10-254-149-24
10.254.149.24   master
10.254.88.218   node1
10.254.84.165   node2
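
Before moving on, it is worth confirming that each hostname actually resolves. A minimal sketch (the hostnames are the ones used in this example):

```shell
# Sketch: verify that each cluster hostname resolves after editing /etc/hosts
status=""
for host in master node1 node2; do
  if getent hosts "$host" >/dev/null 2>&1; then
    status="$status $host:ok"
  else
    status="$status $host:missing"
  fi
done
echo "$status"
```

Any host reported as missing points to a typo in /etc/hosts.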
 

  • The master node needs passwordless SSH (key-pair) access to node1 and node2. Generate a key pair for dc2-user on the master node:

ssh-keygen -b 4096
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:zRhhVpEfSIZydqV75775sZB0GBjZ/f7nnZ4mgfYrWa8 hadoop@10-254-149-24
The key's randomart image is:
+---[RSA 4096]----+
|        ++=*+ .  |
|      .o+o+o+. . |
|       +...o o  .|
|         = .. o .|
|        S + oo.o |
|           +.=o .|
|          . +o+..|
|           o +.+O|
|            .EXO=|
+----[SHA256]-----+
 

Run the following commands to copy the generated public key to all three nodes:

ssh-copy-id -i $HOME/.ssh/id_rsa.pub dc2-user@master
ssh-copy-id -i $HOME/.ssh/id_rsa.pub dc2-user@node1
ssh-copy-id -i $HOME/.ssh/id_rsa.pub dc2-user@node2
 

Then, on master, run ssh dc2-user@node1 and ssh dc2-user@node2 to verify that you can connect without entering a password.
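
The same check can be scripted so it never blocks on a password prompt; a sketch, assuming the node names above:

```shell
# Sketch: check passwordless login to both workers in one pass.
# BatchMode=yes makes ssh fail immediately instead of prompting for a password.
results=""
for node in node1 node2; do
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "dc2-user@$node" true 2>/dev/null; then
    results="$results $node:ok"
  else
    results="$results $node:unreachable"
  fi
done
echo "$results"
```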

  • Configure the Java environment

Download the JDK on all three nodes:

mkdir /home/dc2-user/java
cd /home/dc2-user/java
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u191-b12/2787e4a523244c269598db4e85c51e0c/jdk-8u191-linux-x64.tar.gz
tar -zxf jdk-8u191-linux-x64.tar.gz
 

Configure the Java environment variables on all three nodes:

sudo vi /etc/profile.d/jdk-1.8.sh
export JAVA_HOME=/home/dc2-user/java/jdk1.8.0_191
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
 

Apply the environment variables:

source /etc/profile
 

Check the Java version:

java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
 

If you see output like the above, the Java environment is configured correctly.
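
If a setup script needs the bare version number rather than the full banner, it can be parsed out of the `java -version` output; a sketch, using a sample line copied from the output above:

```shell
# Sketch: extract the bare version number from a `java -version` line.
# In practice the sample would come from: java -version 2>&1 | head -n 1
sample='java version "1.8.0_191"'
ver=$(printf '%s\n' "$sample" | awk -F'"' '{print $2}')
echo "$ver"
```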

3. Installing Hadoop

Download Hadoop 3.1.1 on the master node and extract it:

cd /home/dc2-user
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
tar zxf hadoop-3.1.1.tar.gz
 

Six files under /home/dc2-user/hadoop-3.1.1/etc/hadoop need to be configured: hadoop-env.sh, core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and workers.

(1) Add the following to hadoop-env.sh:

export JAVA_HOME=/home/dc2-user/java/jdk1.8.0_191
export HDFS_NAMENODE_USER="dc2-user"
export HDFS_DATANODE_USER="dc2-user"
export HDFS_SECONDARYNAMENODE_USER="dc2-user"
export YARN_RESOURCEMANAGER_USER="dc2-user"
export YARN_NODEMANAGER_USER="dc2-user"
 

(2) core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://master:9000</value>
        </property>
    </configuration>
 

(3) hdfs-site.xml

<configuration>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/home/dc2-user/data/nameNode</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/home/dc2-user/data/dataNode</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
       </property>
</configuration>
 

(4) yarn-site.xml

<configuration>
    <property>
            <name>yarn.acl.enable</name>
            <value>0</value>
    </property>
    <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>master</value>
    </property>
    <property>
          <name>yarn.resourcemanager.webapp.address</name>
          <value>master:8088</value>
    </property>
    <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
    </property>
     <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
 
</configuration>
 

(5) mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1536</value>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1024M</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>3072</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx2560M</value>
    </property>
 
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
 
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>
 

(6) Edit workers, listing the hostnames of the data nodes:

node1
node2
 

4. Starting Hadoop

  • Copy the configured Hadoop directory to node1 and node2:

scp -r /home/dc2-user/hadoop-3.1.1 dc2-user@node1:/home/dc2-user/
scp -r /home/dc2-user/hadoop-3.1.1 dc2-user@node2:/home/dc2-user/
 

  • Configure the Hadoop environment variables (on all three nodes):

sudo vi /etc/profile.d/hadoop-3.1.1.sh
export HADOOP_HOME="/home/dc2-user/hadoop-3.1.1"
export PATH="$HADOOP_HOME/bin:$PATH"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
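
A quick way to confirm the profile script worked is to check that the Hadoop bin directory actually ended up on PATH; a sketch, with HADOOP_HOME set to this example's install location:

```shell
# Sketch: verify the Hadoop bin directory is on PATH after sourcing the profile
export HADOOP_HOME="/home/dc2-user/hadoop-3.1.1"
export PATH="$HADOOP_HOME/bin:$PATH"
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) path_ok=yes ;;
  *)                      path_ok=no ;;
esac
echo "$path_ok"
```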
 

  • Apply the environment variables:

source /etc/profile
 

  • Run hadoop version on all three nodes and check for output, to verify that the environment variables took effect:

hadoop version
Hadoop 3.1.1
Source code repository https://github.com/apache/hadoop -r 2b9a8c1d3a2caf1e733d57f346af3ff0d5ba529c
Compiled by leftnoteasy on 2018-08-02T04:26Z
Compiled with protoc 2.5.0
From source with checksum f76ac55e5b5ff0382a9f7df36a3ca5a0
This command was run using /home/dc2-user/hadoop-3.1.1/share/hadoop/common/hadoop-common-3.1.1.jar
 

  • Format HDFS (on the master node only):

/home/dc2-user/hadoop-3.1.1/bin/hdfs namenode -format testCluster
 

  • Start the services:

/home/dc2-user/hadoop-3.1.1/sbin/start-dfs.sh
/home/dc2-user/hadoop-3.1.1/sbin/start-yarn.sh
 

  • Check whether the services are running on all three nodes:

master

jps
1654 Jps
31882 NameNode
32410 ResourceManager
32127 SecondaryNameNode
 

node1

jps
19827 NodeManager
19717 DataNode
20888 Jps
 

node2

jps
30707 Jps
27675 NodeManager
27551 DataNode
 

If you see results like the above, the services have started correctly. You can now reach the ResourceManager web UI via master's public IP; note that port 8088 must be open in the security group. For details on DiDi Cloud security groups, see: https://help.didiyun.com/hc/kb/article/1091031/

5. Verifying with an Example

Finally, use the wordcount program bundled with Hadoop to verify that MapReduce works. The following steps are performed on the master node.
First, create two files, test1 and test2, in the current directory with the following contents:

vi test1
hello world
bye world
vi test2
hello hadoop
bye hadoop
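
If you prefer not to open vi, the same two files can be created non-interactively; a sketch:

```shell
# Sketch: create the two input files with printf instead of an editor
printf 'hello world\nbye world\n'   > test1
printf 'hello hadoop\nbye hadoop\n' > test2
cat test1 test2
```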
 

Next, create a directory in HDFS and upload the two files into it:

hadoop fs -mkdir /input
hadoop fs -put test* /input
 

When the cluster has just started, HDFS is initially in safe mode, so leave safe mode before running the job:

hdfs dfsadmin -safemode leave
 

Run the wordcount program to count how many times each word appears in the two files:

yarn jar /home/dc2-user/hadoop-3.1.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount /input /output
 
 
WARNING: YARN_CONF_DIR has been replaced by HADOOP_CONF_DIR. Using value of YARN_CONF_DIR.
2018-11-09 20:27:12,233 INFO client.RMProxy: Connecting to ResourceManager at master/10.254.149.24:8032
2018-11-09 20:27:12,953 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1541766351311_0001
2018-11-09 20:27:14,483 INFO input.FileInputFormat: Total input files to process : 2
2018-11-09 20:27:16,967 INFO mapreduce.JobSubmitter: number of splits:2
2018-11-09 20:27:17,014 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enab
2018-11-09 20:27:17,465 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1541766351311_0001
2018-11-09 20:27:17,466 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-11-09 20:27:17,702 INFO conf.Configuration: resource-types.xml not found
2018-11-09 20:27:17,703 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-11-09 20:27:18,256 INFO impl.YarnClientImpl: Submitted application application_1541766351311_0001
2018-11-09 20:27:18,296 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1541766351311_0001/
2018-11-09 20:27:18,297 INFO mapreduce.Job: Running job: job_1541766351311_0001
2018-11-09 20:28:24,929 INFO mapreduce.Job: Job job_1541766351311_0001 running in uber mode : false
2018-11-09 20:28:24,931 INFO mapreduce.Job:  map 0% reduce 0%
2018-11-09 20:28:58,590 INFO mapreduce.Job:  map 50% reduce 0%
2018-11-09 20:29:19,437 INFO mapreduce.Job:  map 100% reduce 0%
2018-11-09 20:29:33,038 INFO mapreduce.Job:  map 100% reduce 100%
2018-11-09 20:29:36,315 INFO mapreduce.Job: Job job_1541766351311_0001 completed successfully
2018-11-09 20:29:36,619 INFO mapreduce.Job: Counters: 54
    File System Counters
        FILE: Number of bytes read=75
        FILE: Number of bytes written=644561
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=237
        HDFS: Number of bytes written=31
        HDFS: Number of read operations=11
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Killed map tasks=1
        Launched map tasks=3
        Launched reduce tasks=1
        Data-local map tasks=3
        Total time spent by all maps in occupied slots (ms)=164368
        Total time spent by all reduces in occupied slots (ms)=95475
        Total time spent by all map tasks (ms)=82184
        Total time spent by all reduce tasks (ms)=31825
        Total vcore-milliseconds taken by all map tasks=82184
        Total vcore-milliseconds taken by all reduce tasks=31825
        Total megabyte-milliseconds taken by all map tasks=168312832
        Total megabyte-milliseconds taken by all reduce tasks=97766400
    Map-Reduce Framework
        Map input records=5
        Map output records=8
        Map output bytes=78
        Map output materialized bytes=81
        Input split bytes=190
        Combine input records=8
        Combine output records=6
        Reduce input groups=4
        Reduce shuffle bytes=81
        Reduce input records=6
        Reduce output records=4
        Spilled Records=12
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=2230
        CPU time spent (ms)=2280
        Physical memory (bytes) snapshot=756064256
        Virtual memory (bytes) snapshot=10772656128
        Total committed heap usage (bytes)=541589504
        Peak Map Physical memory (bytes)=281268224
        Peak Map Virtual memory (bytes)=3033423872
        Peak Reduce Physical memory (bytes)=199213056
        Peak Reduce Virtual memory (bytes)=4708827136
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=47
    File Output Format Counters
        Bytes Written=31
 

If the following output appears, the computation has finished; the results are saved in the /output directory in HDFS:

hadoop fs -ls /output
Found 2 items
-rw-r--r--   1 root supergroup          0 2018-11-09 20:29 /output/_SUCCESS
-rw-r--r--   1 root supergroup         31 2018-11-09 20:29 /output/part-r-00000
 

Open part-r-00000 to view the results:

hadoop fs -cat /output/part-r-00000
bye 2
hadoop  2
hello   2
world   2
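
As a cross-check, the same counts can be reproduced locally with standard shell tools; a sketch (the input files are recreated here so the snippet is self-contained):

```shell
# Sketch: recompute the word counts locally and compare with the MapReduce output
printf 'hello world\nbye world\n'   > test1
printf 'hello hadoop\nbye hadoop\n' > test2
counts=$(cat test1 test2 | tr ' ' '\n' | sort | uniq -c | awk '{print $2, $1}')
echo "$counts"
```

This mirrors the map (split into words), shuffle (sort), and reduce (uniq -c) phases of the MapReduce job on a single machine.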
 

Author: 贺子一