Installing and Setting Up a hadoop-3.3.0 Cluster on Ubuntu 18.04
Background
Date: June 2021.
The company wanted to move into big data, so I set up a simple Hadoop cluster for testing. Three machines were purchased in total, two of which were used to build a small test cluster; all of them run Ubuntu 18.04.
This tutorial installs the Hadoop cluster as groundwork for an Apache Kylin experiment. A Java environment must be installed and configured before installing Hadoop.
Cluster Layout
System/Version Details
Software | Version |
---|---|
OS | Ubuntu 18.04 |
JDK | 1.8.0_291 |
hadoop | 3.3.0 |
Cluster Plan
Before building the cluster, all machines must be connected to the same LAN. In this experiment there were only two machines, so I connected them directly with a single network cable. After connecting, set each machine's IP address; once that is done, verify with the ping command that the two machines can reach each other.
 | Master | Slave |
---|---|---|
Role | NameNode, DataNode | DataNode |
IP address | 192.168.1.7 | 192.168.1.8 |
Hostname | user-ThinkServer-TS80X | watertek-thinkserver-ts80x |
Domain name | master.watertek.com | slave1.watertek.com |
Installing Tools over the Network
A freshly installed Ubuntu system is missing some development tooling, so before configuring anything, connect each machine to the network and install the tools below. Both machines need all of them.
Tool | Install command | Purpose |
---|---|---|
vim | sudo apt-get install vim | Terminal editor; the editor that ships with Ubuntu is awkward to use, so installing vim makes editing much more comfortable |
gcc, g++ | sudo apt-get install build-essential | C/C++ compilers, used for building programs |
net-tools | sudo apt-get install net-tools | Networking tools; provides ifconfig and other commands used alongside ping |
openssh-server | sudo apt-get install openssh-server | SSH server for remote terminal logins, so the other machine can be operated from a single terminal |
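For convenience, all four tools from the table can be installed in one shot; the following is simply the table's commands combined.
# refresh the package index, then install every tool listed above
sudo apt-get update
sudo apt-get install -y vim build-essential net-tools openssh-server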
Network Configuration
- The hostname can be changed by editing the /etc/hostname file; give each machine a distinct hostname.
- Domain names are defined in /etc/hosts; the listing below adds two new entries. Note that they must not go at the very bottom of the file: everything from the IPv6 comment downward configures IPv6, so the entries must be added above that comment. There must be exactly one tab between the IP and the domain name, and no trailing spaces after the domain name, otherwise ping will fail. Both machines must carry identical entries for them to ping each other.
#127.0.0.1 localhost
#127.0.1.1 user-ThinkServer-TS80X
# added
192.168.1.7 master.watertek.com
192.168.1.8 slave1.watertek.com
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
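To double-check the tab and trailing-space rules, one optional trick (not part of the original steps) is to make the invisible characters visible: GNU cat -A prints tabs as ^I and marks each line end with $.
# tabs show up as ^I; a space right before $ reveals a trailing space
cat -A /etc/hosts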
- After the configuration is done, restart the networking service.
sudo /etc/init.d/networking restart
After that, the two machines can ping each other.
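A quick connectivity check from each side, using the addresses and domain names configured above (-c 3 limits ping to three packets):
# from master
ping -c 3 192.168.1.8
ping -c 3 slave1.watertek.com
# from slave1
ping -c 3 master.watertek.com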
Setting Up Passwordless SSH Login
ssh can generate keys with either the rsa or the dsa algorithm; rsa is the default and is used here.
- On the Hadoop master, master (192.168.1.7, hereafter simply master), create an ssh key with the rsa algorithm. (Note that -P is uppercase, and the "" after it means an empty passphrase.)
ssh-keygen -t rsa -P ""
- On the Hadoop slave, slave1 (192.168.1.8, hereafter simply slave1), generate a key pair with the same command.
- The command from the first step creates two files in ~/.ssh, id_rsa and id_rsa.pub, which form a key pair; cd into ~/.ssh to see them.
- Copy slave1's id_rsa.pub over to master.
# on slave1, enter the ~/.ssh directory
cd ~/.ssh
# copy id_rsa.pub to master; master already has its own id_rsa.pub, so the copy must be given a different name
scp id_rsa.pub master.watertek.com:~/.ssh/slave1_id_rsa.pub
- On master, append id_rsa.pub and slave1_id_rsa.pub to the authorized_keys file (there is no authorized_keys file initially).
cat *.pub >> authorized_keys
- Copy master's authorized_keys file to slave1.
scp authorized_keys slave1.watertek.com:~/.ssh
- Check the permissions on the authorized_keys file: -rw-rw-r-- means the permissions are fine and need no change; otherwise adjust them.
# show file permission details (ll is two lowercase L's)
ll
# change the permissions
sudo chmod 664 authorized_keys
- From master, test passwordless ssh login to slave1, as in the sketch below.
- From slave1, test passwordless ssh login to master.
- If login fails, check the permissions of your user directory under /home: 751 is the minimum; they may be stricter but not looser.
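A minimal test sequence for the three points above, using the domain names from the cluster table (the chmod is only needed if the permissions turn out looser than 751):
# on master: this should log in to slave1 without prompting for a password
ssh slave1.watertek.com
exit
# on slave1: the same test in the opposite direction
ssh master.watertek.com
exit
# if a password prompt still appears, inspect and tighten the home directory permissions
ls -ld /home/user
sudo chmod 751 /home/user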
Hadoop Installation and Configuration Steps
Some other tutorials create a dedicated hadoop user, but in this experiment that turned out to be unnecessary (creating one does no harm either), so this tutorial skips that step; readers who want a dedicated user can consult the posts referenced by this article.
The steps below are all performed on master; once they are done, the configured hadoop directory is simply copied over to slave1.
Download and Extract
# 1. extract the archive into /usr/local (the archive sits in /home/user/下载, i.e. the Downloads directory)
sudo tar -zxvf hadoop-3.3.0.tar.gz -C /usr/local/
# 2. enter /usr/local
cd /usr/local
# 3. change the directory ownership (user is the current username)
sudo chown -R user:user hadoop-3.3.0
Configure Environment Variables
# 1. open and edit ~/.bashrc with vim
vi ~/.bashrc
# 2. press Shift + G to jump to the end of the file, then add the hadoop environment variables
# hadoop
export HADOOP_HOME=/usr/local/hadoop-3.3.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_CLASSPATH=$HADOOP_CONF_DIR
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
# 3. save and close the file, then source ~/.bashrc so the new environment variables take effect
source ~/.bashrc
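As a quick sanity check (assuming the variables above were added correctly), confirm that the hadoop binaries are now on the PATH:
# should print the version banner, starting with "Hadoop 3.3.0"
hadoop version
# should print /usr/local/hadoop-3.3.0
echo $HADOOP_HOME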
Edit the Configuration Files
The files that need editing are all under the hadoop-3.3.0/etc/hadoop directory.
- core-site.xml
- hadoop-env.sh
- hdfs-site.xml
- mapred-site.xml
- yarn-site.xml
- workers
# enter /usr/local/hadoop-3.3.0/etc/hadoop
cd /usr/local/hadoop-3.3.0/etc/hadoop
- Edit core-site.xml
vi core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master.watertek.com:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-3.3.0/hdfs/tmp</value>
</property>
</configuration>
- Edit hadoop-env.sh
vi hadoop-env.sh
# set the java environment variable in hadoop-env.sh; an absolute path is required
export JAVA_HOME=/usr/local/jdk1.8.0_291
- Edit hdfs-site.xml
vi hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value><!-- both master and slave1 act as datanodes, so replication is 2 -->
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop-3.3.0/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop-3.3.0/hdfs/data</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave1.watertek.com:9001</value>
</property>
<property>
<name>dfs.http.address</name>
<value>master.watertek.com:50070</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>staff</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>
- Edit mapred-site.xml
If the directory contains mapred-site.xml.template instead of mapred-site.xml, first rename or copy that file (hadoop-3.3.0 normally ships mapred-site.xml directly, so this may not be necessary).
cp mapred-site.xml.template mapred-site.xml
or
mv mapred-site.xml.template mapred-site.xml
# open mapred-site.xml
vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- Edit yarn-site.xml
vi yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master.watertek.com</value>
</property>
</configuration>
- Edit workers
If the directory has no workers file, simply create one and edit it. Before hadoop-3.0.0 this file was named slaves; from hadoop-3.0.0 onward it is named workers.
vi workers
master.watertek.com
slave1.watertek.com
- Copy the configured hadoop directory from master to slave1.
# enter /usr/local
cd /usr/local
# transfer; permissions prevent copying straight into /usr/local on slave1, so copy to the home directory first
scp -r hadoop-3.3.0 slave1.watertek.com:~/
# on slave1, move hadoop into /usr/local
sudo mv hadoop-3.3.0 /usr/local
# change the ownership; user is the username on slave1
sudo chown -R user:user hadoop-3.3.0
- Format the namenode on master
hdfs namenode -format
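A small sanity check on the format step (the path comes from the dfs.namenode.name.dir setting above): the configured name directory should now exist and contain a VERSION file.
# should list VERSION among the freshly created files
ls /usr/local/hadoop-3.3.0/hdfs/name/current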
- Start the hadoop cluster
# the hadoop environment variables were configured earlier, so the start script can be run directly
start-all.sh
# stop
stop-all.sh
# alternatively, enter /usr/local/hadoop-3.3.0/sbin first, then start from there
cd /usr/local/hadoop-3.3.0/sbin
./start-all.sh
# stop
./stop-all.sh
- Check whether the hadoop cluster is running properly
# use the jps command to list the hadoop processes
jps
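With this cluster's layout, the process lists should roughly look as follows (an illustration based on the configuration above, not captured output; PIDs will differ):
# on master, jps should list: NameNode, DataNode, ResourceManager, NodeManager (plus Jps itself)
# on slave1, jps should list: DataNode, NodeManager, SecondaryNameNode (plus Jps itself)
jps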
Troubleshooting Notes
Quite a few problems can show up after the configuration is finished; each one has to be tracked down and fixed.
The master starts normally, but the datanode on slave1 will not start
Symptom: this problem stumped me for several days; I went through several posts and reconfigured every hadoop config file again and again. Each time, all processes on master started and ran normally, but slave1 only started the SecondaryNameNode process; DataNode and NodeManager never came up.
Cause: I had been using the slaves file under /usr/local/hadoop-3.3.0/etc/hadoop, but from hadoop-3.0.0 onward the slaves file was replaced by the workers file.
Fix: rename slaves to workers; the same change is needed on slave1.
# enter /usr/local/hadoop-3.3.0/etc/hadoop
cd /usr/local/hadoop-3.3.0/etc/hadoop
# rename the file
mv slaves workers
The datanode on master will not start
Symptom: right after hadoop was first configured, everything on master ran fine; after reformatting, the datanode would no longer start.
Cause: reformatting changes the cluster ID, which stops the datanode from starting.
Fix: there are two options. One is to delete the /usr/local/hadoop-3.3.0/hdfs directory, reformat, and start again; the other is to fix the ID by hand.
- The hdfs directory under /usr/local/hadoop-3.3.0 is the one created through the config files; delete the whole directory, reformat with hdfs namenode -format, and startup works again.
- Alternatively, make the clusterID in /usr/local/hadoop-3.3.0/hdfs/name/current/VERSION and /usr/local/hadoop-3.3.0/hdfs/data/current/VERSION identical (the paths follow from the dfs.namenode.name.dir and dfs.datanode.data.dir settings above); see the sketch below.
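A minimal sketch of the second option: read the namenode's clusterID and make the datanode side match.
# compare the two clusterID lines
grep clusterID /usr/local/hadoop-3.3.0/hdfs/name/current/VERSION
grep clusterID /usr/local/hadoop-3.3.0/hdfs/data/current/VERSION
# if they differ, edit the data-side VERSION and set its clusterID to the name-side value
vi /usr/local/hadoop-3.3.0/hdfs/data/current/VERSION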
Runtime Error
Symptom: running the example program fails with the following error:
2021-06-23 11:52:11,747 INFO mapreduce.Job: Job job_1624343986613_0002 failed with state FAILED due to: Application application_1624343986613_0002 failed 2 times due to AM Container for appattempt_1624343986613_0002_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2021-06-23 11:52:11.139]Exception from container-launch.
Container id: container_1624343986613_0002_02_000001
Exit code: 1
[2021-06-23 11:52:11.141]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
[2021-06-23 11:52:11.141]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
For more detailed output, check the application tracking page: http://master.watertek.com:8088/cluster/app/application_1624343986613_0002 Then click on links to logs of each attempt.
. Failing the application.
2021-06-23 11:52:11,765 INFO mapreduce.Job: Counters: 0
Fix:
Edit the yarn-site.xml config file under /usr/local/hadoop-3.3.0/etc/hadoop and add the yarn.application.classpath property with the matching value. Running hadoop classpath prints the value to use.
# command
hadoop classpath
# output
/usr/local/hadoop-3.3.0/etc/hadoop:/usr/local/hadoop-3.3.0/share/hadoop/common/lib/*:/usr/local/hadoop-3.3.0/share/hadoop/common/*:/usr/local/hadoop-3.3.0/share/hadoop/hdfs:/usr/local/hadoop-3.3.0/share/hadoop/hdfs/lib/*:/usr/local/hadoop-3.3.0/share/hadoop/hdfs/*:/usr/local/hadoop-3.3.0/share/hadoop/mapreduce/*:/usr/local/hadoop-3.3.0/share/hadoop/yarn:/usr/local/hadoop-3.3.0/share/hadoop/yarn/lib/*:/usr/local/hadoop-3.3.0/share/hadoop/yarn/*
<!-- edit yarn-site.xml and add this property -->
<property>
<name>yarn.application.classpath</name>
<value>/usr/local/hadoop-3.3.0/etc/hadoop:/usr/local/hadoop-3.3.0/share/hadoop/common/lib/*:/usr/local/hadoop-3.3.0/share/hadoop/common/*:/usr/local/hadoop-3.3.0/share/hadoop/hdfs:/usr/local/hadoop-3.3.0/share/hadoop/hdfs/lib/*:/usr/local/hadoop-3.3.0/share/hadoop/hdfs/*:/usr/local/hadoop-3.3.0/share/hadoop/mapreduce/*:/usr/local/hadoop-3.3.0/share/hadoop/yarn:/usr/local/hadoop-3.3.0/share/hadoop/yarn/lib/*:/usr/local/hadoop-3.3.0/share/hadoop/yarn/*</value>
</property>
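The classpath must be visible on every node, so a reasonable follow-up (assuming the slave1 copy is writable by your user after the earlier chown) is to push the updated file to slave1 and restart the cluster:
# copy the updated config to slave1, then restart so YARN picks it up
scp yarn-site.xml slave1.watertek.com:/usr/local/hadoop-3.3.0/etc/hadoop/
stop-all.sh
start-all.sh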
Testing the Hadoop Cluster
Test Example 1
hadoop jar /usr/local/hadoop-3.3.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar pi 10 10
Number of Maps = 10
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
2021-06-23 14:27:35,336 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at master.watertek.com/192.168.1.7:8032
2021-06-23 14:27:35,641 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/user/.staging/job_1624428808391_0002
2021-06-23 14:27:35,819 INFO input.FileInputFormat: Total input files to process : 10
2021-06-23 14:27:35,999 INFO mapreduce.JobSubmitter: number of splits:10
2021-06-23 14:27:36,189 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1624428808391_0002
2021-06-23 14:27:36,189 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-06-23 14:27:36,312 INFO conf.Configuration: resource-types.xml not found
2021-06-23 14:27:36,313 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-06-23 14:27:36,366 INFO impl.YarnClientImpl: Submitted application application_1624428808391_0002
2021-06-23 14:27:36,401 INFO mapreduce.Job: The url to track the job: http://master.watertek.com:8088/proxy/application_1624428808391_0002/
2021-06-23 14:27:36,401 INFO mapreduce.Job: Running job: job_1624428808391_0002
2021-06-23 14:27:41,490 INFO mapreduce.Job: Job job_1624428808391_0002 running in uber mode : false
2021-06-23 14:27:41,492 INFO mapreduce.Job: map 0% reduce 0%
2021-06-23 14:27:46,627 INFO mapreduce.Job: map 20% reduce 0%
2021-06-23 14:27:54,701 INFO mapreduce.Job: map 100% reduce 0%
2021-06-23 14:27:55,712 INFO mapreduce.Job: map 100% reduce 100%
2021-06-23 14:27:56,735 INFO mapreduce.Job: Job job_1624428808391_0002 completed successfully
2021-06-23 14:27:56,820 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=226
FILE: Number of bytes written=2913955
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2730
HDFS: Number of bytes written=215
HDFS: Number of read operations=45
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=89286
Total time spent by all reduces in occupied slots (ms)=7107
Total time spent by all map tasks (ms)=89286
Total time spent by all reduce tasks (ms)=7107
Total vcore-milliseconds taken by all map tasks=89286
Total vcore-milliseconds taken by all reduce tasks=7107
Total megabyte-milliseconds taken by all map tasks=91428864
Total megabyte-milliseconds taken by all reduce tasks=7277568
Map-Reduce Framework
Map input records=10
Map output records=20
Map output bytes=180
Map output materialized bytes=280
Input split bytes=1550
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=280
Reduce input records=20
Reduce output records=0
Spilled Records=40
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=2898
CPU time spent (ms)=6310
Physical memory (bytes) snapshot=3237498880
Virtual memory (bytes) snapshot=29165371392
Total committed heap usage (bytes)=3401056256
Peak Map Physical memory (bytes)=333475840
Peak Map Virtual memory (bytes)=2654306304
Peak Reduce Physical memory (bytes)=230047744
Peak Reduce Virtual memory (bytes)=2658615296
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1180
File Output Format Counters
Bytes Written=97
Job Finished in 21.584 seconds
Estimated value of Pi is 3.20000000000000000000
The final line of output, Estimated value of Pi is 3.20000000000000000000, shows that the job ran successfully.
Test Example 2
- hadoop ships with a wordcount example that counts word occurrences. A directory first has to be created in HDFS; the files and directories in HDFS can be inspected with hadoop fs -ls. In my listing, word_count_input and word_count_output were left over from earlier test runs. (The original post showed a screenshot of the listing here.)
- Run hadoop fs -rm -r word_count_input word_count_output to delete the earlier test results from HDFS.
- Run hadoop fs -mkdir word_count_input to create the input directory in HDFS.
- Create two local files. In this example, create a word_count_test directory under /home/user, then create file1.txt and file2.txt inside /home/user/word_count_test.
vi file1.txt
vi file2.txt
# contents of file1.txt
hello hadoop
hello hive
hello ljj
hello hadoop
good morning
hello hbase
hello hadoop
# contents of file2.txt
linux window
hello linux
hello window
From the file contents, the expected counts are hello 8, hadoop 3, hive 1, ljj 1, good 1, morning 1, hbase 1, linux 2, window 2.
- Upload the local files to the word_count_input directory in HDFS (run this from /home/user/word_count_test so the *.txt glob matches both files).
hadoop fs -put *.txt word_count_input
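To confirm the upload before running the job:
# both file1.txt and file2.txt should be listed
hadoop fs -ls word_count_input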
- Run wordcount with the command below; word_count_output is the output directory for the results.
hadoop jar /usr/local/hadoop-3.3.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar wordcount word_count_input word_count_output
Job output:
2021-06-23 14:52:51,680 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at master.watertek.com/192.168.1.7:8032
2021-06-23 14:52:52,313 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/user/.staging/job_1624428808391_0003
2021-06-23 14:52:52,622 INFO input.FileInputFormat: Total input files to process : 2
2021-06-23 14:52:52,819 INFO mapreduce.JobSubmitter: number of splits:2
2021-06-23 14:52:52,991 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1624428808391_0003
2021-06-23 14:52:52,991 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-06-23 14:52:53,132 INFO conf.Configuration: resource-types.xml not found
2021-06-23 14:52:53,132 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-06-23 14:52:53,182 INFO impl.YarnClientImpl: Submitted application application_1624428808391_0003
2021-06-23 14:52:53,214 INFO mapreduce.Job: The url to track the job: http://master.watertek.com:8088/proxy/application_1624428808391_0003/
2021-06-23 14:52:53,215 INFO mapreduce.Job: Running job: job_1624428808391_0003
2021-06-23 14:52:59,382 INFO mapreduce.Job: Job job_1624428808391_0003 running in uber mode : false
2021-06-23 14:52:59,383 INFO mapreduce.Job: map 0% reduce 0%
2021-06-23 14:53:03,458 INFO mapreduce.Job: map 50% reduce 0%
2021-06-23 14:53:04,469 INFO mapreduce.Job: map 100% reduce 0%
2021-06-23 14:53:09,504 INFO mapreduce.Job: map 100% reduce 100%
2021-06-23 14:53:09,516 INFO mapreduce.Job: Job job_1624428808391_0003 completed successfully
2021-06-23 14:53:09,593 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=126
FILE: Number of bytes written=793862
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=389
HDFS: Number of bytes written=72
HDFS: Number of read operations=11
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=5645
Total time spent by all reduces in occupied slots (ms)=3123
Total time spent by all map tasks (ms)=5645
Total time spent by all reduce tasks (ms)=3123
Total vcore-milliseconds taken by all map tasks=5645
Total vcore-milliseconds taken by all reduce tasks=3123
Total megabyte-milliseconds taken by all map tasks=5780480
Total megabyte-milliseconds taken by all reduce tasks=3197952
Map-Reduce Framework
Map input records=10
Map output records=20
Map output bytes=203
Map output materialized bytes=132
Input split bytes=266
Combine input records=20
Combine output records=10
Reduce input groups=9
Reduce shuffle bytes=132
Reduce input records=10
Reduce output records=9
Spilled Records=20
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=222
CPU time spent (ms)=1800
Physical memory (bytes) snapshot=747245568
Virtual memory (bytes) snapshot=7958106112
Total committed heap usage (bytes)=799539200
Peak Map Physical memory (bytes)=288419840
Peak Map Virtual memory (bytes)=2650775552
Peak Reduce Physical memory (bytes)=216190976
Peak Reduce Virtual memory (bytes)=2657652736
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=123
File Output Format Counters
Bytes Written=72
Results:
Listing the directory shows a new word_count_output directory containing two files: _SUCCESS, which is empty and indicates that the job succeeded, and part-r-00000, which holds the computed results.
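To view the result from the terminal (the expected counts follow from file1.txt and file2.txt above; wordcount writes tab-separated word/count pairs sorted by word):
# list the output directory, then print the result file
hadoop fs -ls word_count_output
hadoop fs -cat word_count_output/part-r-00000
# expected contents:
# good    1
# hadoop  3
# hbase   1
# hello   8
# hive    1
# linux   2
# ljj     1
# morning 1
# window  2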