HiBench: Introduction, Installation and Usage

This article is part of the "Newcomer Creation Gift" event, starting my writing journey on Juejin.

What is HiBench?

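The HiBench benchmark suite is a set of Hadoop programs. It helps evaluate a Hadoop framework's speed, throughput, HDFS bandwidth, system resource utilization, and data access patterns.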

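The HiBench suite includes the following workloads for application stress testing:

Sort
WordCount
TeraSort
Nutch indexing
PageRank
Bayesian classification
K-means clustering
Enhanced DFSIO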

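The MapReduce framework in IBM® Spectrum Symphony meets the requirements of HiBench 2.0 and Hadoop 0.20.203. For information on configuring IBM Spectrum Symphony to run the tests in your environment, contact your sales representative.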

1. Setup

Python 2.x (>= 2.6) is required.
bc is required to generate the HiBench report.
Supported Hadoop versions: Apache Hadoop 2.x, CDH 5.x, HDP
Build HiBench by following the "build HiBench" instructions.
Start HDFS and YARN in the cluster.
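Before building, you can check the first two prerequisites with a small bash sketch. The function names `python_version_ok` and `check_prereqs` are hypothetical. The sketch assumes that the `python` on your PATH is the interpreter HiBench will use.

```shell
# bash sketch: check the prerequisites listed above.
# python_version_ok and check_prereqs are hypothetical helper names.

# Succeed if a "major.minor[.patch]" string is Python 2.x with x >= 6.
python_version_ok() {
  local major=${1%%.*}
  local rest=${1#*.}
  local minor=${rest%%.*}
  [ "$major" = "2" ] && [ "$minor" -ge 6 ] 2>/dev/null
}

# Report missing prerequisites on stderr; return non-zero if any are missing.
check_prereqs() {
  local status=0 ver
  if command -v python >/dev/null 2>&1; then
    # This one-liner runs on Python 2 and 3, so a wrong major version is reported too.
    ver=$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if ! python_version_ok "$ver"; then
      echo "python is $ver, but HiBench needs Python 2.x (>= 2.6)" >&2
      status=1
    fi
  else
    echo "python not found on PATH" >&2
    status=1
  fi
  if ! command -v bc >/dev/null 2>&1; then
    echo "bc not found; it is needed to generate the HiBench report" >&2
    status=1
  fi
  return "$status"
}
```

Source the snippet and run `check_prereqs`. A zero exit status means both prerequisites were found.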

2. Configure `hadoop.conf`

Create and edit conf/hadoop.conf:

cp conf/hadoop.conf.template conf/hadoop.conf

Set the following properties:

| Property | Meaning |
| --- | --- |
| hibench.hadoop.home | The Hadoop installation location |
| hibench.hadoop.executable | The path of the hadoop executable. For Apache Hadoop, it is /YOUR/HADOOP/HOME/bin/hadoop |
| hibench.hadoop.configure.dir | The Hadoop configuration directory. For Apache Hadoop, it is /YOUR/HADOOP/HOME/etc/hadoop |
| hibench.hdfs.master | The root HDFS path for storing HiBench data, e.g. hdfs://localhost:8020/user/username |
| hibench.hadoop.release | The Hadoop release provider. Supported value: apache |

Note: CDH and HDP users should set hibench.hadoop.executable, hibench.hadoop.configure.dir and hibench.hadoop.release to match their distribution. The default values are for the Apache release.
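If you script your setup, you can fill in the properties from the table without an editor. The bash sketch below assumes that HiBench conf files use whitespace-separated `key value` lines, as the template does. `set_conf_prop` is a hypothetical helper. The `/opt/hadoop` and `hdfs://localhost:8020` values are placeholders for an Apache Hadoop install.

```shell
# bash sketch: fill in conf/hadoop.conf non-interactively.
# set_conf_prop is a hypothetical helper; the values below are placeholders.

# set_conf_prop KEY VALUE FILE: rewrite the line whose first field is KEY,
# or append "KEY<TAB>VALUE" if there is none. Comment lines are left alone.
set_conf_prop() {
  local key=$1 value=$2 file=$3 tmp status
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    $1 == k { print k "\t" v; found = 1; next }
    { print }
    END { if (!found) print k "\t" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file's permissions
  status=$?
  rm -f "$tmp"
  return "$status"
}

CONF=conf/hadoop.conf
HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop}   # placeholder default

# Only touch the file if it exists (i.e. after the cp step above).
if [ -f "$CONF" ]; then
  set_conf_prop hibench.hadoop.home          "$HADOOP_HOME" "$CONF"
  set_conf_prop hibench.hadoop.executable    "$HADOOP_HOME/bin/hadoop" "$CONF"
  set_conf_prop hibench.hadoop.configure.dir "$HADOOP_HOME/etc/hadoop" "$CONF"
  set_conf_prop hibench.hdfs.master          "hdfs://localhost:8020/user/$USER" "$CONF"
  set_conf_prop hibench.hadoop.release       apache "$CONF"
fi
```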

3. Run a workload

To run a single workload, e.g. wordcount:

 bin/workloads/micro/wordcount/prepare/prepare.sh
 bin/workloads/micro/wordcount/hadoop/run.sh

prepare.sh launches a Hadoop job that generates the input data on HDFS, and run.sh submits the workload's Hadoop job to the cluster. bin/run_all.sh runs all the workloads listed in conf/benchmarks.lst with the frameworks listed in conf/frameworks.lst.
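You can wrap the two commands so that the job is submitted only when data generation succeeds. In the bash sketch below, `run_hadoop_workload` is a hypothetical name. It assumes you call it from the HiBench root and that the workload uses the `bin/workloads/<category>/<name>/` layout shown for wordcount.

```shell
# bash sketch: prepare data, then run the Hadoop job, for one workload.
# run_hadoop_workload is a hypothetical helper; call it from the HiBench root.
run_hadoop_workload() {
  local category=$1 name=$2
  local base="bin/workloads/$category/$name"
  if [ ! -x "$base/prepare/prepare.sh" ] || [ ! -x "$base/hadoop/run.sh" ]; then
    echo "no Hadoop scripts found under $base" >&2
    return 1
  fi
  # Generate the input data on HDFS; stop with prepare.sh's status if it fails.
  "$base/prepare/prepare.sh" || return
  # Submit the benchmark job to the cluster.
  "$base/hadoop/run.sh"
}
```

Running `run_hadoop_workload micro wordcount` then has the same effect as running the two commands above in order.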

4. View the report

<HiBench_Root>/report/hibench.report is a summary report. For each run, it lists the workload name, execution duration, data size, throughput per cluster, and throughput per node.
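A short awk-based sketch can extract the workload name and duration from the summary report. It assumes that the first line of hibench.report is a whitespace-separated header with columns named `Type` and `Duration(s)`. Check your report's header and adjust the names if they differ. `report_durations` is a hypothetical helper.

```shell
# bash sketch: print "<workload> <duration>" for each row of hibench.report.
# report_durations is hypothetical; the column names are an assumption.
report_durations() {
  local report=${1:-report/hibench.report}
  awk '
    NR == 1 {
      # Find the columns by header name instead of hard-coding positions.
      for (i = 1; i <= NF; i++) {
        if ($i == "Type") t = i
        if ($i == "Duration(s)") d = i
      }
      if (!t || !d) { print "unexpected header: " $0 > "/dev/stderr"; exit 1 }
      next
    }
    NF >= d { print $t, $d }   # skips blank or short lines
  ' "$report"
}
```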

The report directory also includes further information for debugging and tuning.

<workload>/hadoop/bench.log: Raw client-side logs.
<workload>/hadoop/monitor.html: System utilization monitoring results.
<workload>/hadoop/conf/<workload>.conf: The environment variable configuration generated for this workload.
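The bash sketch below checks which of these files were produced for a workload and prints the end of bench.log. It assumes the `report/<workload>/hadoop/` layout listed above, relative to the HiBench root. `show_workload_debug` is a hypothetical name.

```shell
# bash sketch: list a workload's debugging files and tail its client-side log.
# show_workload_debug is a hypothetical helper; run it from the HiBench root.
show_workload_debug() {
  local workload=$1 lines=${2:-20}
  local dir="report/$workload/hadoop"
  local f
  for f in "$dir/bench.log" "$dir/monitor.html" "$dir/conf/$workload.conf"; do
    if [ -f "$f" ]; then echo "found:   $f"; else echo "missing: $f"; fi
  done
  # Print the last lines of the raw client-side log, if it exists.
  [ -f "$dir/bench.log" ] && tail -n "$lines" "$dir/bench.log"
}
```

For example, `show_workload_debug wordcount 50` reports the three files for wordcount and prints the last 50 lines of its bench.log.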

5. Input data size

To change the input data size, set hibench.scale.profile in conf/hibench.conf. Available values are tiny, small, large, huge, gigantic, and bigdata. Each profile is defined in the workload's conf file, e.g. conf/workloads/micro/wordcount.conf.
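You can also script the profile switch. This bash sketch uses a hypothetical `set_scale_profile` helper that rejects any name outside the six listed above. It assumes the whitespace-separated `key value` format of HiBench conf files.

```shell
# bash sketch: set hibench.scale.profile, accepting only the documented names.
# set_scale_profile is a hypothetical helper.
set_scale_profile() {
  local profile=$1 file=${2:-conf/hibench.conf} tmp status
  case $profile in
    tiny|small|large|huge|gigantic|bigdata) ;;
    *) echo "unknown profile: $profile" >&2; return 1 ;;
  esac
  tmp=$(mktemp) || return 1
  # Rewrite only the line whose first field is the property name.
  awk -v p="$profile" '
    $1 == "hibench.scale.profile" { print $1 "\t" p; found = 1; next }
    { print }
    END { if (!found) print "hibench.scale.profile\t" p }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, run `set_scale_profile large`, then `grep large conf/workloads/micro/wordcount.conf`. Assuming the workload conf names its profiles this way, the grep shows what "large" means for wordcount.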

6. Tuning

Set the following properties in conf/hibench.conf to control parallelism:

| Property | Meaning |
| --- | --- |
| hibench.default.map.parallelism | Number of mappers in Hadoop |
| hibench.default.shuffle.parallelism | Number of reducers in Hadoop |
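You can set both parallelism properties in one step. In this bash sketch, `set_parallelism` is a hypothetical helper that accepts only positive integers and assumes the whitespace-separated `key value` format. It suggests no particular values, because suitable ones depend on your cluster and workload.

```shell
# bash sketch: set mapper and reducer parallelism in conf/hibench.conf.
# set_parallelism is a hypothetical helper.
set_parallelism() {
  local maps=$1 reduces=$2 file=${3:-conf/hibench.conf} tmp n status
  # Reject empty, non-numeric, or zero values.
  for n in "$maps" "$reduces"; do
    case $n in
      ''|*[!0-9]*) echo "not a positive integer: '$n'" >&2; return 1 ;;
    esac
    [ "$n" -gt 0 ] || { echo "not a positive integer: '$n'" >&2; return 1; }
  done
  tmp=$(mktemp) || return 1
  awk -v m="$maps" -v r="$reduces" '
    $1 == "hibench.default.map.parallelism"     { print $1 "\t" m; fm = 1; next }
    $1 == "hibench.default.shuffle.parallelism" { print $1 "\t" r; fr = 1; next }
    { print }
    END {
      if (!fm) print "hibench.default.map.parallelism\t" m
      if (!fr) print "hibench.default.shuffle.parallelism\t" r
    }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, `set_parallelism 16 8` sets 16 mappers and 8 reducers in conf/hibench.conf.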
