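1. WritableComparable Sorting Case Study (Total Sort)

1.1. Requirements

Take the output of the serialization case and sort it again by total flow in descending order.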

Input data (original):
phone_data.txt
Data after the first job:
part-r-00000

Expected output

13509468723 7335 110349 117684
13736230513 2481 24681 27162
13956435636 132 1512 1644
13846544121 264 0 264

1.2. Requirements Analysis

Requirement: sort the records by each phone number's total flow in descending order.

Input data

13736230513 2481 24681 27162
13846544121 264 0 264
13956435636 132 1512 1644
13509468723 7335 110349 117684

Output data

13509468723 7335 110349 117684
13736230513 2481 24681 27162
13956435636 132 1512 1644
13846544121 264 0 264

FlowBean implements the WritableComparable interface and overrides the compareTo method

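@Override
public int compareTo(FlowBean o) {
    // Descending order: the larger total flow sorts first.
    // Equal totals return 0, as the compareTo contract requires.
    return Long.compare(o.getSumFlow(), this.sumFlow);
}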
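As a minimal sketch of the ordering this compareTo produces, the plain-Java stand-in below sorts the four sample records by total flow, largest first. It has no Hadoop dependency; the names FlowRecord and TotalSortSketch are illustrative assumptions, not part of the original FlowBean.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for FlowBean: it keeps only what the ordering needs.
final class FlowRecord implements Comparable<FlowRecord> {
    final String phone;
    final long sumFlow;

    FlowRecord(String phone, long sumFlow) {
        this.phone = phone;
        this.sumFlow = sumFlow;
    }

    @Override
    public int compareTo(FlowRecord o) {
        // Descending: the larger total sorts first; equal totals compare as 0.
        return Long.compare(o.sumFlow, this.sumFlow);
    }
}

final class TotalSortSketch {
    // Returns the sample phone numbers ordered by total flow, largest first.
    static List<String> sortedPhones() {
        List<FlowRecord> records = new ArrayList<>();
        records.add(new FlowRecord("13736230513", 27162L));
        records.add(new FlowRecord("13846544121", 264L));
        records.add(new FlowRecord("13956435636", 1644L));
        records.add(new FlowRecord("13509468723", 117684L));
        Collections.sort(records); // uses compareTo, as the framework's key sort would
        List<String> phones = new ArrayList<>();
        for (FlowRecord r : records) {
            phones.add(r.phone);
        }
        return phones;
    }

    public static void main(String[] args) {
        for (String phone : sortedPhones()) {
            System.out.println(phone);
        }
    }
}
```

Swapping the two arguments of Long.compare is what turns the natural ascending order into a descending one.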

Mapper class

context.write(bean, phone)   // key: FlowBean, value: phone number

Reducer class

// Write every value in a loop so that phone numbers sharing the same total flow are all emitted
for (Text text : values) {
    context.write(text, key);
}
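The loop matters because keys that compare as equal reach the Reducer as one group, with every phone number for that total in the values iterable. The sketch below imitates that grouping in plain Java; the TreeMap is only a stand-in for the shuffle, the class name GroupByTotalSketch is an assumption, and the three records with total 240 are taken from the sample data of the partition-sort case.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class GroupByTotalSketch {
    // Groups {phone, total} records by total (descending), then emits one line per phone.
    static List<String> emit(String[][] records) {
        Map<Long, List<String>> groups =
                new TreeMap<Long, List<String>>(Comparator.reverseOrder());
        for (String[] r : records) {
            long total = Long.parseLong(r[1]);
            groups.computeIfAbsent(total, k -> new ArrayList<>()).add(r[0]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, List<String>> group : groups.entrySet()) {
            // Counterpart of the Reducer loop: one output line per value in the group.
            for (String phone : group.getValue()) {
                out.add(phone + " " + group.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] records = {
            {"13846544121", "264"},
            {"13966251146", "240"},
            {"13768778790", "240"},
            {"13729199489", "240"},
        };
        for (String line : emit(records)) {
            System.out.println(line);
        }
    }
}
```

Writing only the first value of each group would silently drop two of the three phone numbers that share the total 240.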

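2. WritableComparable Sorting Case Study (Sort Within Partitions)

2.1. Requirements

In the output file for each province, the phone numbers must be sorted by total flow.

2.2. Requirements Analysis

Building on the previous requirement, add a custom partitioner class that assigns each record to a partition according to the province of its phone number.

Input data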

13509468723 7335 110349 117684
13975057813 11058 48243 59301
13568436656 3597 25635 29232
13736230513 2481 24681 27162
18390173782 9531 2412 11943
13630577991 6960 690 7650
15043685818 3659 3538 7197
13992314666 3008 3720 6728
15910133277 3156 2936 6092
13560439638 918 4938 5856
84188413 4116 1432 5548
13682846555 1938 2910 4848
18271575951 1527 2106 3633
15959002129 1938 180 2118
13590439668 1116 954 2070
13956435636 132 1512 1644
13470253144 180 180 360
13846544121 264 0 264
13966251146 240 0 240
13768778790 120 120 240
13729199489 240 0 240

Expected output

2.3. Hands-on

Add a custom partitioner class

package com.atguigu.mapreduce.partitionercompable;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class ProvincePartitioner2 extends Partitioner<FlowBean, Text> {
    @Override
    public int getPartition(FlowBean flowBean, Text text, int numPartitions) {
        // Take the first three digits of the phone number
        String phone = text.toString();
        String prePhone = phone.substring(0, 3);

        // Choose the partition number from the prefix
        int partition;
        if ("136".equals(prePhone)) {
            partition = 0;
        } else if ("137".equals(prePhone)) {
            partition = 1;
        } else if ("138".equals(prePhone)) {
            partition = 2;
        } else if ("139".equals(prePhone)) {
            partition = 3;
        } else {
            partition = 4;
        }

        // Return the partition number
        return partition;
    }
}

Register the partitioner class in the driver class

// Set the custom partitioner
job.setPartitionerClass(ProvincePartitioner2.class);

// Set the matching number of ReduceTasks
job.setNumReduceTasks(5);
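The combined effect of the partitioner and the descending compareTo can be sketched in plain Java without Hadoop. In the sketch below, partitionFor mirrors the prefix rules of ProvincePartitioner2; the class name PartitionSortSketch and the use of in-memory lists as stand-ins for the five output files are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionSortSketch {
    // Same prefix rules as ProvincePartitioner2.
    static int partitionFor(String phone) {
        String prefix = phone.substring(0, 3);
        switch (prefix) {
            case "136": return 0;
            case "137": return 1;
            case "138": return 2;
            case "139": return 3;
            default:    return 4;
        }
    }

    // Splits {phone, total} records into 5 lists, each sorted by total flow, largest first.
    static List<List<String[]>> partitionAndSort(String[][] records) {
        List<List<String[]>> parts = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            parts.add(new ArrayList<>());
        }
        for (String[] r : records) {
            parts.get(partitionFor(r[0])).add(r);
        }
        for (List<String[]> part : parts) {
            part.sort((a, b) -> Long.compare(Long.parseLong(b[1]), Long.parseLong(a[1])));
        }
        return parts;
    }

    public static void main(String[] args) {
        String[][] records = {
            {"13509468723", "117684"}, {"13975057813", "59301"},
            {"13736230513", "27162"},  {"13630577991", "7650"},
            {"13992314666", "6728"},   {"84188413", "5548"},
            {"13682846555", "4848"},   {"13846544121", "264"},
        };
        List<List<String[]>> parts = partitionAndSort(records);
        for (int i = 0; i < parts.size(); i++) {
            for (String[] r : parts.get(i)) {
                System.out.println("partition " + i + ": " + r[0] + " " + r[1]);
            }
        }
    }
}
```

The partitioner returns the values 0 to 4, which is why the driver sets five ReduceTasks: each ReduceTask writes one file, and the records inside it are ordered by the key's compareTo.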

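3. Combiner

A Combiner is a component of an MR program in addition to the Mapper and the Reducer.
The parent class of a Combiner is Reducer.
A Combiner differs from a Reducer in where it runs: a Combiner runs on the node of each MapTask, whereas a Reducer receives the output of all Mappers.
The purpose of a Combiner is to aggregate the output of each MapTask locally, which reduces the amount of data sent over the network.
A Combiner may be used only if it does not change the final business logic, and its output key-value types must match the Reducer's input key-value types.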
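To illustrate the condition that a Combiner must not change the business logic, compare summing with averaging. The plain-Java sketch below uses made-up values for two MapTasks, (3, 5, 7) and (2, 6); the class and method names are assumptions for illustration. Partial sums can be added up safely, but averaging the per-task averages gives a different answer from averaging all values.

```java
final class CombinerSafetySketch {
    static double sum(double... xs) {
        double s = 0;
        for (double x : xs) {
            s += x;
        }
        return s;
    }

    static double mean(double... xs) {
        return sum(xs) / xs.length;
    }

    // Mean over all values, as a Reducer without a Combiner would compute it.
    static double trueMean() {
        return mean(3, 5, 7, 2, 6);
    }

    // Mean of the per-MapTask means, as a naive averaging Combiner would produce.
    static double meanOfMeans() {
        return mean(mean(3, 5, 7), mean(2, 6));
    }

    public static void main(String[] args) {
        // Summing is safe: the sum of partial sums equals the overall sum.
        System.out.println(sum(sum(3, 5, 7), sum(2, 6)) == sum(3, 5, 7, 2, 6));
        // Averaging is not: the two results differ.
        System.out.println(trueMean() + " vs " + meanOfMeans());
    }
}
```

Here the true mean is 23 / 5 = 4.6, while the mean of the two partial means is (5 + 4) / 2 = 4.5, so an averaging job cannot simply reuse its Reducer as a Combiner.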

Steps for implementing a custom Combiner

Define a custom Combiner that extends Reducer and overrides the reduce method

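import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Reused output value that holds the partial sum for the current key
    private IntWritable outV = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Add up the counts emitted for this key by the local MapTask
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        outV.set(sum);
        context.write(key, outV);
    }
}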

Set it in the Job driver class:

job.setCombinerClass(WordCountCombiner.class);
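What the Combiner saves can be sketched in plain Java without Hadoop: the map output of one MapTask is collapsed to a single record per word before it leaves the node. The class name LocalCombineSketch and the sample words are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LocalCombineSketch {
    // Sums the counts per word within one MapTask's output, as WordCountCombiner does.
    static Map<String, Integer> combine(List<String> mapOutputWords) {
        Map<String, Integer> partialSums = new TreeMap<>();
        for (String word : mapOutputWords) {
            // Each map output record is (word, 1); merge adds it to the running sum.
            partialSums.merge(word, 1, Integer::sum);
        }
        return partialSums;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hadoop", "spark", "hadoop", "hadoop", "spark");
        Map<String, Integer> combined = combine(words);
        System.out.println("records before combine: " + words.size());
        System.out.println("records after combine:  " + combined.size());
        System.out.println(combined);
    }
}
```

In this sample, five (word, 1) records shrink to two records before the shuffle, and the Reducer still arrives at the same totals because it only adds the partial sums.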
