Preface: This article briefly walks through how a ReduceTask runs, focusing on how it uses the iterator pattern to read data and thereby avoids the OOM problems that would arise from loading a large dataset into memory up front.
Source Code Analysis
The ReduceTask that runs our overridden Reducer performs three steps:
- Shuffle: records with the same key are pulled into one partition.
- Sort: MapTask has already sorted the data by key, so the reduce side does not need to sort again; the "sort" here is really a merge sort that brings identical keys together.
- Reduce: the reduce computation itself.
Start with ReduceTask's run method, where all three of these steps show up.
public void run(JobConf job, final TaskUmbilicalProtocol umbilical)
throws IOException, InterruptedException, ClassNotFoundException {
job.setBoolean(JobContext.SKIP_RECORDS, isSkipping());
if (isMapOrReduce()) {
// the ReduceTask's work is split into three phases: copy, sort (grouping), and reduce
copyPhase = getProgress().addPhase("copy");
sortPhase = getProgress().addPhase("sort");
reducePhase = getProgress().addPhase("reduce");
}
// start thread that will handle communication with parent
TaskReporter reporter = startReporter(umbilical);
boolean useNewApi = job.getUseNewReducer();
initialize(job, getJobID(), reporter, useNewApi);
// check if it is a cleanupJobTask
if (jobCleanup) {
runJobCleanupTask(umbilical, reporter);
return;
}
if (jobSetup) {
runJobSetupTask(umbilical, reporter);
return;
}
if (taskCleanup) {
runTaskCleanupTask(umbilical, reporter);
return;
}
// Initialize the codec
codec = initCodec();
RawKeyValueIterator rIter = null;
ShuffleConsumerPlugin shuffleConsumerPlugin = null;
Class combinerClass = conf.getCombinerClass();
CombineOutputCollector combineCollector =
(null != combinerClass) ?
new CombineOutputCollector(reduceCombineOutputCounter, reporter, conf) : null;
Class<? extends ShuffleConsumerPlugin> clazz =
job.getClass(MRConfig.SHUFFLE_CONSUMER_PLUGIN, Shuffle.class, ShuffleConsumerPlugin.class);
shuffleConsumerPlugin = ReflectionUtils.newInstance(clazz, job);
LOG.info("Using ShuffleConsumerPlugin: " + shuffleConsumerPlugin);
ShuffleConsumerPlugin.Context shuffleContext =
new ShuffleConsumerPlugin.Context(getTaskID(), job, FileSystem.getLocal(job), umbilical,
super.lDirAlloc, reporter, codec,
combinerClass, combineCollector,
spilledRecordsCounter, reduceCombineInputCounter,
shuffledMapsCounter,
reduceShuffleBytes, failedShuffleCounter,
mergedMapOutputsCounter,
taskStatus, copyPhase, sortPhase, this,
mapOutputFile, localMapFiles);
// initialize the shuffle
shuffleConsumerPlugin.init(shuffleContext);
// run the shuffle, pulling the map output records; eventually returns an iterator
rIter = shuffleConsumerPlugin.run();
// free up the data structures
mapOutputFilesOnDisk.clear();
sortPhase.complete(); // sort is complete
setPhase(TaskStatus.Phase.REDUCE);
statusUpdate(umbilical);
Class keyClass = job.getMapOutputKeyClass();
Class valueClass = job.getMapOutputValueClass();
// obtain the grouping comparator
RawComparator comparator = job.getOutputValueGroupingComparator();
if (useNewApi) {
runNewReducer(job, umbilical, reporter, rIter, comparator,
keyClass, valueClass);
} else {
runOldReducer(job, umbilical, reporter, rIter, comparator,
keyClass, valueClass);
}
shuffleConsumerPlugin.close();
done(umbilical, reporter);
}
First, step one: the shuffle pulls records with the same key into one partition and ultimately returns the iterator rIter. Since this is big-data processing, the data cannot be loaded into memory all at once; reading it from disk one record at a time through an iterator is the better fit.
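To make the streaming pattern concrete, the following is a minimal sketch, not framework code, of how a consumer might walk a RawKeyValueIterator one record at a time; the walk and handleRecord names are made up, and the deserialization that ReduceContextImpl actually performs is left out.
import java.io.IOException;

import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.mapred.RawKeyValueIterator;

// Sketch only: consume the shuffle output one record at a time, so that only the
// current key/value bytes need to sit in memory at any moment.
public class RawIterWalk {
    static void walk(RawKeyValueIterator rIter) throws IOException {
        while (rIter.next()) {                             // advance to the next serialized record
            DataInputBuffer keyBytes = rIter.getKey();     // raw bytes of the current key
            DataInputBuffer valueBytes = rIter.getValue(); // raw bytes of the current value
            handleRecord(keyBytes, valueBytes);            // hypothetical per-record processing
        }
        rIter.close();
    }

    static void handleRecord(DataInputBuffer key, DataInputBuffer value) {
        // placeholder: whatever the caller does with a single record
    }
}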
Next, step two: obtain the grouping comparator with RawComparator comparator = job.getOutputValueGroupingComparator().
public RawComparator getOutputValueGroupingComparator() {
Class<? extends RawComparator> theClass = getClass(
JobContext.GROUP_COMPARATOR_CLASS, null, RawComparator.class);
if (theClass == null) {
return getOutputKeyComparator();
}
return ReflectionUtils.newInstance(theClass, this);
}
public RawComparator getOutputKeyComparator() {
Class<? extends RawComparator> theClass = getClass(
JobContext.KEY_COMPARATOR, null, RawComparator.class);
if (theClass != null)
return ReflectionUtils.newInstance(theClass, this);
// fall back to the comparator of the key class itself (getMapOutputKeyClass)
return WritableComparator.get(getMapOutputKeyClass().asSubclass(WritableComparable.class), this);
}
getOutputKeyComparator also showed up in MapTask. The method first looks for a user-defined grouping comparator; if none is set, it falls back to the sort comparator, and if that is not set either, it uses the comparator of the key class itself.
Semantically, the grouping comparator answers whether two keys belong to the same group, i.e. a boolean, while the sort comparator usually returns -1, 0 or 1 for less than, equal to, and greater than. In practice both comparators return an int and the result is simply compared against 0. The following snippet will come up again later:
public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
nextKeyIsSame = comparator.compare(currentRawKey.getBytes(), 0,
currentRawKey.getLength(),
nextKey.getData(),
nextKey.getPosition(),
nextKey.getLength() - nextKey.getPosition()
) == 0;
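As an illustration of where the grouping comparator can come from, here is a sketch of a user-defined grouping comparator registered on the job. The "user#timestamp" key layout and the class name UserGroupingComparator are assumptions made up for this example, but setGroupingComparatorClass is the standard hook whose value getOutputValueGroupingComparator later reads back through JobContext.GROUP_COMPARATOR_CLASS.
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Hypothetical grouping comparator: Text keys shaped like "user#timestamp" are
// treated as one group whenever the part before '#' matches, even though the
// full keys (and therefore the sort order) still differ.
public class UserGroupingComparator extends WritableComparator {
    protected UserGroupingComparator() {
        super(Text.class, true); // true => instantiate keys so compare(a, b) receives real Text objects
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        String leftUser  = ((Text) a).toString().split("#", 2)[0];
        String rightUser = ((Text) b).toString().split("#", 2)[0];
        return leftUser.compareTo(rightUser); // 0 means "same group"
    }
}

// Registration in the driver:
//   job.setGroupingComparatorClass(UserGroupingComparator.class);
With such a comparator, records whose keys share the same prefix are fed to a single reduce call, even though the full keys, and hence the sort order within the group, still differ.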
Now step three: executing the reduce method. Let's look at runNewReducer.
private <INKEY,INVALUE,OUTKEY,OUTVALUE>
void runNewReducer(JobConf job,
final TaskUmbilicalProtocol umbilical,
final TaskReporter reporter,
RawKeyValueIterator rIter,
RawComparator<INKEY> comparator,
Class<INKEY> keyClass,
Class<INVALUE> valueClass
) throws IOException,InterruptedException,
ClassNotFoundException {
// wrap value iterator to report progress.
final RawKeyValueIterator rawIter = rIter;
rIter = new RawKeyValueIterator() {
public void close() throws IOException {
rawIter.close();
}
public DataInputBuffer getKey() throws IOException {
return rawIter.getKey();
}
public Progress getProgress() {
return rawIter.getProgress();
}
public DataInputBuffer getValue() throws IOException {
return rawIter.getValue();
}
public boolean next() throws IOException {
boolean ret = rawIter.next();
reporter.setProgress(rawIter.getProgress().getProgress());
return ret;
}
};
// make a task context so we can get the classes
org.apache.hadoop.mapreduce.TaskAttemptContext taskContext =
new org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl(job,
getTaskID(), reporter);
// make a reducer
org.apache.hadoop.mapreduce.Reducer<INKEY,INVALUE,OUTKEY,OUTVALUE> reducer =
(org.apache.hadoop.mapreduce.Reducer<INKEY,INVALUE,OUTKEY,OUTVALUE>)
ReflectionUtils.newInstance(taskContext.getReducerClass(), job);
// the RecordWriter used to write out the reduce output records
org.apache.hadoop.mapreduce.RecordWriter<OUTKEY,OUTVALUE> trackedRW =
new NewTrackingRecordWriter<OUTKEY, OUTVALUE>(this, taskContext);
job.setBoolean("mapred.skip.on", isSkipping());
job.setBoolean(JobContext.SKIP_RECORDS, isSkipping());
// create the reduce context, which carries the parameters reduce needs at runtime
org.apache.hadoop.mapreduce.Reducer.Context
reducerContext = createReduceContext(reducer, job, getTaskID(),
rIter, reduceInputKeyCounter,
reduceInputValueCounter,
trackedRW,
committer,
reporter, comparator, keyClass,
valueClass);
try {
reducer.run(reducerContext);
} finally {
trackedRW.close(reducerContext);
}
}
createReduceContext sets up reduce's runtime parameters and wraps the iterator rIter that reads the data directly; the resulting context object is then passed to reducer.run(reducerContext).
public void run(Context context) throws IOException, InterruptedException {
setup(context);
try {
while (context.nextKey()) {
reduce(context.getCurrentKey(), context.getValues(), context);
// If a back up store is used, reset it
Iterator<VALUEIN> iter = context.getValues().iterator();
if(iter instanceof ReduceContext.ValueIterator) {
((ReduceContext.ValueIterator<VALUEIN>)iter).resetBackupStore();
}
}
} finally {
cleanup(context);
}
}
Unlike Mapper, where context.nextKeyValue() asks whether there is another record, Reducer's context.nextKey() asks whether there is another group of records.
public boolean nextKey() throws IOException,InterruptedException {
while (hasMore && nextKeyIsSame) { // nextKeyIsSame is initially false
nextKeyValue();
}
if (hasMore) {
if (inputKeyCounter != null) {
inputKeyCounter.increment(1);
}
return nextKeyValue();
} else {
return false;
}
}
It first checks whether there is another record and whether it belongs to the same key group; if both hold, it enters the loop. Either way, as long as data remains, nextKeyValue will eventually be executed.
public boolean nextKeyValue() throws IOException, InterruptedException {
// return false if there is no more data
if (!hasMore) {
key = null;
value = null;
return false;
}
// is this the first record of a key group?
firstValue = !nextKeyIsSame;
// read the key
DataInputBuffer nextKey = input.getKey();
currentRawKey.set(nextKey.getData(), nextKey.getPosition(),
nextKey.getLength() - nextKey.getPosition());
buffer.reset(currentRawKey.getBytes(), 0, currentRawKey.getLength());
key = keyDeserializer.deserialize(key);
// read the value
DataInputBuffer nextVal = input.getValue();
buffer.reset(nextVal.getData(), nextVal.getPosition(), nextVal.getLength()
- nextVal.getPosition());
value = valueDeserializer.deserialize(value);
currentKeyLength = nextKey.getLength() - nextKey.getPosition();
currentValueLength = nextVal.getLength() - nextVal.getPosition();
if (isMarked) {
backupStore.write(nextKey, nextVal);
}
// after reading the current kv, advance to the next record; input.next() reports whether one exists
hasMore = input.next();
if (hasMore) {
// read the next record's key
nextKey = input.getKey();
// check whether the next record belongs to the same key group and update nextKeyIsSame
nextKeyIsSame = comparator.compare(currentRawKey.getBytes(), 0,
currentRawKey.getLength(),
nextKey.getData(),
nextKey.getPosition(),
nextKey.getLength() - nextKey.getPosition()
) == 0;
} else {
nextKeyIsSame = false;
}
inputValueCounter.increment(1);
return true;
}
After reading the current record, the method checks whether another record exists and, if so, whether it belongs to the same key group, updating hasMore and nextKeyIsSame for nextKey to use in its next decision.
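The interplay between hasMore, firstValue and nextKeyIsSame is easier to see outside Hadoop. Below is a minimal, self-contained plain-Java sketch (not Hadoop code; the data is made up) that mimics nextKey/nextKeyValue and the value iterator over an already-sorted list of key/value pairs: a single linear pass yields the records group by group.
import java.util.Arrays;
import java.util.List;

// A plain-Java sketch of ReduceContextImpl's grouping logic over a sorted list
// of (key, value) pairs. Names mirror the Hadoop code discussed above.
public class GroupingSketch {
    static final List<String[]> SORTED = Arrays.asList(
            new String[]{"a", "1"}, new String[]{"a", "2"},
            new String[]{"b", "3"},
            new String[]{"c", "4"}, new String[]{"c", "5"});

    static int pos = 0;                 // read position in the "map output"
    static String key, value;           // current record
    static boolean hasMore = !SORTED.isEmpty();
    static boolean nextKeyIsSame = false;
    static boolean firstValue = false;

    // Mirrors nextKeyValue(): read one record, then peek at the next one.
    static boolean nextKeyValue() {
        if (!hasMore) return false;
        firstValue = !nextKeyIsSame;
        key = SORTED.get(pos)[0];
        value = SORTED.get(pos)[1];
        pos++;
        hasMore = pos < SORTED.size();
        nextKeyIsSame = hasMore && SORTED.get(pos)[0].equals(key);
        return true;
    }

    // Mirrors nextKey(): skip any values left over from the previous group,
    // then position on the first record of the next group.
    static boolean nextKey() {
        while (hasMore && nextKeyIsSame) nextKeyValue();
        return hasMore && nextKeyValue();
    }

    // Mirrors ValueIterator.hasNext()/next() for the current group.
    static boolean hasNextValue() { return firstValue || nextKeyIsSame; }
    static String nextValue() {
        if (firstValue) { firstValue = false; return value; }
        nextKeyValue();
        return value;
    }

    public static void main(String[] args) {
        while (nextKey()) {                       // one iteration per key group
            System.out.print(key + " ->");
            while (hasNextValue()) System.out.print(" " + nextValue());
            System.out.println();
        }
        // prints: a -> 1 2 | b -> 3 | c -> 4 5
    }
}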
Now look at the reduce method in Reducer: its context.getValues() argument returns an Iterable whose iterator is a ValueIterator.
public
Iterable<VALUEIN> getValues() throws IOException, InterruptedException {
return iterable;
}
protected class ValueIterable implements Iterable<VALUEIN> {
private ValueIterator iterator = new ValueIterator();
@Override
public Iterator<VALUEIN> iterator() {
return iterator;
}
}
@Override
public boolean hasNext() {
try {
if (inReset && backupStore.hasNext()) {
return true;
}
} catch (Exception e) {
e.printStackTrace();
throw new RuntimeException("hasNext failed", e);
}
return firstValue || nextKeyIsSame;
}
@Override
public VALUEIN next() {
if (inReset) {
try {
if (backupStore.hasNext()) {
backupStore.next();
DataInputBuffer next = backupStore.nextValue();
buffer.reset(next.getData(), next.getPosition(), next.getLength()
- next.getPosition());
value = valueDeserializer.deserialize(value);
return value;
} else {
inReset = false;
backupStore.exitResetMode();
if (clearMarkFlag) {
clearMarkFlag = false;
isMarked = false;
}
}
} catch (IOException e) {
e.printStackTrace();
throw new RuntimeException("next value iterator failed", e);
}
}
// if this is the first record, we don't need to advance
if (firstValue) {
firstValue = false;
return value;
}
// if this isn't the first record and the next key is different, they
// can't advance it here.
if (!nextKeyIsSame) {
throw new NoSuchElementException("iterate past last value");
}
// otherwise, go to the next key/value pair
try {
nextKeyValue();
return value;
} catch (IOException ie) {
throw new RuntimeException("next value iterator failed", ie);
} catch (InterruptedException ie) {
// this is bad, but we can't modify the exception list of java.util
throw new RuntimeException("next value iterator interrupted", ie);
}
}
hasNext considers another value to exist as long as the current record is the first of its group, or the next record belongs to the same group.
In next, if this is the first record of a group, the value is returned directly; otherwise nextKeyValue is executed, i.e. ReduceContextImpl's input fetches the next record, and then the value is returned.
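One practical consequence is visible in the code above: each call to next() deserializes into the same value object (value = valueDeserializer.deserialize(value)), at least with the default Writable serialization, so a reduce method should consume each value as it streams past rather than hold on to the references. A hypothetical word-count style reducer shows the usual pattern:
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer: sums the counts per word. Each IntWritable handed out by
// the value iterator may be the same reused object, so we read it immediately
// instead of storing the reference.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {   // each hasNext()/next() may trigger one more read from the merged input
            sum += v.get();              // consume the value now; do not cache the reference
        }
        result.set(sum);
        context.write(key, result);
    }
}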
Summary
A quick recap of the ReduceTask workflow:
- The ReduceTask wraps the pulled data in an iterator.
- When the reduce method is invoked, it is handed an iterator over the values; the data is not loaded into memory.
- The iterator's hasNext checks whether this is the first record of a group, or whether the next record belongs to the same group (nextKeyIsSame).
- next returns the first value of a group directly; otherwise it uses the real underlying iterator to fetch a record and updates nextKeyIsSame.
As you can see, the iterator pattern is used throughout to avoid OOM with large data volumes. And since MapTask has already sorted the data, the iterator can process everything linearly in a single pass of I/O.