Preface: This article briefly walks through how a ReduceTask runs, focusing on how it uses the iterator pattern to read data and thereby avoids the OOM problems that would arise from loading a large dataset into memory up front.
Source Code Analysis
The ReduceTask that runs our overridden Reducer performs three steps:
- Shuffle: records with the same key are pulled into one partition.
- Sort: MapTask has already sorted the data by key, so the reduce side does not need to sort again; the "sort" here is really a merge sort that brings identical keys together.
- Reduce: the reduce computation itself.
Start with ReduceTask's run method, where all three of these steps show up.
public void run(JobConf job, final TaskUmbilicalProtocol umbilical)
throws IOException, InterruptedException, ClassNotFoundException {
job.setBoolean(JobContext.SKIP_RECORDS, isSkipping());
if (isMapOrReduce()) {
// the ReduceTask's work is split into three phases: copy, sort (grouping), and reduce
copyPhase = getProgress().addPhase("copy");
sortPhase = getProgress().addPhase("sort");
reducePhase = getProgress().addPhase("reduce");
}
// start thread that will handle communication with parent
TaskReporter reporter = startReporter(umbilical);
boolean useNewApi = job.getUseNewReducer();
initialize(job, getJobID(), reporter, useNewApi);
// check if it is a cleanupJobTask
if (jobCleanup) {
runJobCleanupTask(umbilical, reporter);
return;
}
if (jobSetup) {
runJobSetupTask(umbilical, reporter);
return;
}
if (taskCleanup) {
runTaskCleanupTask(umbilical, reporter);
return;
}
// Initialize the codec
codec = initCodec();
RawKeyValueIterator rIter = null;
ShuffleConsumerPlugin shuffleConsumerPlugin = null;
Class combinerClass = conf.getCombinerClass();
CombineOutputCollector combineCollector =
(null != combinerClass) ?
new CombineOutputCollector(reduceCombineOutputCounter, reporter, conf) : null;
Class<? extends ShuffleConsumerPlugin> clazz =
job.getClass(MRConfig.SHUFFLE_CONSUMER_PLUGIN, Shuffle.class, ShuffleConsumerPlugin.class);
shuffleConsumerPlugin = ReflectionUtils.newInstance(clazz, job);
LOG.info("Using ShuffleConsumerPlugin: " + shuffleConsumerPlugin);
ShuffleConsumerPlugin.Context shuffleContext =
new ShuffleConsumerPlugin.Context(getTaskID(), job, FileSystem.getLocal(job), umbilical,
super.lDirAlloc, reporter, codec,
combinerClass, combineCollector,
spilledRecordsCounter, reduceCombineInputCounter,
shuffledMapsCounter,
reduceShuffleBytes, failedShuffleCounter,
mergedMapOutputsCounter,
taskStatus, copyPhase, sortPhase, this,
mapOutputFile, localMapFiles);
// initialize the shuffle
shuffleConsumerPlugin.init(shuffleContext);
// run the shuffle, pulling the map output records; eventually returns an iterator
rIter = shuffleConsumerPlugin.run();
// free up the data structures
mapOutputFilesOnDisk.clear();
sortPhase.complete(); // sort is complete
setPhase(TaskStatus.Phase.REDUCE);
statusUpdate(umbilical);
Class keyClass = job.getMapOutputKeyClass();
Class valueClass = job.getMapOutputValueClass();
// obtain the grouping comparator
RawComparator comparator = job.getOutputValueGroupingComparator();
if (useNewApi) {
runNewReducer(job, umbilical, reporter, rIter, comparator,
keyClass, valueClass);
} else {
runOldReducer(job, umbilical, reporter, rIter, comparator,
keyClass, valueClass);
}
shuffleConsumerPlugin.close();
done(umbilical, reporter);
}
First, step one: the shuffle pulls records with the same key into one partition and ultimately returns the iterator rIter. Since this is big-data processing, the data cannot be loaded into memory all at once; reading it from disk one record at a time through an iterator is the better fit.
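To make the streaming pattern concrete, the following is a minimal sketch, not framework code, of how a consumer might walk a RawKeyValueIterator one record at a time; the walk and handleRecord names are made up, and the deserialization that ReduceContextImpl actually performs is left out.
import java.io.IOException;

import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.mapred.RawKeyValueIterator;

// Sketch only: consume the shuffle output one record at a time, so that only the
// current key/value bytes need to sit in memory at any moment.
public class RawIterWalk {
    static void walk(RawKeyValueIterator rIter) throws IOException {
        while (rIter.next()) {                             // advance to the next serialized record
            DataInputBuffer keyBytes = rIter.getKey();     // raw bytes of the current key
            DataInputBuffer valueBytes = rIter.getValue(); // raw bytes of the current value
            handleRecord(keyBytes, valueBytes);            // hypothetical per-record processing
        }
        rIter.close();
    }

    static void handleRecord(DataInputBuffer key, DataInputBuffer value) {
        // placeholder: whatever the caller does with a single record
    }
}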
Next, step two: obtain the grouping comparator with RawComparator comparator = job.getOutputValueGroupingComparator().
public RawComparator getOutputValueGroupingComparator() {
Class<? extends RawComparator> theClass = getClass(
JobContext.GROUP_COMPARATOR_CLASS, null, RawComparator.class);
if (theClass == null) {
return getOutputKeyComparator();
}
return ReflectionUtils.newInstance(theClass, this);
}
public RawComparator getOutputKeyComparator() {
Class<? extends RawComparator> theClass = getClass(
JobContext.KEY_COMPARATOR, null, RawComparator.class);
if (theClass != null)
return ReflectionUtils.newInstance(theClass, this);
// fall back to the comparator of the key class itself (getMapOutputKeyClass)
return WritableComparator.get(getMapOutputKeyClass().asSubclass(WritableComparable.class), this);
}
getOutputKeyComparator also showed up in MapTask. The method first looks for a user-defined grouping comparator; if none is set, it falls back to the sort comparator, and if that is not set either, it uses the comparator of the key class itself.
Semantically, the grouping comparator answers whether two keys belong to the same group, i.e. a boolean, while the sort comparator usually returns -1, 0 or 1 for less than, equal to, and greater than. In practice both comparators return an int and the result is simply compared against 0. The following snippet will come up again later:
public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
nextKeyIsSame = comparator.compare(currentRawKey.getBytes(), 0,
currentRawKey.getLength(),
nextKey.getData(),
nextKey.getPosition(),
nextKey.getLength() - nextKey.getPosition()
) == 0;
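As an illustration of where the grouping comparator can come from, here is a sketch of a user-defined grouping comparator registered on the job. The "user#timestamp" key layout and the class name UserGroupingComparator are assumptions made up for this example, but setGroupingComparatorClass is the standard hook whose value getOutputValueGroupingComparator later reads back through JobContext.GROUP_COMPARATOR_CLASS.
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Hypothetical grouping comparator: Text keys shaped like "user#timestamp" are
// treated as one group whenever the part before '#' matches, even though the
// full keys (and therefore the sort order) still differ.
public class UserGroupingComparator extends WritableComparator {
    protected UserGroupingComparator() {
        super(Text.class, true); // true => instantiate keys so compare(a, b) receives real Text objects
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        String leftUser  = ((Text) a).toString().split("#", 2)[0];
        String rightUser = ((Text) b).toString().split("#", 2)[0];
        return leftUser.compareTo(rightUser); // 0 means "same group"
    }
}

// Registration in the driver:
//   job.setGroupingComparatorClass(UserGroupingComparator.class);
With such a comparator, records whose keys share the same prefix are fed to a single reduce call, even though the full keys, and hence the sort order within the group, still differ.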
Now step three: executing the reduce method. Let's look at runNewReducer.
private <INKEY,INVALUE,OUTKEY,OUTVALUE>
void runNewReducer(JobConf job,
final TaskUmbilicalProtocol umbilical,
final TaskReporter reporter,
RawKeyValueIterator rIter,
RawComparator<INKEY> comparator,
Class<INKEY> keyClass,
Class<INVALUE> valueClass
) throws IOException,InterruptedException,
ClassNotFoundException {
// wrap value iterator to report progress.
final RawKeyValueIterator rawIter = rIter;
rIter = new RawKeyValueIterator() {
public void close() throws IOException {
rawIter.close();
}
public DataInputBuffer getKey() throws IOException {
return rawIter.getKey();
}
public Progress getProgress() {
return rawIter.getProgress();
}
public DataInputBuffer getValue() throws IOException {
return rawIter.getValue();
}
public boolean next() throws IOException {
boolean ret = rawIter.next();
reporter.setProgress(rawIter.getProgress().getProgress());
return ret;
}
};
// make a task context so we can get the classes
org.apache.hadoop.mapreduce.TaskAttemptContext taskContext =
new org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl(job,
getTaskID(), reporter);
// make a reducer
org.apache.hadoop.mapreduce.Reducer<INKEY,INVALUE,OUTKEY,OUTVALUE> reducer =
(org.apache.hadoop.mapreduce.Reducer<INKEY,INVALUE,OUTKEY,OUTVALUE>)
ReflectionUtils.newInstance(taskContext.getReducerClass(), job);
// the RecordWriter used to write out the reduce output records
org.apache.hadoop.mapreduce.RecordWriter<OUTKEY,OUTVALUE> trackedRW =
new NewTrackingRecordWriter<OUTKEY, OUTVALUE>(this, taskContext);
job.setBoolean("mapred.skip.on", isSkipping());
job.setBoolean(JobContext.SKIP_RECORDS, isSkipping());
// create the reduce context, which carries the parameters reduce needs at runtime
org.apache.hadoop.mapreduce.Reducer.Context
reducerContext = createReduceContext(reducer, job, getTaskID(),
rIter, reduceInputKeyCounter,
reduceInputValueCounter,
trackedRW,
committer,
reporter, comparator, keyClass,
valueClass);
try {
reducer.run(reducerContext);
} finally {
trackedRW.close(reducerContext);
}
}
createReduceContext sets up reduce's runtime parameters and wraps the iterator rIter that reads the data directly; the resulting context object is then passed to reducer.run(reducerContext).
public void run(Context context) throws IOException, InterruptedException {
setup(context);
try {
while (context.nextKey()) {
reduce(context.getCurrentKey(), context.getValues(), context);
// If a back up store is used, reset it
Iterator<VALUEIN> iter = context.getValues().iterator();
if(iter instanceof ReduceContext.ValueIterator) {
((ReduceContext.ValueIterator<VALUEIN>)iter).resetBackupStore();
}
}
} finally {
cleanup(context);
}
}
Unlike Mapper, where context.nextKeyValue() asks whether there is another record, Reducer's context.nextKey() asks whether there is another group of records.
public boolean nextKey() throws IOException,InterruptedException {
while (hasMore && nextKeyIsSame) { // nextKeyIsSame is initially false
nextKeyValue();
}
if (hasMore) {
if (inputKeyCounter != null) {
inputKeyCounter.increment(1);
}
return nextKeyValue();
} else {
return false;
}
}
It first checks whether there is another record and whether it belongs to the same key group; if both hold, it enters the loop. Either way, as long as data remains, nextKeyValue will eventually be executed.
public boolean nextKeyValue() throws IOException, InterruptedException {
// return false if there is no more data
if (!hasMore) {
key = null;
value = null;
return false;
}
// is this the first record of a key group?
firstValue = !nextKeyIsSame;
// read the key
DataInputBuffer nextKey = input.getKey();
currentRawKey.set(nextKey.getData(), nextKey.getPosition(),
nextKey.getLength() - nextKey.getPosition());
buffer.reset(currentRawKey.getBytes(), 0, currentRawKey.getLength());
key = keyDeserializer.deserialize(key);
// read the value
DataInputBuffer nextVal = input.getValue();
buffer.reset(nextVal.getData(), nextVal.getPosition(), nextVal.getLength()
- nextVal.getPosition());
value = valueDeserializer.deserialize(value);
currentKeyLength = nextKey.getLength() - nextKey.getPosition();
currentValueLength = nextVal.getLength() - nextVal.getPosition();
if (isMarked) {
backupStore.write(nextKey, nextVal);
}
// after reading the current kv, advance to the next record; input.next() reports whether one exists
hasMore = input.next();
if (hasMore) {
// read the next record's key
nextKey = input.getKey();
// check whether the next record belongs to the same key group and update nextKeyIsSame
nextKeyIsSame = comparator.compare(currentRawKey.getBytes(), 0,
currentRawKey.getLength(),
nextKey.getData(),
nextKey.getPosition(),
nextKey.getLength() - nextKey.getPosition()
) == 0;
} else {
nextKeyIsSame = false;
}
inputValueCounter.increment(1);
return true;
}
After reading the current record, the method checks whether another record exists and, if so, whether it belongs to the same key group, updating hasMore and nextKeyIsSame for nextKey to use in its next decision.
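The interplay between hasMore, firstValue and nextKeyIsSame is easier to see outside Hadoop. Below is a minimal, self-contained plain-Java sketch (not Hadoop code; the data is made up) that mimics nextKey/nextKeyValue and the value iterator over an already-sorted list of key/value pairs: a single linear pass yields the records group by group.
import java.util.Arrays;
import java.util.List;

// A plain-Java sketch of ReduceContextImpl's grouping logic over a sorted list
// of (key, value) pairs. Names mirror the Hadoop code discussed above.
public class GroupingSketch {
    static final List<String[]> SORTED = Arrays.asList(
            new String[]{"a", "1"}, new String[]{"a", "2"},
            new String[]{"b", "3"},
            new String[]{"c", "4"}, new String[]{"c", "5"});

    static int pos = 0;                 // read position in the "map output"
    static String key, value;           // current record
    static boolean hasMore = !SORTED.isEmpty();
    static boolean nextKeyIsSame = false;
    static boolean firstValue = false;

    // Mirrors nextKeyValue(): read one record, then peek at the next one.
    static boolean nextKeyValue() {
        if (!hasMore) return false;
        firstValue = !nextKeyIsSame;
        key = SORTED.get(pos)[0];
        value = SORTED.get(pos)[1];
        pos++;
        hasMore = pos < SORTED.size();
        nextKeyIsSame = hasMore && SORTED.get(pos)[0].equals(key);
        return true;
    }

    // Mirrors nextKey(): skip any values left over from the previous group,
    // then position on the first record of the next group.
    static boolean nextKey() {
        while (hasMore && nextKeyIsSame) nextKeyValue();
        return hasMore && nextKeyValue();
    }

    // Mirrors ValueIterator.hasNext()/next() for the current group.
    static boolean hasNextValue() { return firstValue || nextKeyIsSame; }
    static String nextValue() {
        if (firstValue) { firstValue = false; return value; }
        nextKeyValue();
        return value;
    }

    public static void main(String[] args) {
        while (nextKey()) {                       // one iteration per key group
            System.out.print(key + " ->");
            while (hasNextValue()) System.out.print(" " + nextValue());
            System.out.println();
        }
        // prints: a -> 1 2 | b -> 3 | c -> 4 5
    }
}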
Now look at the reduce method in Reducer: its context.getValues() argument returns an Iterable whose iterator is a ValueIterator.
public
Iterable<VALUEIN> getValues() throws IOException, InterruptedException {
return iterable;
}
protected class ValueIterable implements Iterable<VALUEIN> {
private ValueIterator iterator = new ValueIterator();
@Override
public Iterator<VALUEIN> iterator() {
return iterator;
}
}
@Override
public boolean hasNext() {
try {
if (inReset && backupStore.hasNext()) {
return true;
}
} catch (Exception e) {
e.printStackTrace();
throw new RuntimeException("hasNext failed", e);
}
return firstValue || nextKeyIsSame;
}
@Override
public VALUEIN next() {
if (inReset) {
try {
if (backupStore.hasNext()) {
backupStore.next();
DataInputBuffer next = backupStore.nextValue();
buffer.reset(next.getData(), next.getPosition(), next.getLength()
- next.getPosition());
value = valueDeserializer.deserialize(value);
return value;
} else {
inReset = false;
backupStore.exitResetMode();
if (clearMarkFlag) {
clearMarkFlag = false;
isMarked = false;
}
}
} catch (IOException e) {
e.printStackTrace();
throw new RuntimeException("next value iterator failed", e);
}
}
// if this is the first record, we don't need to advance
if (firstValue) {
firstValue = false;
return value;
}
// if this isn't the first record and the next key is different, they
// can't advance it here.
if (!nextKeyIsSame) {
throw new NoSuchElementException("iterate past last value");
}
// otherwise, go to the next key/value pair
try {
nextKeyValue();
return value;
} catch (IOException ie) {
throw new RuntimeException("next value iterator failed", ie);
} catch (InterruptedException ie) {
// this is bad, but we can't modify the exception list of java.util
throw new RuntimeException("next value iterator interrupted", ie);
}
}
hasNext considers another value to exist as long as the current record is the first of its group, or the next record belongs to the same group.
In next, if this is the first record of a group, the value is returned directly; otherwise nextKeyValue is executed, i.e. ReduceContextImpl's input fetches the next record, and then the value is returned.
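One practical consequence is visible in the code above: each call to next() deserializes into the same value object (value = valueDeserializer.deserialize(value)), at least with the default Writable serialization, so a reduce method should consume each value as it streams past rather than hold on to the references. A hypothetical word-count style reducer shows the usual pattern:
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer: sums the counts per word. Each IntWritable handed out by
// the value iterator may be the same reused object, so we read it immediately
// instead of storing the reference.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {   // each hasNext()/next() may trigger one more read from the merged input
            sum += v.get();              // consume the value now; do not cache the reference
        }
        result.set(sum);
        context.write(key, result);
    }
}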
Summary
A quick recap of the ReduceTask workflow:
- The ReduceTask wraps the pulled data in an iterator.
- When the reduce method is invoked, it is handed an iterator over the values; the data is not loaded into memory.
- The iterator's hasNext checks whether this is the first record of a group, or whether the next record belongs to the same group (nextKeyIsSame).
- next returns the first value of a group directly; otherwise it uses the real underlying iterator to fetch a record and updates nextKeyIsSame.
As you can see, the iterator pattern is used throughout to avoid OOM with large data volumes. And since MapTask has already sorted the data, the iterator can process everything linearly in a single pass of I/O.