
mapreduce-3.client

Core Classes

[image: class diagram of the MapReduce client's core classes]

org.apache.hadoop.mapreduce.Job

The job submitter's view of the Job.

It allows the user to configure the job, submit it, control its execution, and query the state. The set methods only work until the job is submitted; afterwards they will throw an IllegalStateException.

Normally the user creates the application, describes various facets of the job via Job, and then submits the job and monitors its progress.

Here is an example of how to submit a job:

   // Create a new Job
   Job job = Job.getInstance();
   job.setJarByClass(MyJob.class);
   
   // Specify various job-specific parameters     
   job.setJobName("myjob");
   
   FileInputFormat.addInputPath(job, new Path("in"));
   FileOutputFormat.setOutputPath(job, new Path("out"));
   
   job.setMapperClass(MyJob.MyMapper.class);
   job.setReducerClass(MyJob.MyReducer.class);

   // Submit the job, then poll for progress until the job is complete
   job.waitForCompletion(true);

org.apache.hadoop.mapreduce.JobSubmitter

function: JobStatus submitJobInternal(Job job, Cluster cluster)
Internal method for submitting jobs to the system.
  The job submission process involves:
    1. Checking the input and output specifications of the job.
    2. Computing the InputSplits for the job.
    3. Setting up the requisite accounting information for the DistributedCache of the job, if necessary.
    4. Copying the job's jar and configuration to the MapReduce system directory on the distributed file system.
    5. Submitting the job to the JobTracker and optionally monitoring its status.
  Params:
    job – the configuration to submit
    cluster – the handle to the Cluster
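Roughly, the five steps map onto JobSubmitter's private helpers as below. This is a condensed sketch from the Hadoop source, not verbatim code: signatures are simplified and security/token handling is omitted.

JobStatus submitJobInternal(Job job, Cluster cluster) throws Exception {
  checkSpecs(job);                                   // step 1: validate the input/output specs
  Configuration conf = job.getConfiguration();
  Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
  JobID jobId = submitClient.getNewJobID();          // obtain a fresh job id from the cluster
  Path submitJobDir = new Path(jobStagingArea, jobId.toString());

  copyAndConfigureFiles(job, submitJobDir);          // steps 3-4: DistributedCache accounting and
                                                     // shipping the job jar/files to the staging dir
  int maps = writeSplits(job, submitJobDir);         // step 2: compute the InputSplits and persist them
  conf.setInt(MRJobConfig.NUM_MAPS, maps);

  writeConf(conf, JobSubmissionFiles.getJobConfPath(submitJobDir)); // step 4: write job.xml

  return submitClient.submitJob(                     // step 5: hand the job off to the cluster
      jobId, submitJobDir.toString(), job.getCredentials());
}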

Core Algorithms

Sequence Diagram

[image: job-submission sequence diagram]

The MapReduce Split Mechanism

FileInputFormat.getSplits

  • Parameters

dfs.blocksize
mapreduce.input.fileinputformat.split.minsize (default: 1)
mapreduce.input.fileinputformat.split.maxsize (default: Long.MAX_VALUE)

  • Algorithm

If maxSize is smaller than blockSize (min < max < block), the file is split at maxSize boundaries (one block yields multiple splits);

if minSize is larger than blockSize (block < min < max), the file is split at minSize boundaries (one split spans multiple blocks);

otherwise (min < block < max), the file is split at block boundaries. (A runnable check of these rules follows the 1.1-factor discussion below.)

  • Code
public List<InputSplit> getSplits(JobContext job) throws IOException {
    StopWatch sw = new StopWatch().start();
    long minSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(job)); // mapreduce.input.fileinputformat.split.minsize, at least 1
    long maxSize = getMaxSplitSize(job); // mapreduce.input.fileinputformat.split.maxsize, defaults to Long.MAX_VALUE

    // generate splits
    List<InputSplit> splits = new ArrayList<InputSplit>();
    List<FileStatus> files = listStatus(job); // enumerate the job's input files

    boolean ignoreDirs = !getInputDirRecursive(job)
      && job.getConfiguration().getBoolean(INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, false);
    for (FileStatus file: files) { // iterate over the input files, computing split info for each
      if (ignoreDirs && file.isDirectory()) {
        continue;
      }
      Path path = file.getPath();
      long length = file.getLen();
      if (length != 0) {
        BlockLocation[] blkLocations;
        if (file instanceof LocatedFileStatus) {
          blkLocations = ((LocatedFileStatus) file).getBlockLocations(); // block locations of the file, e.g.
                                           // BlockLocation(offset: 0, length: BLOCK_SIZE, hosts: {"host1:9866", "host2:9866", "host3:9866"})
        } else {
          FileSystem fs = path.getFileSystem(job.getConfiguration());
          blkLocations = fs.getFileBlockLocations(file, 0, length);
        }
        if (isSplitable(job, path)) { // Is the given filename splittable? Usually true, but not if the file is stream-compressed.
          long blockSize = file.getBlockSize(); // 128 MB by default
          long splitSize = computeSplitSize(blockSize, minSize, maxSize); // Math.max(minSize, Math.min(maxSize, blockSize)); usually equals blockSize
          // if mapreduce.input.fileinputformat.split.maxsize < blocksize, this is the user-set maxsize;
          // if blocksize < mapreduce.input.fileinputformat.split.minsize, this is the user-set minsize

          long bytesRemaining = length;
          while (((double) bytesRemaining)/splitSize > SPLIT_SLOP) { // SPLIT_SLOP is 1.1: carve off another split only while the remainder exceeds 1.1x splitSize
            int blkIndex = getBlockIndex(blkLocations, length-bytesRemaining);
            splits.add(makeSplit(path, length-bytesRemaining, splitSize,
                        blkLocations[blkIndex].getHosts(),
                        blkLocations[blkIndex].getCachedHosts()));
            bytesRemaining -= splitSize;
          }

          if (bytesRemaining != 0) {
            int blkIndex = getBlockIndex(blkLocations, length-bytesRemaining);
            splits.add(makeSplit(path, length-bytesRemaining, bytesRemaining,
                       blkLocations[blkIndex].getHosts(),
                       blkLocations[blkIndex].getCachedHosts())); // each InputSplit here is a FileSplit(file, start, length, hosts, inMemoryHosts)
          }
        } else { // not splitable
          if (LOG.isDebugEnabled()) {
            // Log only if the file is big enough to be split
            if (length > Math.min(file.getBlockSize(), minSize)) {
              LOG.debug("File is not splittable so no parallelization "
                  + "is possible: " + file.getPath());
            }
          }
          splits.add(makeSplit(path, 0, length, blkLocations[0].getHosts(),
                      blkLocations[0].getCachedHosts()));
        }
      } else { 
        //Create empty hosts array for zero length files
        splits.add(makeSplit(path, 0, length, new String[0]));
      }
    }
    // Save the number of input files for metrics/loadgen
    job.getConfiguration().setLong(NUM_INPUT_FILES, files.size());
    sw.stop();
    if (LOG.isDebugEnabled()) {
      LOG.debug("Total # of splits generated by getSplits: " + splits.size()
          + ", TimeTaken: " + sw.now(TimeUnit.MILLISECONDS));
    }
    return splits;
  }
  • The 1.1 slop factor

The most common question: how many splits does a 130 MB file produce on a cluster whose split size is 128 MB?

The answer is one split, because 130/128 ≈ 1.016 < 1.1 (the exact comparison in the source; equivalently, 128 × 1.1 > 130). In other words, once the remaining bytes fit within 1.1 times the split size, no further split is made. The 1.1 factor is a performance optimization: without it, a 130 MB remainder would be cut into a 128 MB piece and a 2 MB piece, and a separate map task would be launched just for those 2 MB, which is wasteful. The value 1.1 itself is an empirically chosen constant.
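A small self-contained check of the two rules above; the 128 MB block and 130 MB file are assumed values for illustration, and computeSplitSize mirrors the formula quoted from FileInputFormat:

public class SplitMathDemo {
  static final double SPLIT_SLOP = 1.1;

  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize)); // same formula as FileInputFormat
  }

  public static void main(String[] args) {
    long block = 128L << 20; // 128 MB

    // defaults (min = 1, max = Long.MAX_VALUE): split size == block size
    System.out.println(computeSplitSize(block, 1, Long.MAX_VALUE) == block);               // true
    // maxSize below the block size wins: one block becomes several splits
    System.out.println(computeSplitSize(block, 1, 64L << 20) == 64L << 20);                // true
    // minSize above the block size wins: one split spans several blocks
    System.out.println(computeSplitSize(block, 256L << 20, Long.MAX_VALUE) == 256L << 20); // true

    // SPLIT_SLOP: a 130 MB file with a 128 MB split size stays a single split,
    // because 130/128 ≈ 1.016 is not greater than 1.1
    long length = 130L << 20;
    System.out.println(((double) length) / block > SPLIT_SLOP); // false -> one split
  }
}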

org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat

If a maxSplitSize is specified, then blocks on the same node are combined to form a single split. Blocks that are left over are then combined with other blocks in the same rack.

If maxSplitSize is not specified, then blocks from the same rack are combined in a single split; no attempt is made to create node-local splits.

If the maxSplitSize is equal to the block size, then this class is similar to the default splitting behavior in Hadoop: each block is a locally processed split.

Parameters

mapreduce.input.fileinputformat.split.maxsize (default: 0)
mapreduce.input.fileinputformat.split.minsize.per.node (default: 0)
mapreduce.input.fileinputformat.split.minsize.per.rack (default: 0)
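As a usage sketch, these knobs can be set through the job configuration. The snippet below is a hypothetical setup using org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat (a concrete subclass of CombineFileInputFormat); the sizes are placeholders, not recommendations:

Job job = Job.getInstance(new Configuration(), "combine-demo");
job.setInputFormatClass(CombineTextInputFormat.class);
// cap for combining blocks on a single node (mapreduce.input.fileinputformat.split.maxsize)
FileInputFormat.setMaxInputSplitSize(job, 256L << 20);  // 256 MB
// per-node and per-rack minimums; 0 (the default) disables them
job.getConfiguration().setLong("mapreduce.input.fileinputformat.split.minsize.per.node", 128L << 20);
job.getConfiguration().setLong("mapreduce.input.fileinputformat.split.minsize.per.rack", 128L << 20);
FileInputFormat.addInputPath(job, new Path("in"));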

Algorithm

(1) Iterate over the node list, forming node-local splits from the blocks (OneBlockInfo) on each node in turn. For each node, walk through its blocks while accumulating their sizes:

  1. If maxSplitSize != 0, accumulate OneBlockInfo sizes; once the total reaches maxSize, turn the accumulated OneBlockInfo into one split. Any OneBlockInfo still left on the node are picked up on a later pass;

  2. if minSizeNode != 0, the accumulated OneBlockInfo size has reached minSizeNode, and this node has not produced a split yet, turn all of the node's OneBlockInfo into one split;

  3. otherwise, return the node's OneBlockInfo to the pool for rack-level allocation.

(2) Iterate over the rack list, forming rack-level splits from the OneBlockInfo left over from step (1):

  1. If maxSplitSize != 0, accumulate OneBlockInfo sizes; once the total reaches maxSize, turn the accumulated OneBlockInfo into one split. Leftover OneBlockInfo are handled on a later pass;
  2. if minSizeRack != 0, the accumulated OneBlockInfo size has reached minSizeRack, and step 1 produced no split for this rack, turn the rack's remaining OneBlockInfo into one split;
  3. otherwise, keep the rack's OneBlockInfo in the overflow list for step (3).

(3) Walk through and accumulate the remaining blocks; whenever maxSplitSize != 0 and the accumulated size reaches maxSplitSize, turn the accumulated blocks into one split;

(4) whatever blocks remain at the end form one final split.

The core logic is straightforward: prefer forming splits from the blocks on a single node (while still distributing splits across nodes), then from the blocks within a single rack, and finally combine whatever is left over. For example, with maxSplitSize = 256 MB, a node holding many small files would first have them merged into node-local splits of about 256 MB, and only the remainder would participate in rack-level combining.

Code
  1. getSplits()
// all the files in input set
List<FileStatus> stats = listStatus(job); // enumerate all the input files
List<InputSplit> splits = new ArrayList<InputSplit>();
getMoreSplits(job, stats, maxSize, minSizeNode, minSizeRack, splits); // compute the splits for the given paths
  2. getMoreSplits()
// populate all the blocks for all files
long totLength = 0;
int i = 0;
for (FileStatus stat : stats) {
  files[i] = new OneFileInfo(stat, conf, isSplitable(job, stat.getPath()),
                             rackToBlocks, blockToNodes, nodeToBlocks,
                             rackToNodes, maxSize); // one OneFileInfo per input file; each OneFileInfo holds several OneBlockInfo objects
  totLength += files[i].getLength();
}
createSplits(nodeToBlocks, blockToNodes, rackToBlocks, totLength, 
             maxSize, minSizeNode, minSizeRack, splits);

3.1 OneBlockInfo generation logic

  for (int i = 0; i < locations.length; i++) { // 1. iterate over all blocks of the file
    fileSize += locations[i].getLength();

    // each split can be a maximum of maxSize
    long left = locations[i].getLength();
    long myOffset = locations[i].getOffset();
    long myLength = 0;
    do {
      if (maxSize == 0) { // 2. if maxSplitSize is 0, each block yields exactly one OneBlockInfo
        myLength = left;
      } else {
        if (left > maxSize && left < 2 * maxSize) { // 3. if maxSplitSize != 0:
                                            // if the remaining bytes exceed 2x maxSplitSize, carve off a OneBlockInfo of size maxSplitSize;
                                            // if the remaining bytes are between maxSplitSize and 2x maxSplitSize, carve off a OneBlockInfo of half the remainder (left/2);
                                            // if the remaining bytes are at most maxSplitSize, build one OneBlockInfo of size left
          // if remainder is between max and 2*max - then
          // instead of creating splits of size max, left-max we
          // create splits of size left/2 and left/2. This is
          // a heuristic to avoid creating really really small
          // splits.
          myLength = left / 2;
        } else {
          myLength = Math.min(maxSize, left);
        }
      }
      OneBlockInfo oneblock = new OneBlockInfo(stat.getPath(),
          myOffset, myLength, locations[i].getHosts(),
          locations[i].getTopologyPaths());
      left -= myLength;
      myOffset += myLength;

      blocksList.add(oneblock);
    } while (left > 0);
  }
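A standalone sketch of this chopping heuristic, with assumed sizes (maxSize = 100 and one 250-byte block), printing the OneBlockInfo lengths it would produce:

public class BlockChopDemo {
  public static void main(String[] args) {
    long maxSize = 100, left = 250, myLength;
    while (left > 0) {
      if (left > maxSize && left < 2 * maxSize) {
        myLength = left / 2;          // split the awkward 100..200 remainder evenly
      } else {
        myLength = Math.min(maxSize, left);
      }
      System.out.println(myLength);   // prints 100, then 75, then 75
      left -= myLength;
    }
  }
}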

3.2 createSplits()

Process all the nodes and create splits that are local to a node. Generate one split per node iteration, and walk over the nodes multiple times to distribute the splits across nodes.

Note: the order in which nodes are processed is undetermined, because nodeToBlocks is a HashMap and the order of its entries is undefined.

while(true) {
  for (Iterator<Map.Entry<String, Set<OneBlockInfo>>> iter = nodeToBlocks.entrySet().iterator(); iter.hasNext();) { // 1. iterate over the nodes
    Map.Entry<String, Set<OneBlockInfo>> one = iter.next();
    String node = one.getKey();
    Set<OneBlockInfo> blocksInCurrentNode = one.getValue();
    Iterator<OneBlockInfo> oneBlockIter = blocksInCurrentNode.iterator(); // 2. iterate over all OneBlockInfo on the current node
    while (oneBlockIter.hasNext()) {
      OneBlockInfo oneblock = oneBlockIter.next();
      validBlocks.add(oneblock);
      blockToNodes.remove(oneblock);
      curSplitSize += oneblock.length;

      // if the accumulated split size exceeds the maximum, then
      // create this split.
      if (maxSize != 0 && curSplitSize >= maxSize) { // 2.1 if maxSplitSize != 0 and the size accumulated on this node reaches maxSplitSize, emit one CombineFileSplit
                                                // (this is where several small files on one node get merged into the same CombineFileSplit)
        // create an input split and add it to the splits array
        addCreatedSplit(splits, Collections.singleton(node), validBlocks);
        totalLength -= curSplitSize;
        curSplitSize = 0;

        splitsPerNode.add(node);

        // Remove entries from blocksInNode so that we don't walk these
        // again.
        blocksInCurrentNode.removeAll(validBlocks);
        validBlocks.clear();

        // Done creating a single split for this node. Move on to the next
        // node so that splits are distributed across nodes.
        break; // 2.1 (cont.) move on to the next node; any OneBlockInfo still on this node are picked up on a later pass of the outer loop
      }
    }
    if (validBlocks.size() != 0) {
      if (minSizeNode != 0 && curSplitSize >= minSizeNode
          && splitsPerNode.count(node) == 0) { // 2.2 if minSizeNode != 0, this node has no split yet, and the accumulated size reaches minSizeNode, emit all of the node's OneBlockInfo as one split
        addCreatedSplit(splits, Collections.singleton(node), validBlocks);
        totalLength -= curSplitSize;
        splitsPerNode.add(node);
        // Remove entries from blocksInNode so that we don't walk this again.
        blocksInCurrentNode.removeAll(validBlocks);
        // The node is done. This was the last set of blocks for this node.
      } else { // 2.3 otherwise (minSizeNode unset, too few bytes, or the node already produced a split) return the leftover OneBlockInfo to the pool
        // Put the unplaced blocks back into the pool for later rack-allocation.
        for (OneBlockInfo oneblock : validBlocks) {
          blockToNodes.put(oneblock, oneblock.hosts);
        }
      }
      validBlocks.clear();
      curSplitSize = 0;
      completedNodes.add(node);
    }


  for (Iterator<Map.Entry<String, List<OneBlockInfo>>> iter = rackToBlocks.entrySet().iterator(); iter.hasNext();) { // 3. iterate over the racks

    Map.Entry<String, List<OneBlockInfo>> one = iter.next();
    racks.add(one.getKey());
    List<OneBlockInfo> blocks = one.getValue();

    // for each block, copy it into validBlocks. Delete it from 
    // blockToNodes so that the same block does not appear in 
    // two different splits.
    boolean createdSplit = false;
    for (OneBlockInfo oneblock : blocks) {
      if (blockToNodes.containsKey(oneblock)) {
        validBlocks.add(oneblock);
        blockToNodes.remove(oneblock);
        curSplitSize += oneblock.length;

        // if the accumulated split size exceeds the maximum, then 
        // create this split.
        if (maxSize != 0 && curSplitSize >= maxSize) { // 3.1 if maxSplitSize != 0 and the accumulated OneBlockInfo size reaches maxSize, emit these OneBlockInfo as one split; leftover blocks are handled on a later pass
          // create an input split and add it to the splits array
          addCreatedSplit(splits, getHosts(racks), validBlocks);
          createdSplit = true;
          break;
        }
      }
    }

    // if we created a split, then just go to the next rack
    if (createdSplit) {
      curSplitSize = 0;
      validBlocks.clear();
      racks.clear();
      continue;
    }

    if (!validBlocks.isEmpty()) {
      if (minSizeRack != 0 && curSplitSize >= minSizeRack) { // 3.2 if minSizeRack != 0, step 3.1 emitted no split for this rack, and the accumulated size reaches minSizeRack, emit the rack's remaining OneBlockInfo as one split

        // if there is a minimum size specified, then create a single split
        // otherwise, store these blocks into overflow data structure
        addCreatedSplit(splits, getHosts(racks), validBlocks);
      } else {
        // There were a few blocks in this rack that 
        // remained to be processed. Keep them in 'overflow' block list. 
        // These will be combined later.
        overflowBlocks.addAll(validBlocks);
      }
    }
    curSplitSize = 0;
    validBlocks.clear();
    racks.clear();
  }
   
// Process all overflow blocks
for (OneBlockInfo oneblock : overflowBlocks) { // 4. process all the leftover OneBlockInfo
  validBlocks.add(oneblock);
  curSplitSize += oneblock.length;

  // This might cause an exiting rack location to be re-added,
  // but it should be ok.
  for (int i = 0; i < oneblock.racks.length; i++) {
    racks.add(oneblock.racks[i]);
  }

  // if the accumulated split size exceeds the maximum, then 
  // create this split.
  if (maxSize != 0 && curSplitSize >= maxSize) { // 4.1 if the accumulated leftover size reaches maxSize, emit one split
    // create an input split and add it to the splits array
    addCreatedSplit(splits, getHosts(racks), validBlocks);
    curSplitSize = 0;
    validBlocks.clear();
    racks.clear();
  }
}

// Process any remaining blocks, if any.
if (!validBlocks.isEmpty()) { // 4.2 the final leftovers form one last split
  addCreatedSplit(splits, getHosts(racks), validBlocks);
}
}