HBase Internals: The Read Path


Overview of the Read Path

  1. The client first accesses ZooKeeper to find out which RegionServer hosts the hbase:meta table, and caches that location in its MetaCache
  2. It then accesses that RegionServer to read hbase:meta and, using the table/rowkey of the read request, determines which Region on which RegionServer holds the target data; that Region information is cached in the client-side MetaCache to speed up later accesses
  3. The client communicates with the target RegionServer
  4. The RegionServer looks up the target data in the BlockCache (the read cache), the MemStore, and the StoreFiles (HFiles), and merges everything it finds
  5. Data blocks read from files (a Block is the HFile storage unit, 64 KB by default) are cached in the BlockCache
  6. The merged final result is returned to the client

Two Read Modes

Get

A Get retrieves a single row by its exact RowKey, commonly called a random point lookup, which is precisely the access pattern HBase excels at. A Get operation consists of two main steps:

1. Build the Get

The simplest way to build a Get object from a RowKey:

final byte[] key = Bytes.toBytes("class***");
Get get = new Get(key);

You can restrict the Get to return only a specific column family:

final byte[] family = Bytes.toBytes("f1");
// return all columns of column family f1
get.addFamily(family);

Or return only specific columns within a family:

final byte[] family = Bytes.toBytes("f1");
final byte[] qualifierMobile = Bytes.toBytes("age");
// return only the column age of column family f1
get.addColumn(family, qualifierMobile);

2. Send the Get request and retrieve the row

As with writes, Get requests are sent through the Table interface, and the retrieved row is wrapped in a Result object. A Result object can be understood as follows:

  • It maps to exactly one row; it can never contain data spanning multiple rows
  • It contains one or more of the requested columns: possibly all columns of the row, possibly only a subset

The above is a single random row lookup, but fetching multiple rows in one call is also a common need. Table also defines a batch Get interface (backed by the multi RPC of RSRpcServices), which retrieves multiple rows in a single network round trip.

Scan

HBase shards table data by splitting tables into Regions. Each Region covers a RowKey range, and within a Region the data is organized in RowKey lexicographic order.

This design makes it easy for HBase to serve queries of the form "given a RowKey range, return all records in that range". In HBase such a query is called a Scan.

A Scan operation involves the following key steps:

1. Build the Scan

The simplest and most common way to build a Scan object is to specify only its StartRow and StopRow.

For example:

final byte[] startKey = Bytes.toBytes("600430");
final byte[] stopKey  = Bytes.toBytes("600439");
Scan scan = new Scan();
scan.withStartRow(startKey).withStopRow(stopKey);

If StartRow is not specified, the Scan starts from the first row of the table.

If StopRow is not specified and the Scan is not stopped explicitly, it reads through to the last row of the table.

If neither StartRow nor StopRow is specified, the Scan is a full table scan.

Like Get, a Scan can also restrict the returned column families or columns.

2. Get a ResultScanner

ResultScanner scanner = table.getScanner(scan);

3. Iterate over the results

Result result = null;
// scanner.next returns one row at a time
while((result = scanner.next()) != null) {
    // process result
}

4. Close the ResultScanner

Close a ResultScanner as follows:

scanner.close();

Other Important Scan Parameters

1. Caching: the number of Results fetched per RPC

The following sets the number of Results fetched per round trip to 100:

scan.setCaching(100);

Every time the client sends a scan request to a RegionServer, it fetches a batch of rows (the Caching setting determines how many Results come back per batch) and places them in the local Result Cache:

Each application-level read is served from the local Result Cache; once the cache is drained, the client sends another scan request to the RegionServer to fetch more data.
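
This fetch-a-batch-per-RPC behavior can be illustrated with a small, self-contained simulation (plain Java, no HBase dependency; the class and field names here are made up for illustration):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of the client-side Result Cache: each "RPC" pulls up to
// `caching` rows from the server side, and next() is served from the local
// cache until it drains, triggering the next round trip.
public class CachingSketch {
    private final List<String> serverRows;   // stands in for the RegionServer's data
    private final int caching;               // rows fetched per scan RPC
    private final Queue<String> resultCache = new ArrayDeque<>();
    private int cursor = 0;                  // scan progress on the "server"
    public int rpcCount = 0;                 // round trips made so far

    public CachingSketch(List<String> serverRows, int caching) {
        this.serverRows = serverRows;
        this.caching = caching;
    }

    public String next() {
        if (resultCache.isEmpty() && cursor < serverRows.size()) {
            rpcCount++;  // one more scan RPC to the RegionServer
            int end = Math.min(cursor + caching, serverRows.size());
            resultCache.addAll(serverRows.subList(cursor, end));
            cursor = end;
        }
        return resultCache.poll();  // null when the scan is exhausted
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) rows.add("row" + i);
        CachingSketch scanner = new CachingSketch(rows, 4);
        int n = 0;
        while (scanner.next() != null) n++;
        // 10 rows with caching=4 -> 3 RPCs (4 + 4 + 2)
        System.out.println(n + " rows in " + scanner.rpcCount + " RPCs");
    }
}
```

A larger Caching value means fewer round trips but more client memory per batch, which is the usual trade-off when tuning it.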

2. Batch: the number of columns per Result

The following limits each Result to at most 3 columns:

scan.setBatch(3);

This parameter is useful when a single row is very wide: the requested columns of one row are split across multiple Results returned to the client.
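
The splitting effect of setBatch on one wide row can be sketched as follows (plain Java, no HBase dependency; the helper name is made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of what setBatch does to a single wide row: the
// requested columns are chopped into several Results of at most `batch`
// columns each, instead of one huge Result.
public class BatchSketch {
    public static List<List<String>> split(List<String> columnsOfOneRow, int batch) {
        List<List<String>> results = new ArrayList<>();
        for (int i = 0; i < columnsOfOneRow.size(); i += batch) {
            int end = Math.min(i + batch, columnsOfOneRow.size());
            results.add(new ArrayList<>(columnsOfOneRow.subList(i, end)));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> cols = List.of("c1", "c2", "c3", "c4", "c5", "c6", "c7");
        // batch=3 -> three Results of sizes 3, 3 and 1, all for the same row
        System.out.println(split(cols, 3));
    }
}
```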

3. Limit: cap the number of rows a Scan returns

Like the LIMIT clause in SQL, this caps the total number of rows returned by one Scan:

scan.setLimit(10000);

4. MaxResultSize: cap the memory footprint of one RPC's result set

The following sets the maximum size of the returned result set to 5 MB:

scan.setMaxResultSize(5*1024*1024);

5. Reversed Scan: scan backwards

A normal Scan reads in ascending lexicographic order; a Reversed Scan reads in the opposite direction:

scan.setReversed(true);

How the Client Sends a Read Request to a RegionServer

Whether it is a Get or a Scan, before sending the request to a RegionServer the client first needs routing information.

1. Locate the Region the request targets

A Get is tied to a single RowKey, so the client simply locates the Region containing that RowKey.

For a Scan, the client first locates the Region containing the Scan's StartKey.

2. Send the read request to the RegionServer hosting that Region

This works the same way as the data routing described in the earlier write-path article, so we will not repeat it here.

If a Scan crosses Regions, then after finishing one Region the client must continue reading from the next one, which requires the client side to keep recording and refreshing the scan's progress. When a Region has no more data, the scan response carries a hint so that the client can switch to the next Region.
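
The routing and region-switching logic above can be sketched with a floor lookup over cached region start keys (plain Java; the class, method names, and region names are all made up for illustration, the real client derives this information from its cached hbase:meta entries):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of client-side region routing: region locations are
// keyed by region start key; a rowkey belongs to the region with the greatest
// start key <= the rowkey, and a cross-region scan walks regions in order.
public class RegionRouting {
    // region holding rowKey: a floor lookup on the sorted start keys
    public static String locate(TreeMap<String, String> regions, String rowKey) {
        return regions.floorEntry(rowKey).getValue();
    }

    // regions a scan over [startKey, stopKey) must visit, in order
    public static List<String> regionsForScan(TreeMap<String, String> regions,
                                              String startKey, String stopKey) {
        List<String> visited = new ArrayList<>();
        for (Map.Entry<String, String> e
                : regions.tailMap(regions.floorKey(startKey), true).entrySet()) {
            if (e.getKey().compareTo(stopKey) >= 0) break; // region starts at/after stopKey
            visited.add(e.getValue());
        }
        return visited;
    }

    public static void main(String[] args) {
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "region-1");       // the first region has an empty start key
        regions.put("row400", "region-2");
        regions.put("row800", "region-3");
        System.out.println(locate(regions, "row123"));                   // region-1
        System.out.println(regionsForScan(regions, "row300", "row900")); // all three, in order
    }
}
```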

How a RegionServer Handles a Read Request

Internal Structure

1. A table may contain one or more Regions

A large HBase table with hundreds of millions of rows is split horizontally into "sub-tables"; each such sub-table is a Region.

2. Each Region contains one or more column families

If a Region is a horizontal slice of a table, then the vertical partitioning of a Region's columns is called a Column Family. Every column must belong to a Column Family; the concrete columns and their family membership are specified when data is written, rather than predefined at table creation (only the families themselves are declared in the table schema).

3. Each column family has one MemStore, plus one or more HFiles

The earlier "Region and multiple column families" figure glossed over the internal structure of a Column Family; the figure below shows a Column Family together with its MemStore and HFiles:

In the HBase source code, a Column Family is abstracted as a Store object. The conceptual difference is simple: Column Family is the user-facing logical concept, while Store is the source-level abstraction of a Column Family.

4. Each MemStore may contain one Active Segment and one or more Immutable Segments

Extending this to a Region with two Column Families:

5. An HFile is made up of Blocks; by default the sorted data is organized into 64 KB Blocks

  • Data Block (the Data blocks on the left of the figure above): holds the actual KeyValue data.
  • Data Index: index information over the Data Blocks.

Given a RowKey, the block index inside an HFile makes it possible to locate the corresponding Data Block quickly.
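
The lookup the block index enables can be sketched as a floor search over the first key of each Block (plain Java, no HBase dependency; the class and array names are made up for illustration):

```java
import java.util.Arrays;

// Hypothetical sketch of how a block index locates the Data Block for a
// RowKey: the index records the first key of every Block, and the Block that
// may contain the key is the last one whose first key is <= the search key.
public class BlockIndexSketch {
    public static int findBlock(String[] firstKeys, String rowKey) {
        int pos = Arrays.binarySearch(firstKeys, rowKey);
        if (pos >= 0) return pos;          // exact match on a block's first key
        int insertion = -pos - 1;          // where rowKey would be inserted
        return insertion - 1;              // -1 means the key sorts before block 0
    }

    public static void main(String[] args) {
        String[] firstKeys = {"row000", "row100", "row200"};  // one entry per 64KB Block
        System.out.println(findBlock(firstKeys, "row150"));   // block 1
        System.out.println(findBlock(firstKeys, "row200"));   // block 2
        System.out.println(findBlock(firstKeys, "aaa"));      // -1: before the file's first key
    }
}
```

The real index is a multi-level tree rather than a flat array, but each level performs essentially this floor search.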

From the above we can see that an HBase read essentially boils down to this: how to read the data the user wants out of a Region that contains one or more column families, each with one MemStore (made up of Segments) plus one or more HFiles.

Building a Scanner Hierarchy for Each Query

Inside a Store/Column Family, KeyValues may live in a MemStore Segment or in an HFile; we refer to Segments and HFiles collectively as KeyValue data sources.

When first reading the RegionServer/Region read-path source code, the many Scanner classes are easy to confuse. HBase uses a dedicated Scanner to abstract the scan operation over each layer/kind of KeyValue data source:

  • Reading a Region is encapsulated in a RegionScanner object.
  • Reading each Store/Column Family is encapsulated in a StoreScanner object.
  • SegmentScanner and StoreFileScanner describe reads of MemStore Segments and of HFiles, respectively.
  • The actual HFile reads inside a StoreFileScanner are performed by an HFileScanner.

The structure of a RegionScanner is shown in the figure below:

Inside a StoreScanner, the SegmentScanners and StoreFileScanners are organized in an object called a KeyValueHeap.

Each Scanner holds a pointer to the KeyValue it will read next. The core of the KeyValueHeap is a priority queue, in which the Scanners are ordered by the KeyValue their current pointer points to.

Likewise, the multiple StoreScanners within a RegionScanner are organized in a KeyValueHeap object:

Initializing the Scanner Hierarchy

The heart of the hierarchy is three scanner layers: RegionScanner, StoreScanner, and StoreFileScanner. They are hierarchical: a RegionScanner is composed of multiple StoreScanners, one per column family of the table being scanned, and a StoreScanner is in turn composed of multiple StoreFileScanners. Since each Store's data consists of the in-memory MemStore plus the StoreFiles on disk, a StoreScanner holds N SegmentScanners and N StoreFileScanners to do the actual reading, one StoreFileScanner per StoreFile. Note that StoreFileScanner and SegmentScanner are the ultimate executors of the whole scan. Initializing the Scanner hierarchy involves the following core steps:

1) Build the RegionScanner

Select the Stores the query needs and initialize a StoreScanner for each.

 private void initializeScanners(Scan scan, List<KeyValueScanner> additionalScanners)
    throws IOException {
    // Here we separate all scanners into two lists - scanner that provide data required
    // by the filter to operate (scanners list) and all others (joinedScanners list).
    List<KeyValueScanner> scanners = new ArrayList<>(scan.getFamilyMap().size());
    List<KeyValueScanner> joinedScanners = new ArrayList<>(scan.getFamilyMap().size());
    // Store all already instantiated scanners for exception handling
    List<KeyValueScanner> instantiatedScanners = new ArrayList<>();
    // handle additionalScanners
    if (additionalScanners != null && !additionalScanners.isEmpty()) {
      scanners.addAll(additionalScanners);
      instantiatedScanners.addAll(additionalScanners);
    }

    try {
      // only build StoreScanners for the column families the client requested
      for (Map.Entry<byte[], NavigableSet<byte[]>> entry : scan.getFamilyMap().entrySet()) {
        HStore store = region.getStore(entry.getKey());
        // build the StoreScanner
        KeyValueScanner scanner = store.getScanner(scan, entry.getValue(), this.readPt);
        instantiatedScanners.add(scanner);
        if (
          this.filter == null || !scan.doLoadColumnFamiliesOnDemand()
            || this.filter.isFamilyEssential(entry.getKey())
        ) {
          scanners.add(scanner);
        } else {
          joinedScanners.add(scanner);
        }
      }
      // add the StoreScanner list into the RegionScanner, combining them into one KVHeap
      initializeKVHeap(scanners, joinedScanners, region);
    } catch (Throwable t) {
      throw handleException(instantiatedScanners, t);
    }
  }

2) Build the StoreScanner

Each StoreScanner builds one StoreFileScanner for every HFile in its Store, to perform the actual lookups in that file, and also builds SegmentScanners for the corresponding MemStore, to search the Store's in-memory data.

/**
   * Opens a scanner across memstore, snapshot, and all StoreFiles. Assumes we are not in a
   * compaction.
   * @param store   who we scan
   * @param scan    the spec
   * @param columns which columns we are scanning
   */
  public StoreScanner(HStore store, ScanInfo scanInfo, Scan scan, NavigableSet<byte[]> columns,
    long readPt) throws IOException {
    this(store, scan, scanInfo, columns != null ? columns.size() : 0, readPt, scan.getCacheBlocks(),
      ScanType.USER_SCAN);
    if (columns != null && scan.isRaw()) {
      throw new DoNotRetryIOException("Cannot specify any column for a raw scan");
    }
    matcher = UserScanQueryMatcher.create(scan, scanInfo, columns, oldestUnexpiredTS, now,
      store.getCoprocessorHost());

    store.addChangedReaderObserver(this);

    List<KeyValueScanner> scanners = null;
    try {
      // Pass columns to try to filter out unnecessary StoreFiles.
      // 1. first collect all SegmentScanners (for the MemStore) and StoreFileScanners (for the StoreFiles) of this Store
      // 2. then filter out HFiles that cannot match the query
      scanners = selectScannersFrom(store,
        store.getScanners(cacheBlocks, scanUsePread, false, matcher, scan.getStartRow(),
          scan.includeStartRow(), scan.getStopRow(), scan.includeStopRow(), this.readPt));

      // Seek all scanners to the start of the Row (or if the exact matching row
      // key does not exist, then to the start of the next matching Row).
      // Always check bloom filter to optimize the top row seek for delete
      // family marker.
      // 3. seek every scanner of this Store: use the query's (get/scan) startKey and the block index to find
      // the offset of the Block containing startKey, load that Block, and position the current cell at
      // startKey's KV, or at the next KV
      seekScanners(scanners, matcher.getStartKey(), explicitColumnQuery && lazySeekEnabledGlobally,
        parallelSeekEnabled);

      // set storeLimit
      this.storeLimit = scan.getMaxResultsPerColumnFamily();

      // set rowOffset
      this.storeOffset = scan.getRowOffsetPerColumnFamily();
      addCurrentScanners(scanners);
      // Combine all seeked scanners with a heap
      resetKVHeap(scanners, comparator);
    } catch (IOException e) {
      clearAndClose(scanners);
      // remove us from the HStore#changedReaderObservers here or we'll have no chance to
      // and might cause memory leak
      store.deleteChangedReaderObserver(this);
      throw e;
    }
  }

3) Filter out Scanners that cannot satisfy the query

StoreScanner builds one StoreFileScanner per HFile, but note that not every HFile contains the KeyValues the user is looking for; on the contrary, the query conditions can rule out many HFiles that definitely do not contain them. The main filtering strategies are TimeRange filtering, RowKey-range filtering, and Bloom filters.

protected List<KeyValueScanner> selectScannersFrom(HStore store,
    List<? extends KeyValueScanner> allScanners) {
    boolean memOnly;
    boolean filesOnly;
    if (scan instanceof InternalScan) {
      InternalScan iscan = (InternalScan) scan;
      memOnly = iscan.isCheckOnlyMemStore();
      filesOnly = iscan.isCheckOnlyStoreFiles();
    } else {
      memOnly = false;
      filesOnly = false;
    }

    List<KeyValueScanner> scanners = new ArrayList<>(allScanners.size());

    // We can only exclude store files based on TTL if minVersions is set to 0.
    // Otherwise, we might have to return KVs that have technically expired.
    long expiredTimestampCutoff = minVersions == 0 ? oldestUnexpiredTS : Long.MIN_VALUE;

    // include only those scan files which pass all filters
    for (KeyValueScanner kvs : allScanners) {
      boolean isFile = kvs.isFileScanner();
      if ((!isFile && filesOnly) || (isFile && memOnly)) {
        kvs.close();
        continue;
      }
      // filter out HFiles that do not match the query conditions
      if (kvs.shouldUseScanner(scan, store, expiredTimestampCutoff)) {
        scanners.add(kvs);
      } else {
        kvs.close();
      }
    }
    return scanners;
}

 public boolean shouldUseScanner(Scan scan, HStore store, long oldestUnexpiredTS) {
    // if the file has no entries, no need to validate or create a scanner.
    byte[] cf = store.getColumnFamilyDescriptor().getName();
    TimeRange timeRange = scan.getColumnFamilyTimeRange().get(cf);
    if (timeRange == null) {
      timeRange = scan.getTimeRange();
    }
    // TimeRange filter & key-range filter & Bloom filter
    return reader.passesTimerangeFilter(timeRange, oldestUnexpiredTS)
      && reader.passesKeyRangeFilter(scan)
      && reader.passesBloomFilter(scan, scan.getFamilyMap().get(cf));
  }

4) Seek every Scanner to startKey

This step seeks each HFile (and MemStore) scanner to the scan's starting point, startKey; if startKey is not found in the HFile, the scanner seeks to the next KV instead. Seeking is a core step with three parts:

  • Locate the Block offset: read the HFile's index structure from the BlockCache and walk the index tree to find the Block Offset and Block Size for the given RowKey

  • Load the Block: look up the Data Block in the BlockCache by its offset; if it is not cached, load it from the HFile

  • Seek the key: locate the specific RowKey within the loaded Data Block

    In HFileReaderImpl#seekTo:

    public int seekTo(Cell key, boolean rewind) throws IOException {
      // read the HFile block index
      HFileBlockIndex.BlockIndexReader indexReader = reader.getDataBlockIndexReader();
      BlockWithScanInfo blockWithScanInfo = indexReader.loadDataBlockWithScanInfo(key, curBlock,
        cacheBlocks, pread, isCompaction, getEffectiveDataBlockEncoding(), reader);
      if (blockWithScanInfo == null || blockWithScanInfo.getHFileBlock() == null) {
        // This happens if the key e.g. falls before the beginning of the file.
        return -1;
      }
      // with the Block loaded, seek to the position of startKey
      return loadBlockAndSeekToKey(blockWithScanInfo.getHFileBlock(),
        blockWithScanInfo.getNextIndexedKey(), rewind, key, false);
    }


5) Merge the KeyValueScanners into a min-heap

All StoreFileScanners and MemStore scanners of the Store are combined into a heap (a min-heap), which is really a priority queue. In the queue, the Scanners are ordered, smallest first, by the KeyValue each Scanner's seek produced. Managing the Scanners with a min-heap guarantees that every KV taken out is the smallest remaining one, so popping repeatedly yields the target KeyValues in ascending order, preserving ordering. The heap's core operations are peek and next:

@InterfaceAudience.Private
public class KeyValueHeap extends NonReversedNonLazyKeyValueScanner
  implements KeyValueScanner, InternalScanner {
  private static final Logger LOG = LoggerFactory.getLogger(KeyValueHeap.class);
  protected PriorityQueue<KeyValueScanner> heap = null;
  // Holds the scanners when a ever a eager close() happens. All such eagerly closed
  // scans are collected and when the final scanner.close() happens will perform the
  // actual close.
  protected List<KeyValueScanner> scannersForDelayedClose = null;
  
  @Override
  public Cell peek() {
    if (this.current == null) {
      return null;
    }
    return this.current.peek();
  }

  boolean isLatestCellFromMemstore() {
    return !this.current.isFileScanner();
  }

  @Override
  public Cell next() throws IOException {
    if (this.current == null) {
      return null;
    }
    Cell kvReturn = this.current.next();
    Cell kvNext = this.current.peek();
    if (kvNext == null) {
      this.scannersForDelayedClose.add(this.current);
      this.current = null;
      this.current = pollRealKV();
    } else {
      KeyValueScanner topScanner = this.heap.peek();
      // no need to add current back to the heap if it is the only scanner left
      if (topScanner != null && this.comparator.compare(kvNext, topScanner.peek()) >= 0) {
        this.heap.add(this.current);
        this.current = null;
        this.current = pollRealKV();
      }
    }
    return kvReturn;
  }  
}

Reading Rows One by One via next

Once the Scanner hierarchy is built and initialized, KeyValues can be obtained in ascending order through RegionScanner#next.

Think of the RegionScanner as a machine with intricate internals, driven by the client's successive scan requests; each scan request pulls rows out of it by calling the RegionScanner's next method.

  • The server-side entry point for Get is RSRpcServices#get()

    private Result get(Get get, HRegion region, RegionScannersCloseCallBack closeCallBack,
      RpcCallContext context) throws IOException {
    region.prepareGet(get);
    boolean stale = region.getRegionInfo().getReplicaId() != 0;

    // This method is almost the same as HRegion#get.
    List<Cell> results = new ArrayList<>();
    long before = EnvironmentEdgeManager.currentTime();
    // pre-get CP hook
    if (region.getCoprocessorHost() != null) {
      if (region.getCoprocessorHost().preGet(get, results)) {
        region.metricsUpdateForGet(results, before);
        return Result.create(results, get.isCheckExistenceOnly() ? !results.isEmpty() : null,
          stale);
      }
    }
    Scan scan = new Scan(get);
    if (scan.getLoadColumnFamiliesOnDemandValue() == null) {
      scan.setLoadColumnFamiliesOnDemand(region.isLoadingCfsOnDemandDefault());
    }
    RegionScannerImpl scanner = null;
    try {
      // first build a RegionScanner via HRegion
      scanner = region.getScanner(scan);
      // then fetch one row of data via RegionScanner#next
      scanner.next(results);
    } finally {
      if (scanner != null) {
        if (closeCallBack == null) {
          // If there is a context then the scanner can be added to the current
          // RpcCallContext. The rpc callback will take care of closing the
          // scanner, for eg in case
          // of get()
          context.setCallBack(scanner);
        } else {
          // The call is from multi() where the results from the get() are
          // aggregated and then send out to the
          // rpc. The rpccall back will close all such scanners created as part
          // of multi().
          closeCallBack.addScanner(scanner);
        }
      }
    }
    

  • The server-side entry point for Scan is RSRpcServices#scan()

  @Override
  public ScanResponse scan(final RpcController controller, final ScanRequest request)
    throws ServiceException {
    if (controller != null && !(controller instanceof HBaseRpcController)) {
      throw new UnsupportedOperationException(
        "We only do " + "HBaseRpcControllers! FIX IF A PROBLEM: " + controller);
    }
    // check whether this request already carries a scannerId
    if (!request.hasScannerId() && !request.hasScan()) {
      throw new ServiceException(
        new DoNotRetryIOException("Missing required input: scannerId or scan"));
    }
    try {
      checkOpen();
    } catch (IOException e) {
      if (request.hasScannerId()) {
        String scannerName = toScannerName(request.getScannerId());
        if (LOG.isDebugEnabled()) {
          LOG.debug(
            "Server shutting down and client tried to access missing scanner " + scannerName);
        }
        final LeaseManager leaseManager = server.getLeaseManager();
        if (leaseManager != null) {
          try {
            leaseManager.cancelLease(scannerName);
          } catch (LeaseException le) {
            // No problem, ignore
            if (LOG.isTraceEnabled()) {
              LOG.trace("Un-able to cancel lease of scanner. It could already be closed.");
            }
          }
        }
      }
      throw new ServiceException(e);
    }
    requestCount.increment();
    rpcScanRequestCount.increment();
    RegionScannerHolder rsh;
    ScanResponse.Builder builder = ScanResponse.newBuilder();
    String scannerName;
    try {
      if (request.hasScannerId()) {
        // The downstream projects such as AsyncHBase in OpenTSDB need this value. See HBASE-18000
        // for more details.
        long scannerId = request.getScannerId();
        builder.setScannerId(scannerId);
        scannerName = toScannerName(scannerId);
        // fetch the RegionScanner by scannerId
        rsh = getRegionScanner(request);
      } else {
        Pair<String, RegionScannerHolder> scannerNameAndRSH = newRegionScanner(request, builder);
        scannerName = scannerNameAndRSH.getFirst();
        // if there is no scannerId yet, build a new RegionScanner
        rsh = scannerNameAndRSH.getSecond();
      }
    } catch (IOException e) {
      if (e == SCANNER_ALREADY_CLOSED) {
        // Now we will close scanner automatically if there are no more results for this region but
        // the old client will still send a close request to us. Just ignore it and return.
        return builder.build();
      }
      throw new ServiceException(e);
    }
    if (rsh.fullRegionScan) {
      rpcFullScanRequestCount.increment();
    }
    HRegion region = rsh.r;
    LeaseManager.Lease lease;
    try {
      // Remove lease while its being processed in server; protects against case
      // where processing of request takes > lease expiration time. or null if none found.
      lease = server.getLeaseManager().removeLease(scannerName);
    } catch (LeaseException e) {
      throw new ServiceException(e);
    }
    if (request.hasRenew() && request.getRenew()) {
      // add back and return
      addScannerLeaseBack(lease);
      try {
        checkScanNextCallSeq(request, rsh);
      } catch (OutOfOrderScannerNextException e) {
        throw new ServiceException(e);
      }
      return builder.build();
    }
    OperationQuota quota;
    try {
      quota = getRpcQuotaManager().checkQuota(region, OperationQuota.OperationType.SCAN);
    } catch (IOException e) {
      addScannerLeaseBack(lease);
      throw new ServiceException(e);
    }
    try {
      checkScanNextCallSeq(request, rsh);
    } catch (OutOfOrderScannerNextException e) {
      addScannerLeaseBack(lease);
      throw new ServiceException(e);
    }
    // Now we have increased the next call sequence. If we give client an error, the retry will
    // never success. So we'd better close the scanner and return a DoNotRetryIOException to client
    // and then client will try to open a new scanner.
    boolean closeScanner = request.hasCloseScanner() ? request.getCloseScanner() : false;
    int rows; // this is scan.getCaching
    if (request.hasNumberOfRows()) {
      rows = request.getNumberOfRows();
    } else {
      rows = closeScanner ? 0 : 1;
    }
    RpcCall rpcCall = RpcServer.getCurrentCall().orElse(null);
    // now let's do the real scan.
    long maxQuotaResultSize = Math.min(maxScannerResultSize, quota.getReadAvailable());
    RegionScanner scanner = rsh.s;
    // this is the limit of rows for this scan, if we the number of rows reach this value, we will
    // close the scanner.
    int limitOfRows;
    if (request.hasLimitOfRows()) {
      limitOfRows = request.getLimitOfRows();
    } else {
      limitOfRows = -1;
    }
    MutableObject lastBlock = new MutableObject<>();
    boolean scannerClosed = false;
    try {
      List results = new ArrayList<>(Math.min(rows, 512));
      if (rows > 0) {
        boolean done = false;
        // Call coprocessor. Get region info from scanner.
        if (region.getCoprocessorHost() != null) {
          Boolean bypass = region.getCoprocessorHost().preScannerNext(scanner, results, rows);
          if (!results.isEmpty()) {
            for (Result r : results) {
              lastBlock.setValue(addSize(rpcCall, r, lastBlock.getValue()));
            }
          }
          if (bypass != null && bypass.booleanValue()) {
            done = true;
          }
        }
        if (!done) {
          scan((HBaseRpcController) controller, request, rsh, maxQuotaResultSize, rows,
            limitOfRows, results, builder, lastBlock, rpcCall);
        } else {
          builder.setMoreResultsInRegion(!results.isEmpty());
        }
      } else {
        // This is a open scanner call with numberOfRow = 0, so set more results in region to true.
        builder.setMoreResultsInRegion(true);
      }

      quota.addScanResult(results);
      addResults(builder, results, (HBaseRpcController) controller,
        RegionReplicaUtil.isDefaultReplica(region.getRegionInfo()),
        isClientCellBlockSupport(rpcCall));
      if (scanner.isFilterDone() && results.isEmpty()) {
        // If the scanner's filter - if any - is done with the scan
        // only set moreResults to false if the results is empty. This is used to keep compatible
        // with the old scan implementation where we just ignore the returned results if moreResults
        // is false. Can remove the isEmpty check after we get rid of the old implementation.
        builder.setMoreResults(false);
      }
      // Later we may close the scanner depending on this flag so here we need to make sure that we
      // have already set this flag.
      assert builder.hasMoreResultsInRegion();
      // we only set moreResults to false in the above code, so set it to true if we haven't set it
      // yet.
      if (!builder.hasMoreResults()) {
        builder.setMoreResults(true);
      }
      if (builder.getMoreResults() && builder.getMoreResultsInRegion() && !results.isEmpty()) {
        // Record the last cell of the last result if it is a partial result
        // We need this to calculate the complete rows we have returned to client as the
        // mayHaveMoreCellsInRow is true does not mean that there will be extra cells for the
        // current row. We may filter out all the remaining cells for the current row and just
        // return the cells of the nextRow when calling RegionScanner.nextRaw. So here we need to
        // check for row change.
        Result lastResult = results.get(results.size() - 1);
        if (lastResult.mayHaveMoreCellsInRow()) {
          rsh.rowOfLastPartialResult = lastResult.getRow();
        } else {
          rsh.rowOfLastPartialResult = null;
        }
      }
      if (!builder.getMoreResults() || !builder.getMoreResultsInRegion() || closeScanner) {
        scannerClosed = true;
        closeScanner(region, scanner, scannerName, rpcCall);
      }
    
      // There's no point returning to a timed out client. Throwing ensures scanner is closed
      if (rpcCall != null && EnvironmentEdgeManager.currentTime() > rpcCall.getDeadline()) {
        throw new TimeoutIOException("Client deadline exceeded, cannot return results");
      }
    
      return builder.build();
    } catch (IOException e) {
      try {
        // scanner is closed here
        scannerClosed = true;
        // The scanner state might be left in a dirty state, so we will tell the Client to
        // fail this RPC and close the scanner while opening up another one from the start of
        // row that the client has last seen.
        closeScanner(region, scanner, scannerName, rpcCall);
    
        // If it is a DoNotRetryIOException already, throw as it is. Unfortunately, DNRIOE is
        // used in two different semantics.
        // (1) The first is to close the client scanner and bubble up the exception all the way
        // to the application. This is preferred when the exception is really un-recoverable
        // (like CorruptHFileException, etc). Plain DoNotRetryIOException also falls into this
        // bucket usually.
        // (2) Second semantics is to close the current region scanner only, but continue the
        // client scanner by overriding the exception. This is usually UnknownScannerException,
        // OutOfOrderScannerNextException, etc where the region scanner has to be closed, but the
        // application-level ClientScanner has to continue without bubbling up the exception to
        // the client. See ClientScanner code to see how it deals with these special exceptions.
        if (e instanceof DoNotRetryIOException) {
          throw e;
        }
    
        // If it is a FileNotFoundException, wrap as a
        // DoNotRetryIOException. This can avoid the retry in ClientScanner.
        if (e instanceof FileNotFoundException) {
          throw new DoNotRetryIOException(e);
        }
    
        // We closed the scanner already. Instead of throwing the IOException, and client
        // retrying with the same scannerId only to get USE on the next RPC, we directly throw
        // a special exception to save an RPC.
        if (VersionInfoUtil.hasMinimumVersion(rpcCall.getClientVersionInfo(), 1, 4)) {
          // 1.4.0+ clients know how to handle
          throw new ScannerResetException("Scanner is closed on the server-side", e);
        } else {
          // older clients do not know about SRE. Just throw USE, which they will handle
          throw new UnknownScannerException("Throwing UnknownScannerException to reset the client"
            + " scanner state for clients older than 1.3.", e);
        }
      } catch (IOException ioe) {
        throw new ServiceException(ioe);
      }
    } finally {
      if (!scannerClosed) {
        // Adding resets expiration time on lease.
        // the closeCallBack will be set in closeScanner so here we only care about shippedCallback
        if (rpcCall != null) {
          rpcCall.setCallBack(rsh.shippedCallback);
        } else {
          // If context is null,here we call rsh.shippedCallback directly to reuse the logic in
          // rsh.shippedCallback to release the internal resources in rsh,and lease is also added
          // back to regionserver's LeaseManager in rsh.shippedCallback.
          runShippedCallback(rsh);
        }
      }
      quota.close();
    }
    

    }

    Assume a RegionScanner contains just one StoreScanner; then the core reading done by that RegionScanner is carried out by the StoreScanner. Further assume the StoreScanner is made up of four Scanners, as shown below:

    Each Scanner has a current pointer to the next KV it will read, and the PriorityQueue inside the KVHeap orders the Scanners precisely by the KV their current pointer points to.

    The first next request returns Row01:FamA:Col1 from ScannerA, after which ScannerA's pointer moves to the next KV, Row01:FamA:Col2; the ordering of the Scanners in the PriorityQueue is unchanged:

    The second next request again returns from ScannerA, this time Row01:FamA:Col2, and ScannerA's pointer moves on to the KV Row02:FamA:Col1; now the ordering of the Scanners in the PriorityQueue changes:

    The next request after that will return the KV from ScannerB, and so on, until some Scanner runs out of data; that Scanner is then closed and no longer appears in the PriorityQueue.
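
    The merge behavior just walked through can be sketched as a small k-way merge over scanners, using a plain java.util.PriorityQueue (no HBase dependency; the class names and the string encoding of KVs are made up for illustration):

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.PriorityQueue;

// Minimal sketch of the KeyValueHeap merge described above: each scanner is a
// sorted stream with a current KV; a PriorityQueue orders the scanners by that
// KV, so repeated next() calls return all KVs in ascending order across
// scanners, and drained scanners simply drop out of the queue.
public class KVHeapSketch {
    public static class Scanner {
        final Deque<String> kvs;
        public Scanner(List<String> sorted) { this.kvs = new ArrayDeque<>(sorted); }
        String peek() { return kvs.peek(); }   // the KV the current pointer is on
        String next() { return kvs.poll(); }   // return it and advance the pointer
    }

    final PriorityQueue<Scanner> heap =
        new PriorityQueue<>(Comparator.comparing(Scanner::peek));

    public void add(Scanner s) { if (s.peek() != null) heap.add(s); }

    public String next() {
        Scanner top = heap.poll();
        if (top == null) return null;          // all scanners exhausted
        String kv = top.next();
        if (top.peek() != null) heap.add(top); // re-insert at its new position;
        return kv;                             // otherwise the scanner is "closed"
    }

    public static void main(String[] args) {
        KVHeapSketch h = new KVHeapSketch();
        // the two scanners from the walkthrough above
        h.add(new Scanner(List.of("Row01:FamA:Col1", "Row01:FamA:Col2", "Row02:FamA:Col1")));
        h.add(new Scanner(List.of("Row01:FamB:Col1", "Row03:FamB:Col1")));
        for (String kv; (kv = h.next()) != null; ) System.out.println(kv);
    }
}
```

    As in the walkthrough, the first two next() calls both come from the first scanner; re-inserting a scanner after each pop is what makes the queue re-evaluate its position by its new current KV.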