The earlier RT (response time) spike reproduced. This time, combining the hot threads output with a flame graph turned up something new.

       sun.nio.ch.FileChannelImpl.readInternal(Unknown Source)
       sun.nio.ch.FileChannelImpl.read(Unknown Source)
       org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
       org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
       org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)

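It is the same stack as before. The flame graph shows ObjectMonitor, and together with the source code (synchronized) this suggests another hypothesis: the current environment has an I/O bottleneck. (One obvious symptom is extremely uneven load, but could it be that any node would hit the same problem once it reaches that volume? In theory 96G of memory should be more than enough for a 300MB index, but without Page Cache statistics from the host this cannot be confirmed.) Following this line of thought, the official documentation page "Retrieve selected fields from a search" offers the following three options (paragraphs slightly rearranged):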

Source Filtering
A document’s _source is stored as a single field in Lucene. This structure means that the whole _source object must be loaded and parsed even if you’re only requesting part of it. Elasticsearch always attempts to load values from _source. This behavior has the same implications of source filtering where Elasticsearch needs to load and parse the entire _source to retrieve just one field.

Doc value fields
Use the docvalue_fields parameter to get values for selected fields. This can be a good choice when returning a fairly small number of fields that support doc values, such as keywords and dates. Doc values store the same values as the _source but in an on-disk, column-based structure that’s optimized for sorting and aggregations. Since each field is stored separately, Elasticsearch only reads the field values that were requested and can avoid loading the whole document _source. All fields which support doc values have them enabled by default. If you are sure that you don’t need to sort or aggregate on a field, or access the field value from a script, you can disable doc values in order to save disk space.

Stored fields
Use the stored_fields parameter to get the values for specific stored fields (fields that use the store mapping option).

A few key points from the passages above:

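_source is stored as a whole in a single field (PS: _id and _routing are also stored separately; these three are the three stored fields, see the appendix);
doc values are enabled by default (PS: supported types include numeric types, date types, the boolean type, the keyword type, etc.);
by default, data is fetched from _source; when only a few selected fields are needed, doc values are the better choice.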

(Elasticsearch should add a "Tune for fetch speed" page to its How to section.)

Here is a set of sample measurements:

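Notes:
1. To make the performance difference stand out, fetch size is set to 1000 and the field count is 100; with small amounts of data there is no noticeable difference;
2. preference=_only_nodes:xxxxxx is used to route requests to a fixed node;
3. When using doc values, "stored_fields": "_none_" must be set as well; otherwise _id and _routing are still read and the change has no effect;
4. docs count: 4764161, storage size: 10.9g, field count: 100;
5. The timings did not fluctuate much, so each figure is the mean over multiple requests.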

// Example 1: source filtering | all fields | mean latency 27ms
GET goods/_search?preference=_only_nodes:1662460131002675532
{
  "size": 1000
}

// Example 2: source filtering | single field | mean latency 30ms
GET goods/_search?preference=_only_nodes:1662460131002675532
{
  "size": 1000,
  "_source": ["spuId"]
}

// Example 3: doc values | selected field (long type) | mean latency 5ms
GET goods/_search?preference=_only_nodes:1662460131002675532
{
  "size": 1000,
  "stored_fields": "_none_",
  "docvalue_fields": ["spuId"]
}

// Example 4: doc values | selected field (keyword type) | mean latency 10ms
GET goods/_search?preference=_only_nodes:1662460131002675532
{
  "size": 1000,
  "stored_fields": "_none_",
  "docvalue_fields": ["ext"]
}

// Example 5: doc values | all fields | mean latency 40ms
GET goods/_search?preference=_only_nodes:1662460131002675532
{
  "size": 1000,
  "stored_fields": "_none_",
  "docvalue_fields": [······( about a hundred fields omitted here )]
}

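As the numbers show, doc values are a large improvement over _source (compare Example 2 with Example 3: 30ms down to 5ms, a 6x speedup). The reason is that column-based storage has better spatial locality. Start with an example from the wiki: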


RowId	EmpId	Lastname	Firstname	Salary
001	10	Smith	Joe	40000
002	12	Jones	Mary	50000
003	11	Johnson	Cathy	44000
004	22	Jones	Bob	55000

// A column-oriented database serializes all the values of one column together, then the values of the next column, and so on:
10:001,12:002,11:003,22:004;
Smith:001,Jones:002,Johnson:003,Jones:004;
Joe:001,Mary:002,Cathy:003,Bob:004;
40000:001,50000:002,44000:003,55000:004;

// A row-oriented database serializes all the values of one row together, then the values of the next row, and so on:
001:10,Smith,Joe,40000;
002:12,Jones,Mary,50000;
003:11,Johnson,Cathy,44000;
004:22,Jones,Bob,55000;
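
The two layouts above can be reproduced with a few lines of Java. This is only a sketch of the wiki example, not of how Lucene or any real database lays out bytes on disk; the class name LayoutSketch and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Illustrative only: serializes the four-row employee table in both layouts.
class LayoutSketch {
    // Each row: RowId, EmpId, Lastname, Firstname, Salary (all kept as strings).
    static final String[][] ROWS = {
        {"001", "10", "Smith", "Joe", "40000"},
        {"002", "12", "Jones", "Mary", "50000"},
        {"003", "11", "Johnson", "Cathy", "44000"},
        {"004", "22", "Jones", "Bob", "55000"},
    };

    // Row layout: one entry per row, formatted as "rowId:v1,v2,...;".
    static List<String> rowLayout(String[][] rows) {
        List<String> out = new ArrayList<>();
        for (String[] row : rows) {
            StringJoiner values = new StringJoiner(",", row[0] + ":", ";");
            for (int col = 1; col < row.length; col++) {
                values.add(row[col]);
            }
            out.add(values.toString());
        }
        return out;
    }

    // Column layout: one entry per column, formatted as "value:rowId,value:rowId,...;".
    static List<String> columnLayout(String[][] rows) {
        List<String> out = new ArrayList<>();
        int columns = rows[0].length;
        for (int col = 1; col < columns; col++) {
            StringJoiner values = new StringJoiner(",", "", ";");
            for (String[] row : rows) {
                values.add(row[col] + ":" + row[0]);
            }
            out.add(values.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        rowLayout(ROWS).forEach(System.out::println);
        columnLayout(ROWS).forEach(System.out::println);
    }
}
```

Reading only the salaries means touching one entry of columnLayout, whereas with rowLayout every entry has to be read and the other three values skipped.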

In summary (implementation details are out of scope here; there are many approaches, such as index acceleration, sparse storage, LSM, WAL, and so on):

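Compared with row-based storage, when only one field (or a few) is read, the same disk block I/O carries a higher useful payload with column-based storage (and the high compressibility of columns improves this further);
the same advantage applies to every kind of cache, e.g. the page cache and CPU cache lines;
in particular, fixed-length fields perform even better (compare Example 3 with Example 4; my guess is that the index lookup is optimized down to an offset calculation).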
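
To make the fixed-length guess concrete, here is a minimal sketch. It uses a toy encoding and says nothing about how Lucene actually encodes doc values: a fixed-width column can locate value i by arithmetic alone, while a variable-width column needs an extra offsets table. OffsetSketch, VarColumn and all method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy encoding for illustration; not Lucene's doc values format.
class OffsetSketch {
    // Fixed-width column: every value takes Long.BYTES bytes.
    static ByteBuffer packLongs(long[] values) {
        ByteBuffer column = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long value : values) {
            column.putLong(value);
        }
        return column;
    }

    // The position of the value for docId is pure arithmetic, no lookup table needed.
    static long readLong(ByteBuffer column, int docId) {
        return column.getLong(docId * Long.BYTES);
    }

    // Variable-width column: the raw bytes plus an offsets table with n + 1 entries.
    static final class VarColumn {
        final byte[] data;
        final int[] offsets;

        VarColumn(byte[] data, int[] offsets) {
            this.data = data;
            this.offsets = offsets;
        }
    }

    static VarColumn packStrings(String[] values) {
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        int[] offsets = new int[values.length + 1];
        for (int i = 0; i < values.length; i++) {
            byte[] bytes = values[i].getBytes(StandardCharsets.UTF_8);
            data.write(bytes, 0, bytes.length);
            offsets[i + 1] = data.size();
        }
        return new VarColumn(data.toByteArray(), offsets);
    }

    // The start and end offsets must be read before the value itself can be read.
    static String readString(VarColumn column, int docId) {
        int start = column.offsets[docId];
        int end = column.offsets[docId + 1];
        return new String(column.data, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer salaries = packLongs(new long[] {40000, 50000, 44000, 55000});
        VarColumn lastNames = packStrings(new String[] {"Smith", "Jones", "Johnson", "Jones"});
        System.out.println(readLong(salaries, 2));
        System.out.println(readString(lastNames, 2));
    }
}
```

The sample values are the Salary and Lastname columns of the wiki table. The extra indirection in readString is one plausible reason a keyword field could be slower to fetch than a long field, but that remains a guess.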

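Of course, column-based storage is no silver bullet either. Because the columns of one row are stored separately (which also makes the design somewhat more complex), query cost grows in proportion to the number of selected fields, and when most fields are selected, row-based storage wins (compare Example 1 with Example 5: 27ms versus 40ms). An update needs a number of disk I/Os proportional to the field count, which row storage does not. There are also hybrid row-column designs, which are used more in HTAP scenarios.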
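
The trade-off can be put into a back-of-envelope model. Everything in it is an assumption chosen for illustration, not a measurement: each per-document lookup costs a fixed lookupCost in abstract units, reading one byte costs one unit, and every field has the same width. A row store pays one lookup and reads the whole record; a column store pays one lookup per selected field and reads only those values. FetchCostSketch and its parameters are hypothetical.

```java
// Toy cost model with made-up units; it only illustrates the shape of the trade-off.
class FetchCostSketch {
    // Row store: one lookup per document, then the whole record is read.
    static long rowCost(int totalFields, int fieldBytes, int lookupCost) {
        return lookupCost + (long) totalFields * fieldBytes;
    }

    // Column store: one lookup and one value read per selected field.
    static long columnCost(int selectedFields, int fieldBytes, int lookupCost) {
        return (long) selectedFields * (lookupCost + fieldBytes);
    }

    // Smallest number of selected fields at which the column store becomes
    // more expensive than the row store, or -1 if that never happens.
    static int breakEven(int totalFields, int fieldBytes, int lookupCost) {
        long row = rowCost(totalFields, fieldBytes, lookupCost);
        for (int selected = 1; selected <= totalFields; selected++) {
            if (columnCost(selected, fieldBytes, lookupCost) > row) {
                return selected;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Arbitrary inputs: 100 fields of 8 bytes each, lookup cost of 32 units.
        System.out.println(breakEven(100, 8, 32));
    }
}
```

The inputs in main are arbitrary, so the break-even it prints should not be read as a prediction for the cluster measured above; the point is only that the column cost rises linearly with the number of selected fields while the row cost stays flat.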

Appendix

"_id", "_routing", and "_source" configuration

// IdFieldMapper
public static class Defaults {
    public static final String NAME = "_id";

    public static final MappedFieldType FIELD_TYPE = new IdFieldType();

    static {
        FIELD_TYPE.setIndexOptions(IndexOptions.DOCS);
        FIELD_TYPE.setTokenized(false);
        FIELD_TYPE.setStored(true);
        FIELD_TYPE.setOmitNorms(true);
        FIELD_TYPE.setIndexAnalyzer(Lucene.KEYWORD_ANALYZER);
        FIELD_TYPE.setSearchAnalyzer(Lucene.KEYWORD_ANALYZER);
        FIELD_TYPE.setName(NAME);
        FIELD_TYPE.freeze();
    }
}

// RoutingFieldMapper
public static class Defaults {
    public static final String NAME = "_routing";

    public static final MappedFieldType FIELD_TYPE = new RoutingFieldType();

    static {
        FIELD_TYPE.setIndexOptions(IndexOptions.DOCS);
        FIELD_TYPE.setTokenized(false);
        FIELD_TYPE.setStored(true);
        FIELD_TYPE.setOmitNorms(true);
        FIELD_TYPE.setIndexAnalyzer(Lucene.KEYWORD_ANALYZER);
        FIELD_TYPE.setSearchAnalyzer(Lucene.KEYWORD_ANALYZER);
        FIELD_TYPE.setName(NAME);
        FIELD_TYPE.freeze();
    }

    public static final boolean REQUIRED = false;
}

// SourceFieldMapper
public static class Defaults {
    public static final String NAME = "_source";

    public static final MappedFieldType FIELD_TYPE = new SourceFieldType();

    static {
        FIELD_TYPE.setIndexOptions(IndexOptions.NONE); // not indexed
        FIELD_TYPE.setTokenized(false); // default value
        FIELD_TYPE.setStored(true);
        FIELD_TYPE.setOmitNorms(true);
        FIELD_TYPE.setIndexAnalyzer(Lucene.KEYWORD_ANALYZER);
        FIELD_TYPE.setSearchAnalyzer(Lucene.KEYWORD_ANALYZER);
        FIELD_TYPE.setName(NAME);
        FIELD_TYPE.freeze();
    }
}
