The earlier sharp rise in RT (response time) has reproduced. This time, combining hot threads output with a flame graph turned up something new:
sun.nio.ch.FileChannelImpl.readInternal(Unknown Source)
sun.nio.ch.FileChannelImpl.read(Unknown Source)
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
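The stack is the same as before. The flame graph shows ObjectMonitor, and the source code shows a synchronized block on this path, which suggests another hypothesis: there is an I/O bottleneck in the current environment. (One obvious symptom is extremely uneven load across nodes, but could it be that any node would hit the same problem at this volume? In theory, 96 GB of memory should be more than enough for a 300 MB index, but the host's page cache statistics are not visible, so this cannot be confirmed.) Following this line of thought, I consulted the official documentation page "Retrieve selected fields from a search", which describes the following three options (paragraphs slightly rearranged):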
Source filtering: A document’s _source is stored as a single field in Lucene. This structure means that the whole _source object must be loaded and parsed even if you’re only requesting part of it. Elasticsearch always attempts to load values from _source. This behavior has the same implications of source filtering where Elasticsearch needs to load and parse the entire _source to retrieve just one field.
Doc value fields: Use the docvalue_fields parameter to get values for selected fields. This can be a good choice when returning a fairly small number of fields that support doc values, such as keywords and dates. Doc values store the same values as the _source but in an on-disk, column-based structure that’s optimized for sorting and aggregations. Since each field is stored separately, Elasticsearch only reads the field values that were requested and can avoid loading the whole document _source. All fields which support doc values have them enabled by default. If you are sure that you don’t need to sort or aggregate on a field, or access the field value from a script, you can disable doc values in order to save disk space.
Stored fields: Use the stored_fields parameter to get the values for specific stored fields (fields that use the store mapping option).
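A few key points from the passage above:
- _source is stored as one single field (PS: _id and _routing are also stored separately, so these three are three stored fields; see the appendix);
- doc values are enabled by default (PS: supported types include numeric types, date types, the boolean type, the keyword type, etc.);
- by default, values are loaded from _source; when only a few selected fields are needed, doc values are the better choice.
(Elasticsearch should add a "Tune for fetch speed" page to its "How to" section.)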
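Here is a set of sample measurements.
Notes:
1. To make the performance difference stand out, the fetch size is set to 1000 and the index has 100 fields; with small result sets there is no noticeable difference;
2. preference=_only_nodes:xxxxxx is used to route every request to one fixed node;
3. When using doc values, "stored_fields": "_none_" must be set as well; otherwise _id and _routing are still read and the change has no effect;
4. docs count: 4764161, storage size: 10.9g, field count: 100;
5. The timings fluctuate very little, so each figure is the mean over multiple requests.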
// Example 1: source filtering | all fields | mean latency 27 ms
GET goods/_search?preference=_only_nodes:1662460131002675532
{
  "size": 1000
}
// Example 2: source filtering | single field | mean latency 30 ms
GET goods/_search?preference=_only_nodes:1662460131002675532
{
  "size": 1000,
  "_source": ["spuId"]
}
// Example 3: doc values | selected field (long type) | mean latency 5 ms
GET goods/_search?preference=_only_nodes:1662460131002675532
{
  "size": 1000,
  "stored_fields": "_none_",
  "docvalue_fields": ["spuId"]
}
// Example 4: doc values | selected field (keyword type) | mean latency 10 ms
GET goods/_search?preference=_only_nodes:1662460131002675532
{
  "size": 1000,
  "stored_fields": "_none_",
  "docvalue_fields": ["ext"]
}
// Example 5: doc values | all fields | mean latency 40 ms
GET goods/_search?preference=_only_nodes:1662460131002675532
{
  "size": 1000,
  "stored_fields": "_none_",
  "docvalue_fields": [······( the hundred or so field names are omitted here )]
}
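The measurement procedure described in the notes (fixed node, doc values with stored fields disabled, mean over repeated requests) could be scripted roughly as follows. This is only a sketch under assumptions: the class and method names are made up for illustration, the base URL passed to `measure` is a placeholder, and it averages the server-side `took` value from the search response, which may not be the same latency figure the numbers above report.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any Elasticsearch client library.
final class FetchTimingSketch {
    private static final Pattern TOOK = Pattern.compile("\"took\"\\s*:\\s*(\\d+)");

    /** Builds a search body that returns only doc values and skips all stored fields. */
    static String docValuesBody(int size, List<String> fields) {
        String quoted = fields.stream()
                .map(f -> "\"" + f + "\"")
                .collect(Collectors.joining(","));
        return "{\"size\":" + size
                + ",\"stored_fields\":\"_none_\""
                + ",\"docvalue_fields\":[" + quoted + "]}";
    }

    /** Extracts the "took" value (milliseconds) from a search response body. */
    static long tookMillis(String responseJson) {
        Matcher m = TOOK.matcher(responseJson);
        if (!m.find()) {
            throw new IllegalArgumentException("no \"took\" field in response");
        }
        return Long.parseLong(m.group(1));
    }

    /** Arithmetic mean of the collected samples. */
    static double mean(long[] samples) {
        long sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return (double) sum / samples.length;
    }

    /** Sends the same search `runs` times to one fixed node and returns the mean "took". */
    static double measure(String baseUrl, String index, String nodeId, String body, int runs)
            throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // _search accepts POST as well as GET; POST avoids sending a body with GET.
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create(baseUrl + "/" + index + "/_search?preference=_only_nodes:" + nodeId))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            samples[i] = tookMillis(response.body());
        }
        return mean(samples);
    }

    public static void main(String[] args) {
        // Prints the request body only; no cluster is contacted here.
        System.out.println(docValuesBody(1000, List.of("spuId")));
    }
}
```

With a reachable cluster, a call such as `measure("http://localhost:9200", "goods", "1662460131002675532", body, 20)` would return the mean over 20 runs; the host and the run count are placeholders.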
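As the numbers show, doc values are far faster than _source (compare Example 2 with Example 3: 30 ms vs 5 ms, a 6x improvement). The reason is that column-based storage has better spatial locality. Consider first an example from Wikipedia: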
| RowId | EmpId | Lastname | Firstname | Salary |
| 001 | 10 | Smith | Joe | 40000 |
| 002 | 12 | Jones | Mary | 50000 |
| 003 | 11 | Johnson | Cathy | 44000 |
| 004 | 22 | Jones | Bob | 55000 |
// A column-oriented database serializes all values of one column together, then the values of the next column, and so on:
10:001,12:002,11:003,22:004;
Smith:001,Jones:002,Johnson:003,Jones:004;
Joe:001,Mary:002,Cathy:003,Bob:004;
40000:001,50000:002,44000:003,55000:004;
// A row-oriented database serializes all values of one row together, then the values of the next row, and so on:
001:10,Smith,Joe,40000;
002:12,Jones,Mary,50000;
003:11,Johnson,Cathy,44000;
004:22,Jones,Bob,55000;
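The two serializations above can be reproduced from the same table with a few lines of code. The sketch below is purely illustrative (the class and method names are made up) and only mimics the textual layout of the example, not any real storage engine.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: rebuilds the two textual layouts from the example table.
final class LayoutSketch {
    static final String[] ROW_IDS = {"001", "002", "003", "004"};
    // One inner array per column: EmpId, Lastname, Firstname, Salary.
    static final String[][] COLUMNS = {
        {"10", "12", "11", "22"},
        {"Smith", "Jones", "Johnson", "Jones"},
        {"Joe", "Mary", "Cathy", "Bob"},
        {"40000", "50000", "44000", "55000"},
    };

    /** Column-oriented: one line per column, each value tagged with its row id. */
    static List<String> columnLayout() {
        List<String> lines = new ArrayList<>();
        for (String[] column : COLUMNS) {
            StringBuilder sb = new StringBuilder();
            for (int r = 0; r < ROW_IDS.length; r++) {
                if (r > 0) {
                    sb.append(',');
                }
                sb.append(column[r]).append(':').append(ROW_IDS[r]);
            }
            lines.add(sb.append(';').toString());
        }
        return lines;
    }

    /** Row-oriented: one line per row, the row id followed by all of its values. */
    static List<String> rowLayout() {
        List<String> lines = new ArrayList<>();
        for (int r = 0; r < ROW_IDS.length; r++) {
            StringBuilder sb = new StringBuilder(ROW_IDS[r]).append(':');
            for (int c = 0; c < COLUMNS.length; c++) {
                if (c > 0) {
                    sb.append(',');
                }
                sb.append(COLUMNS[c][r]);
            }
            lines.add(sb.append(';').toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        columnLayout().forEach(System.out::println);
        rowLayout().forEach(System.out::println);
    }
}
```

Reading only the Salary values touches a single line of the column layout, whereas the row layout requires walking all four lines.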
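To sum up (implementation details are out of scope here; there are many possible approaches, such as index acceleration, sparse storage, LSM, WAL, and so on):
- Compared with row-based storage, when only one (or a few) fields are read, column-based storage carries more useful payload in the same single disk block I/O (and its high compressibility improves this further);
- The same advantage applies to every kind of cache, such as the page cache and CPU cache lines;
- In particular, fixed-width fields perform even better (compare Example 3 with Example 4; my guess is that the lookup is optimized into a plain offset).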
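The guess about fixed-width fields can be illustrated with a toy column. In the sketch below (a simplification with made-up names, not Lucene's actual doc values encoding), a column of long values is addressed by multiplying the document number by the value width, while a variable-length column needs an extra offsets table to find where each value starts and ends.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy model of fixed-width vs variable-width column access; not Lucene's format.
final class ColumnAccessSketch {

    /** Packs long values back to back: value i lives at byte offset i * 8. */
    static ByteBuffer packLongs(long[] values) {
        ByteBuffer column = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            column.putLong(v);
        }
        return column;
    }

    /** Fixed width: one multiplication and one absolute read. */
    static long readLong(ByteBuffer column, int docId) {
        return column.getLong(docId * Long.BYTES);
    }

    /** Variable-width column: concatenated bytes plus a start offset per value. */
    static final class VarColumn {
        final byte[] data;
        final int[] offsets; // offsets[i]..offsets[i + 1] delimits value i

        VarColumn(byte[] data, int[] offsets) {
            this.data = data;
            this.offsets = offsets;
        }
    }

    static VarColumn packStrings(String[] values) {
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        int[] offsets = new int[values.length + 1];
        for (int i = 0; i < values.length; i++) {
            byte[] bytes = values[i].getBytes(StandardCharsets.UTF_8);
            data.write(bytes, 0, bytes.length);
            offsets[i + 1] = offsets[i] + bytes.length;
        }
        return new VarColumn(data.toByteArray(), offsets);
    }

    /** Variable width: two offset lookups, then a read of the delimited range. */
    static String readString(VarColumn column, int docId) {
        int start = column.offsets[docId];
        int end = column.offsets[docId + 1];
        return new String(column.data, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer salary = packLongs(new long[] {40000, 50000, 44000, 55000});
        VarColumn lastname = packStrings(new String[] {"Smith", "Jones", "Johnson", "Jones"});
        System.out.println(readLong(salary, 2));
        System.out.println(readString(lastname, 2));
    }
}
```

The extra indirection in the variable-width case is one plausible reason a keyword field could cost more than a long field, though the measurements above do not prove that this is the cause.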
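Of course, column-based storage is no silver bullet either. Because the columns of one row are stored separately (which also makes the design somewhat more complex and tedious), query cost grows in proportion to the number of selected fields, so when most fields are selected, row-based storage wins (compare Example 1 with Example 5). On updates, column-based storage needs a number of disk I/Os proportional to the number of fields, unlike row-based storage. There are also hybrid row-column layouts, which are used mostly in HTAP scenarios.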
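The trade-off can be written down as a deliberately crude cost model. The sketch below assumes all fields have equal width and ignores compression, caching, and the actual on-disk formats; the names are made up for illustration. It captures only two quantities: the share of bytes read that belong to the requested fields, and the number of separate storage locations touched per document.

```java
// Crude, illustrative cost model; equal-width fields are an assumption.
final class AccessCostSketch {

    /** Fraction of the bytes read that belong to the requested fields. */
    static double usefulFraction(boolean columnar, int selectedFields, int totalFields) {
        // A row store must read the whole row; a column store reads only what was asked for.
        return columnar ? 1.0 : (double) selectedFields / totalFields;
    }

    /** Separate storage locations touched to assemble one document. */
    static int locationsPerDoc(boolean columnar, int selectedFields) {
        // A row store finds everything in one place; a column store visits one place per field.
        return columnar ? selectedFields : 1;
    }

    public static void main(String[] args) {
        int total = 100;
        for (int selected : new int[] {1, 100}) {
            System.out.println("selected=" + selected
                    + " row: useful=" + usefulFraction(false, selected, total)
                    + " locations=" + locationsPerDoc(false, selected)
                    + " | column: useful=" + usefulFraction(true, selected, total)
                    + " locations=" + locationsPerDoc(true, selected));
        }
    }
}
```

With 1 of 100 fields selected, the row store reads mostly unwanted bytes; with all 100 selected, both layouts read only useful bytes but the column store touches 100 locations per document instead of 1.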
Appendix
Configuration of "_id", "_routing", and "_source"
// IdFieldMapper
public static class Defaults {
    public static final String NAME = "_id";
    public static final MappedFieldType FIELD_TYPE = new IdFieldType();
    static {
        FIELD_TYPE.setIndexOptions(IndexOptions.DOCS);
        FIELD_TYPE.setTokenized(false);
        FIELD_TYPE.setStored(true);
        FIELD_TYPE.setOmitNorms(true);
        FIELD_TYPE.setIndexAnalyzer(Lucene.KEYWORD_ANALYZER);
        FIELD_TYPE.setSearchAnalyzer(Lucene.KEYWORD_ANALYZER);
        FIELD_TYPE.setName(NAME);
        FIELD_TYPE.freeze();
    }
}
// RoutingFieldMapper
public static class Defaults {
    public static final String NAME = "_routing";
    public static final MappedFieldType FIELD_TYPE = new RoutingFieldType();
    static {
        FIELD_TYPE.setIndexOptions(IndexOptions.DOCS);
        FIELD_TYPE.setTokenized(false);
        FIELD_TYPE.setStored(true);
        FIELD_TYPE.setOmitNorms(true);
        FIELD_TYPE.setIndexAnalyzer(Lucene.KEYWORD_ANALYZER);
        FIELD_TYPE.setSearchAnalyzer(Lucene.KEYWORD_ANALYZER);
        FIELD_TYPE.setName(NAME);
        FIELD_TYPE.freeze();
    }
    public static final boolean REQUIRED = false;
}
// SourceFieldMapper
public static class Defaults {
    public static final String NAME = "_source";
    public static final MappedFieldType FIELD_TYPE = new SourceFieldType();
    static {
        FIELD_TYPE.setIndexOptions(IndexOptions.NONE); // not indexed
        FIELD_TYPE.setTokenized(false); // default value
        FIELD_TYPE.setStored(true);
        FIELD_TYPE.setOmitNorms(true);
        FIELD_TYPE.setIndexAnalyzer(Lucene.KEYWORD_ANALYZER);
        FIELD_TYPE.setSearchAnalyzer(Lucene.KEYWORD_ANALYZER);
        FIELD_TYPE.setName(NAME);
        FIELD_TYPE.freeze();
    }
}