A Deep Dive into the Elasticsearch Cache Source Code

  1. ShardRequestCache: implemented at the ES level; LRU eviction; a result is considered for caching after a single access; mainly used to cache aggregation results
  2. NodeQueryCache: implemented at the Lucene level; LRU eviction; a result is only considered for caching once it is accessed often enough; mainly used to cache filter sub-queries

fielddata

Aggregations on text fields use fielddata to load the field values.

(1) fielddata can consume a large amount of memory and is therefore disabled by default

(2) aggregations on text fields rarely make sense; use them with caution
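If you genuinely need to aggregate on a text field, fielddata must be enabled explicitly in the mapping. A minimal example (the index and field names here are made up):

```
PUT /my_index/_mapping
{
  "properties": {
    "tags": {
      "type": "text",
      "fielddata": true
    }
  }
}
```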

Shard request cache

A shard-level cache of query results; each shard maintains its own cache

Caching policy

Not every shard-level query is cached

Cache settings
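A few illustrative knobs for the shard request cache (the setting names are the documented ones; the index name and values are made up):

```
# Node-level, static (elasticsearch.yml); defaults to 1% of the heap:
indices.requests.cache.size: 2%

# Index-level, dynamic; the request cache is enabled by default:
PUT /my_index/_settings
{ "index.requests.cache.enable": true }

# Per-request override (only size=0 requests are cached by default):
GET /my_index/_search?request_cache=true
{ "size": 0, "aggs": { "types": { "terms": { "field": "type" } } } }
```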

Node Query Cache

NodeQueryCache is implemented at the Lucene level and is enabled by default; the ES layer adds some policy control and statistics on top

Caching policy

Not every filter query is cached

Cache settings
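Illustrative settings for the node query cache (the setting names are the documented ones; the index name and values are made up):

```
# Node-level, static (elasticsearch.yml); defaults to 10% of the heap:
indices.queries.cache.size: 5%

# Index-level, static (set at index creation or on a closed index):
PUT /my_index
{ "settings": { "index.queries.cache.enabled": true } }
```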

Code analysis


Filter Cache instantiation

When an Elasticsearch Node is instantiated it creates an IndicesService, whose constructor in turn instantiates an IndicesQueryCache

ES: instantiating IndicesQueryCache and its ElasticsearchLRUQueryCache

public class IndicesQueryCache implements QueryCache, Closeable {
    // Upper bound on the cache's memory footprint
    public static final Setting<ByteSizeValue> INDICES_CACHE_QUERY_SIZE_SETTING = 
           Setting.memorySizeSetting("indices.queries.cache.size", "10%", Property.NodeScope);
    // Upper bound on the number of cache entries
    public static final Setting<Integer> INDICES_CACHE_QUERY_COUNT_SETTING = 
           Setting.intSetting("indices.queries.cache.count", 10_000, 1, Property.NodeScope);
    // the LRU-style cache itself
    private final LRUQueryCache cache;
    // RAM shared across segments, tracked for stats (declared in the real source)
    private volatile long sharedRamBytesUsed;

    public IndicesQueryCache(Settings settings) {
        final ByteSizeValue size = INDICES_CACHE_QUERY_SIZE_SETTING.get(settings);
        final int count = INDICES_CACHE_QUERY_COUNT_SETTING.get(settings);
        logger.debug("using [node] query cache with size [{}] max filter count [{}]",
                size, count);
        // instantiate the LRU cache: at most 10,000 entries or 10% of the heap
        cache = new ElasticsearchLRUQueryCache(count, size.getBytes());
        sharedRamBytesUsed = 0;
    }
}

When IndicesQueryCache is instantiated it is given both a memory ceiling and an entry-count ceiling, and LRU eviction is driven by these two limits. Both are node-level limits, shared by all indices and shards on the node.
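The dual-limit eviction can be sketched with a plain LinkedHashMap in access order. This is only an illustration of the idea, not the actual ElasticsearchLRUQueryCache (which delegates the real accounting to Lucene's LRUQueryCache); all names here are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// Minimal sketch of an LRU cache bounded by both entry count and memory.
class DualLimitLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxCount;
    private final long maxRamBytes;
    private final ToLongFunction<V> sizeOf; // estimates an entry's memory footprint
    private long ramBytesUsed = 0;

    DualLimitLruCache(int maxCount, long maxRamBytes, ToLongFunction<V> sizeOf) {
        super(16, 0.75f, true); // accessOrder=true yields LRU iteration order
        this.maxCount = maxCount;
        this.maxRamBytes = maxRamBytes;
        this.sizeOf = sizeOf;
    }

    @Override
    public V put(K key, V value) {
        // a real implementation would also subtract the size of a replaced value
        ramBytesUsed += sizeOf.applyAsLong(value);
        return super.put(key, value);
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // evict the least recently used entry when either limit is exceeded
        if (size() > maxCount || ramBytesUsed > maxRamBytes) {
            ramBytesUsed -= sizeOf.applyAsLong(eldest.getValue());
            return true;
        }
        return false;
    }
}
```

With a 2-entry limit, inserting a third entry evicts the least recently used one, mirroring how either of the two ceilings can trigger eviction.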

IndicesQueryCache implements the QueryCache interface, which Lucene provides for caching query results

Lucene's QueryCache and QueryCachingPolicy: the cache and the caching policy

public interface QueryCache {

  /**
   * Return a wrapper around the provided <code>weight</code> that will cache
   * matching docs per-segment accordingly to the given <code>policy</code>.
   * NOTE: The returned weight will only be equivalent if scores are not needed.
   * @see Collector#scoreMode()
   */
  Weight doCache(Weight weight, QueryCachingPolicy policy);

}

public interface QueryCachingPolicy {

  /** Callback that is called every time that a cached filter is used.
   *  This is typically useful if the policy wants to track usage statistics
   *  in order to make decisions. */
  void onUse(Query query);

  /** Whether the given {@link Query} is worth caching.
   *  This method will be called by the {@link QueryCache} to know whether to
   *  cache. It will first attempt to load a {@link DocIdSet} from the cache.
   *  If it is not cached yet and this method returns <tt>true</tt> then a
   *  cache entry will be generated. Otherwise an uncached scorer will be
   *  returned. */
  boolean shouldCache(Query query) throws IOException;

}

Lucene provides the two interfaces QueryCache and QueryCachingPolicy for handling result caching. ES uses ElasticsearchLRUQueryCache as its QueryCache implementation and, by default, Lucene's UsageTrackingQueryCachingPolicy as the QueryCachingPolicy implementation, which decides whether to cache and what happens on each lookup.

Instantiating Lucene's IndexSearcher

When Elasticsearch builds Lucene's IndexSearcher it supplies both the QueryCache and the QueryCachingPolicy. The QueryCache is assembled with the decorator pattern: OptOutQueryCache handles Security-related concerns and wraps IndicesQueryCache, the instance created in the code above, which internally holds the ElasticsearchLRUQueryCache.

Searching with IndexSearcher

Create Cache Weight

public class org.apache.lucene.search.IndexSearcher {
    // the query cache
    private QueryCache queryCache ;
    // the query caching policy
    private QueryCachingPolicy queryCachingPolicy ;

    public Weight createWeight(Query query, boolean needsScores, float boost) throws IOException {
        final QueryCache queryCache = this.queryCache;
        Weight weight = query.createWeight(this, needsScores, boost);
        // results can only be cached when the search does not need scores and a cache is configured
        if (needsScores == false && queryCache != null) {
            // the query cache wraps the Weight and returns a CachingWrapperWeight
            weight = queryCache.doCache(weight, queryCachingPolicy);
        }
        return weight;
    }
}

Two prerequisites must hold before a search result can be cached:

  1. A QueryCache is configured; ES already does this when it instantiates the IndexSearcher.
  2. The current search does not need real-time scoring. Not needing scores means not needing to compute idf, so document additions, updates and deletions cannot affect the cached result.

Caching relies on the immutability of a segment's inverted index, that is, a segment never changes once it is generated; this is the premise that keeps the cache valid. When documents in a segment are updated or deleted, the change is recorded in the .liv file and the segment's original files are never modified in place.
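A consequence of segment immutability is that cache entries can be keyed by the identity of the segment core: an entry can never go stale, and all of a segment's entries can be dropped at once when the segment goes away. A simplified sketch of that keying scheme follows; Lucene's real mechanism is IndexReader.CacheHelper, whose CacheKey identifies the segment core and whose ClosedListener fires when the segment is closed. The class and method names here are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of per-segment cache keying: entries live and die with their segment.
class PerSegmentCache<Q, R> {
    // outer key: segment identity; inner key: the query
    private final Map<Object, Map<Q, R>> perSegment = new HashMap<>();

    R get(Object segmentKey, Q query) {
        Map<Q, R> entries = perSegment.get(segmentKey);
        return entries == null ? null : entries.get(query);
    }

    void put(Object segmentKey, Q query, R result) {
        perSegment.computeIfAbsent(segmentKey, k -> new HashMap<>()).put(query, result);
    }

    // invoked when the segment is closed, e.g. merged away; drops all of its entries at once
    void onSegmentClose(Object segmentKey) {
        perSegment.remove(segmentKey);
    }
}
```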

QueryCachingPolicy#onUse(Query query): pre-screening by query type

public class UsageTrackingQueryCachingPolicy implements QueryCachingPolicy {

    public void onUse(Query query) {
        // bail out for query types that are never cached
        if (shouldNeverCache(query)) {
            return;
        }
        // record the query's hash code
        int hashCode = query.hashCode();
        synchronized (this) {
            // add to the recently used filters; recording hash codes tracks how often each query is seen
            recentlyUsedFilters.add(hashCode);
        }
    }

    private static boolean shouldNeverCache(Query query) {
       // TermQuery results are not cached: term lookups are already fast enough that caching adds nothing
       if (query instanceof TermQuery) {
           // We do not bother caching term queries since they are already plenty fast.
           return true;
       }
       // matching all docs: iterating the full doc set is faster than a cached bit set
       if (query instanceof MatchAllDocsQuery) {
           // MatchAllDocsQuery has an iterator that is faster than what a bit set could do.
           return true;
       }
       // a query that matches no documents
       if (query instanceof MatchNoDocsQuery) {
           return true;
       }
       if (query instanceof BooleanQuery) {
           BooleanQuery bq = (BooleanQuery)query;
           // no clauses, effectively the same as match-all
           if (bq.clauses().isEmpty()) {
               return true;
           }
       }
       if (query instanceof DisjunctionMaxQuery) {
           DisjunctionMaxQuery dmq = (DisjunctionMaxQuery)query;
           if (dmq.getDisjuncts().isEmpty()) {
               return true;
           }
       }
       return false;
   }

}

The QueryCache is selective about query types. For queries that qualify, recording the hash code tracks how often the query is used, which later feeds the decision of whether to cache it.
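The frequency tracking can be sketched as a bounded ring of recent hash codes with a frequency map on the side. Lucene's actual data structure is FrequencyTrackingRingBuffer; this is a simplified stand-in with made-up names.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Sketch of the "recently used filters" tracking in UsageTrackingQueryCachingPolicy:
// only the most recent observations count toward a query's frequency.
class FrequencyRingBuffer {
    private final int capacity;
    private final ArrayDeque<Integer> ring = new ArrayDeque<>();
    private final Map<Integer, Integer> frequencies = new HashMap<>();

    FrequencyRingBuffer(int capacity) {
        this.capacity = capacity;
    }

    synchronized void add(int hashCode) {
        if (ring.size() == capacity) {
            // forget the oldest observation so only recent usage counts
            int evicted = ring.removeFirst();
            frequencies.merge(evicted, -1, Integer::sum);
            if (frequencies.get(evicted) == 0) {
                frequencies.remove(evicted);
            }
        }
        ring.addLast(hashCode);
        frequencies.merge(hashCode, 1, Integer::sum);
    }

    synchronized int frequency(int hashCode) {
        return frequencies.getOrDefault(hashCode, 0);
    }
}
```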

LRUQueryCache#shouldCache: pre-screening by segment size

public class LRUQueryCache implements QueryCache, Accountable {

    // check whether the segment is eligible for caching, regardless of the query
    private boolean shouldCache(LeafReaderContext context) throws IOException {
        // the cached data must not exceed the cache's RAM limit
        return cacheEntryHasReasonableWorstCaseSize(ReaderUtil.getTopLevelContext(context).reader().maxDoc())
            // the segment must hold more than 10,000 docs and more than 3% of all docs
            && leavesToCache.test(context);
    }

    private boolean cacheEntryHasReasonableWorstCaseSize(int maxDoc) {
        // The worst-case (dense) is a bit set which needs one bit per document
        // i.e. 8 doc IDs per byte
        final long worstCaseRamUsage = maxDoc / 8;
        // the cache's configured memory limit, 10% of the heap by default
        final long totalRamAvailable = maxRamBytesUsed;
        // maxDoc here is the top-level reader's doc count, so the worst-case bit set for the
        // whole index must stay below 20% of the cache limit, i.e. 2% of the heap by default.
        // The cache is node-level: admitting a huge entry would evict many existing entries,
        // much like a MySQL full table scan polluting the buffer pool.
        return worstCaseRamUsage * 5 < totalRamAvailable;
    }

    public LRUQueryCache(int maxSize, long maxRamBytesUsed) {
        this(maxSize, maxRamBytesUsed, new MinSegmentSizePredicate(10000, .03f), 10);
    }
    
    // pkg-private for testing
    static class MinSegmentSizePredicate implements Predicate<LeafReaderContext> {
        private final int minSize;
        private final float minSizeRatio;
    
        MinSegmentSizePredicate(int minSize, float minSizeRatio) {
            this.minSize = minSize;
            this.minSizeRatio = minSizeRatio;
        }

        public boolean test(LeafReaderContext context) {
            // the largest doc number in this segment, i.e. how many documents the segment holds
            final int maxDoc = context.reader().maxDoc();
            // skip caching if the segment holds fewer than 10,000 docs
            if (maxDoc < minSize) {
                return false;
            }
            final IndexReaderContext topLevelContext = ReaderUtil.getTopLevelContext(context);
            // docs in this segment / docs across all segments
            final float sizeRatio = (float)context.reader().maxDoc() / topLevelContext.reader().maxDoc();
            // the segment must hold at least 3% of all docs
            return sizeRatio >= minSizeRatio;
        }
    }
}

The QueryCache is also selective about segment size: segments that are too large or too small are not cached. Too large, and a single entry could evict many hot entries, forcing those queries to run again; too small, and caching is pointless because re-running the query is just as fast.
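Plugging illustrative numbers into the two checks above makes them concrete: assume a 1 GiB query-cache budget (10% of a 10 GiB heap) and an index of 1,000,000 docs. The constants mirror the code above; the class itself is made up for illustration.

```java
// Worked example of the two segment-level eligibility checks.
class SegmentCacheEligibility {

    // Check 1 (cacheEntryHasReasonableWorstCaseSize): the worst-case bit set for the
    // whole index, at one bit per doc, must stay under 20% of the cache budget.
    static boolean worstCaseSizeOk(int totalDocs, long maxRamBytesUsed) {
        long worstCaseRamUsage = totalDocs / 8; // 1,000,000 docs -> 125,000 bytes
        return worstCaseRamUsage * 5 < maxRamBytesUsed;
    }

    // Check 2 (MinSegmentSizePredicate): the segment must hold at least 10,000 docs
    // AND at least 3% of all docs in the index.
    static boolean segmentSizeOk(int segmentDocs, int totalDocs) {
        return segmentDocs >= 10_000 && (float) segmentDocs / totalDocs >= 0.03f;
    }
}
```

With these numbers, a 50,000-doc segment passes both checks (5% ≥ 3%), while a 5,000-doc segment fails the minimum-size gate.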

QueryCachingPolicy#shouldCache: should the current result be cached?

public class UsageTrackingQueryCachingPolicy implements QueryCachingPolicy {

    public boolean shouldCache(Query query) throws IOException {
        // already covered in the code above
        if (shouldNeverCache(query)) {
            return false;
        }
        // how many times this query has been used before; onUse recorded its hash code,
        // and the count is looked up by that hash code
        final int frequency = frequency(query);
        // 2 for filters that must be evaluated against the whole index to build their
        // DocIdSetIterator (MultiTermQuery, point-based queries, TermInSetQuery), 5 otherwise
        final int minFrequency = minFrequencyToCache(query);
        // recent usage count >= minimum count required for caching
        return frequency >= minFrequency;
    }

    int frequency(Query query) {
        int hashCode = query.hashCode();
        synchronized (this) {
            // the number of times this hash code was recorded, i.e. how often the query ran
            return recentlyUsedFilters.frequency(hashCode);
        }
    }

    protected int minFrequencyToCache(Query query) {
        // costly queries are worth caching sooner
        if (isCostly(query)) {
            return 2;
        } else {
            // default: cache after the filter has been seen 5 times
            int minFrequency = 5;
            if (query instanceof BooleanQuery || query instanceof DisjunctionMaxQuery) {
                // if you keep reusing a boolean query that looks like "A OR B" and never use A or B
                // on their own, then after 5 uses we would cache A, B and "A OR B", which is wasteful.
                // So compound queries are cached one use earlier, so that only "A OR B" gets cached.
                minFrequency--;
            }
            return minFrequency;
        }
    }
}

When the QueryCache misses, the policy decides whether this result should be cached; if it qualifies, the search is executed and the result is then stored.

LRUQueryCache#cacheImpl: building the cached data

public class LRUQueryCache implements QueryCache, Accountable {

    /**
     * Default cache implementation: uses {@link RoaringDocIdSet} for sets that have a density &lt; 1% and a {@link BitDocIdSet} over a {@link FixedBitSet}
     * otherwise.
     */
    protected DocIdSet cacheImpl(BulkScorer scorer, int maxDoc) throws IOException {
        // scorer.cost() is the number of docs this query matches within the current segment
        if (scorer.cost() * 100 >= maxDoc) {
            // FixedBitSet is faster for dense sets and will enable the random-access
            // optimization in ConjunctionDISI
            // when the matched docs exceed 1% of the segment, store them in a bit set
            return cacheIntoBitSet(scorer, maxDoc);
        } else {
            // otherwise store the matched doc IDs in a RoaringDocIdSet
            return cacheIntoRoaringDocIdSet(scorer, maxDoc);
        }
    }

    // cache the matching docs in a bit set
    private static DocIdSet cacheIntoBitSet(BulkScorer scorer, int maxDoc) throws IOException {
        final FixedBitSet bitSet = new FixedBitSet(maxDoc);
        long cost[] = new long[1];
        scorer.score(new LeafCollector() {

            @Override
            public void setScorer(Scorer scorer) throws IOException {}

            @Override
            public void collect(int doc) throws IOException {
                cost[0]++;
                // only the doc ID is cached
                bitSet.set(doc);
            }

        }, null);
        return new BitDocIdSet(bitSet, cost[0]);
    }

    // https://www.elastic.co/cn/blog/frame-of-reference-and-roaring-bitmaps
    private static CacheAndCount cacheIntoRoaringDocIdSet(BulkScorer scorer, int maxDoc)
        throws IOException {
      RoaringDocIdSet.Builder builder = new RoaringDocIdSet.Builder(maxDoc);
      scorer.score(
          new LeafCollector() {
    
            @Override
            public void setScorer(Scorable scorer) throws IOException {}
    
            @Override
            public void collect(int doc) throws IOException {
              builder.add(doc);
            }
          },
          null);
      RoaringDocIdSet cache = builder.build();
      return new CacheAndCount(cache, cache.cardinality());
    }
}

When building the cached data, if the matched docs exceed 1% of the current segment, a FixedBitSet is used: for dense sets it is compact (one bit per potential doc ID) and supports fast random access. Below 1%, a RoaringDocIdSet is used, which stores the matched doc IDs themselves; for sparse sets this takes less space and still iterates quickly.

After these steps the DocIdSet of a search is cached, so the next identical search can reuse it.
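A back-of-the-envelope comparison shows the size trade-off between the two layouts. The ~2 bytes per matched doc for a RoaringDocIdSet is an approximation (doc IDs are stored as 16-bit values inside 2^16-doc blocks; per-block overhead is ignored), and the class and method names are made up for illustration.

```java
// Rough size comparison of the two cache layouts.
class CacheSizeEstimate {

    // dense layout: FixedBitSet needs one bit per doc in the segment, matched or not
    static long bitSetBytes(int maxDoc) {
        return maxDoc / 8;
    }

    // sparse layout: roughly 2 bytes per matched doc
    static long roaringBytesApprox(long matchedDocs) {
        return matchedDocs * 2;
    }

    // the 1% threshold from cacheImpl: dense enough for a bit set?
    static boolean useBitSet(long matchedDocs, int maxDoc) {
        return matchedDocs * 100 >= maxDoc;
    }
}
```

For a 1,000,000-doc segment the bit set always costs 125,000 bytes, while a 0.5% match (5,000 docs) costs about 10,000 bytes as a RoaringDocIdSet. The switch to the bit set happens earlier than the raw byte crossover because dense sets also iterate and intersect faster.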

Lookup code

public class LRUQueryCache implements QueryCache, Accountable {
    @Override
    public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOException {
      if (used.compareAndSet(false, true)) {
        policy.onUse(getQuery());
      }
    
      if (in.isCacheable(context) == false) {
        // this segment is not suitable for caching
        return in.scorerSupplier(context);
      }
    
      // Short-circuit: Check whether this segment is eligible for caching
      // before we take a lock because of #get
      if (shouldCache(context) == false) {
        return in.scorerSupplier(context);
      }
    
      final IndexReader.CacheHelper cacheHelper = context.reader().getCoreCacheHelper();
      if (cacheHelper == null) {
        // this reader has no cache helper
        return in.scorerSupplier(context);
      }
    
      // If the lock is already busy, prefer using the uncached version than waiting
      if (lock.tryLock() == false) {
        return in.scorerSupplier(context);
      }
    
      CacheAndCount cached;
      try {
        cached = get(in.getQuery(), cacheHelper);
      } finally {
        lock.unlock();
      }
    
      if (cached == null) {
        if (policy.shouldCache(in.getQuery())) {
          final ScorerSupplier supplier = in.scorerSupplier(context);
          if (supplier == null) {
            putIfAbsent(in.getQuery(), CacheAndCount.EMPTY, cacheHelper);
            return null;
          }
    
          final long cost = supplier.cost();
          return new ScorerSupplier() {
            @Override
            public Scorer get(long leadCost) throws IOException {
              // skip cache operation which would slow query down too much
              if (cost / skipCacheFactor > leadCost) {
                return supplier.get(leadCost);
              }
    
              Scorer scorer = supplier.get(Long.MAX_VALUE);
              CacheAndCount cached =
                  cacheImpl(new DefaultBulkScorer(scorer), context.reader().maxDoc());
              putIfAbsent(in.getQuery(), cached, cacheHelper);
              DocIdSetIterator disi = cached.iterator();
              if (disi == null) {
                // docIdSet.iterator() is allowed to return null when empty but we want a non-null
                // iterator here
                disi = DocIdSetIterator.empty();
              }
    
              return new ConstantScoreScorer(
                  CachingWrapperWeight.this, 0f, ScoreMode.COMPLETE_NO_SCORES, disi);
            }
    
            @Override
            public long cost() {
              return cost;
            }
          };
        } else {
          return in.scorerSupplier(context);
        }
      }
    
      assert cached != null;
      if (cached == CacheAndCount.EMPTY) {
        return null;
      }
      final DocIdSetIterator disi = cached.iterator();
      if (disi == null) {
        return null;
      }
    
      return new ScorerSupplier() {
        @Override
        public Scorer get(long leadCost) throws IOException {
          return new ConstantScoreScorer(
              CachingWrapperWeight.this, 0f, ScoreMode.COMPLETE_NO_SCORES, disi);
        }
    
        @Override
        public long cost() {
          return disi.cost();
        }
      };
    }
 }

Cache monitoring

Node-level monitoring

The hit rate can be computed as hitCount / (hitCount + missCount).
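As a quick sanity check, the formula can be applied to the first node (81) in the sample output below:

```java
// Query-cache hit rate: hits / (hits + misses)
class CacheHitRate {
    static double hitRate(long hitCount, long missCount) {
        return (double) hitCount / (hitCount + missCount);
    }
}
```

For node 81, hitRate(1086360487L, 4899825499L) comes out to roughly 0.18, i.e. about 18% of node query cache lookups are hits.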

GET /_cat/nodes?v&h=name,queryCacheMemory,queryCacheHitCount,queryCacheMissCount,fielddataMemory,requestCacheMemory,requestCacheHitCount,requestCacheMissCount
name          queryCacheMemory queryCacheHitCount queryCacheMissCount fielddataMemory requestCacheMemory requestCacheHitCount requestCacheMissCount
81            1.7gb         1086360487          4899825499         320.4mb            189.9mb              3317905              87650679
84            1.6gb         1032360913          4583644806         294.9mb            128.3mb              3121582              87026805
82            1.5gb         1026487641          4691926256         318.5mb            131.3mb              3122880              86956484
89            1.7gb         1098099554          5015683302           324mb            193.3mb              3497803              88120856
80            1.6gb          998874442          4422520272         316.9mb            171.4mb              3122396              86982810
85            1.6gb         1048961594          4710553367         288.7mb            193.6mb              3282795              87465086
86            1.7gb         1054740975          4748200509           305mb              175mb              3349060              87793603
87            1.6gb         1056530219          4783851200         350.7mb            191.8mb              3360473              87024550
83            1.8gb         1082470724          5060248077         309.4mb            190.2mb              3440978              87846063
88            1.6gb         1076018640          5045370721         315.9mb            187.7mb              3649191              88161705

Index-level monitoring

GET test_index/_stats/query_cache,fielddata,request_cache?pretty&human
{
  "_shards": {
    "total": 4,
    "successful": 4,
    "failed": 0
  },
  "_all": {
    "primaries": {
      "query_cache": {
        "memory_size": "44.5mb", // memory currently used by the cache
        "memory_size_in_bytes": 46740615,
        "total_count": 65615435, // total lookups ever; total = hit + miss
        "hit_count": 13057809, // lookups that hit the cache
        "miss_count": 52557626, // lookups that missed
        "cache_size": 661, // entries currently cached
        "cache_count": 55634, // entries ever cached
        "evictions": 54973 // entries evicted
      },
      "fielddata": {
        "memory_size": "9.5mb",
        "memory_size_in_bytes": 9998504,
        "evictions": 0
      },
      "request_cache": {
        "memory_size": "152.8kb",
        "memory_size_in_bytes": 156472,
        "evictions": 42,
        "hit_count": 57560,
        "miss_count": 759391
      }
    },
    "total": {
      "query_cache": {
        "memory_size": "91mb",
        "memory_size_in_bytes": 95466199,
        "total_count": 133520906,
        "hit_count": 26688002,
        "miss_count": 106832904,
        "cache_size": 1269,
        "cache_count": 115109,
        "evictions": 113840
      },
      "fielddata": {
        "memory_size": "18.4mb",
        "memory_size_in_bytes": 19327272,
        "evictions": 0
      },
      "request_cache": {
        "memory_size": "304.1kb",
        "memory_size_in_bytes": 311424,
        "evictions": 81,
        "hit_count": 115871,
        "miss_count": 1518861
      }
    }
  },
  "indices": {
    "test_index": {
      "uuid": "4hgWxos1ShKO7a5xGFVkwQ",
      "health": "green",
      "status": "open",
      "primaries": {
        "query_cache": {
          "memory_size": "44.5mb",
          "memory_size_in_bytes": 46740615,
          "total_count": 65615435,
          "hit_count": 13057809,
          "miss_count": 52557626,
          "cache_size": 661,
          "cache_count": 55634,
          "evictions": 54973
        },
        "fielddata": {
          "memory_size": "9.5mb",
          "memory_size_in_bytes": 9998504,
          "evictions": 0
        },
        "request_cache": {
          "memory_size": "152.8kb",
          "memory_size_in_bytes": 156472,
          "evictions": 42,
          "hit_count": 57560,
          "miss_count": 759391
        }
      },
      "total": {
        "query_cache": {
          "memory_size": "91mb",
          "memory_size_in_bytes": 95466199,
          "total_count": 133520906,
          "hit_count": 26688002,
          "miss_count": 106832904,
          "cache_size": 1269,
          "cache_count": 115109,
          "evictions": 113840
        },
        "fielddata": {
          "memory_size": "18.4mb",
          "memory_size_in_bytes": 19327272,
          "evictions": 0
        },
        "request_cache": {
          "memory_size": "304.1kb",
          "memory_size_in_bytes": 311424,
          "evictions": 81,
          "hit_count": 115871,
          "miss_count": 1518861
        }
      }
    }
  }
}
