ES 5.4 Source Code Analysis: Prerequisite Knowledge (to be extended over time)


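Topics Covered

Basic Concepts

Index Structure

  • At the storage level, a document is uniquely identified by the combination of _index, _type, and _id
    • _index is a logical namespace that points to one or more physical shards
    • _type distinguishes categories of documents within the same collection
    • _id is the document identifier; the system generates it automatically when the client does not supply one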

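Shards

  • In a distributed system, a single machine cannot store a very large volume of data, so the data is processed and stored by a large cluster. Cluster capacity is usually raised by adding machines (horizontal scaling), which requires splitting the data into smaller pieces and distributing them across the machines.
  • Data is sharded to improve horizontal scalability; distributed storage also keeps multiple replicas of the data.
  • ES shards are divided into primary shards and replica shards. Write operations (creating, indexing, and deleting documents) must complete on the primary shard before they are replicated to the corresponding replica shards.
  • A shard is the most basic low-level unit of reading and writing. Sharding exists to split up a huge index so that reads and writes can proceed in parallel.
  • Each shard can carry out read and write work independently.
  • Before 5.x, the number of primary shards of an index could not be changed after creation, while the number of replica shards could be changed at any time. From 5.x to 6.x onward, ES supports shrinking and splitting the primary shards of an index under certain constraints, but it is still best to plan the shard count in advance.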
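Why the primary shard count is hard to change becomes clearer when looking at how a document is routed: the target shard is derived from a hash of the routing value (by default the document _id) modulo the number of primary shards. The following is only a minimal sketch of that idea. The class and method names are hypothetical, and String.hashCode() is a stand-in for the hash function Elasticsearch really uses.

```java
class ShardRoutingSketch {

    // Hypothetical helper: picks a primary shard for a routing value (by default the document _id).
    // String.hashCode() is only a stand-in for the real hash function.
    static int shardFor(String routing, int numberOfPrimaryShards) {
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        String id = "doc-42";
        // The same _id may map to a different shard once the shard count changes,
        // so documents written earlier could no longer be located.
        System.out.println("5 primaries -> shard " + shardFor(id, 5));
        System.out.println("6 primaries -> shard " + shardFor(id, 6));
    }
}
```

Because the modulus is part of the address of every document, changing it is only safe with the kind of controlled shrink or split operation mentioned above.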

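Segment Merging

  • By default the write buffer is emptied once per second and its contents are written out to a file; this process is called refresh, and each refresh creates a new Lucene segment. Because every refresh adds a segment, small segments accumulate and are merged into larger ones in the background.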
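The relationship between the write buffer, refresh, and segments can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how Lucene stores data: the class RefreshSketch and all of its methods are hypothetical, and a "segment" here is just an immutable list of strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class RefreshSketch {
    private final List<String> buffer = new ArrayList<>();         // written but not yet searchable
    private final List<List<String>> segments = new ArrayList<>(); // immutable, searchable segments

    void index(String doc) {
        buffer.add(doc);
    }

    // Toy refresh: the buffered documents become one new immutable segment.
    void refresh() {
        if (buffer.isEmpty()) {
            return;
        }
        segments.add(Collections.unmodifiableList(new ArrayList<>(buffer)));
        buffer.clear();
    }

    // Toy merge: all existing segments are combined into a single larger one.
    void mergeAll() {
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segments) {
            merged.addAll(segment);
        }
        segments.clear();
        if (!merged.isEmpty()) {
            segments.add(Collections.unmodifiableList(merged));
        }
    }

    int segmentCount() {
        return segments.size();
    }

    // Only documents that already live in a segment are visible to search.
    boolean searchable(String doc) {
        for (List<String> segment : segments) {
            if (segment.contains(doc)) {
                return true;
            }
        }
        return false;
    }
}
```

In this model a document is invisible to search until the next refresh, and two refreshes leave two segments until a merge combines them.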

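Cluster Node Roles

  • Master node
    • Preferably, the master node should not also serve as a data node
  • Data node
  • Ingest node (pre-processing node)
    • Before data is written, it is transformed by a pipeline made up of a series of predefined processors
  • Coordinating node
    • A client request can be sent to any node in the cluster. Every node knows where any document lives, so it can forward the request, gather the results, and return them to the client. The node that handles the client request is called the coordinating node.
    • The coordinating node forwards the request to the data nodes that hold the data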

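Main Internal Modules

  • cluster

    • Encapsulates the cluster management performed by the master node: it manages the cluster state and maintains cluster-level configuration
  • allocation

    • Encapsulates the functionality and strategies for shard allocation, covering both primary and replica shard allocation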
  • discovery

  • gateway

  • indices

  • http

  • transport

  • engine

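First, a Look at the Architecture Diagram

[Figure: architecture diagram (image.png)] Note: this diagram was found online and is borrowed here, with thanks.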

Lifecycle Management

LifecycleComponent

public interface LifecycleComponent extends Releasable {

    Lifecycle.State lifecycleState();

    void addLifecycleListener(LifecycleListener listener);

    void removeLifecycleListener(LifecycleListener listener);

    void start();

    void stop();
}
  • State management
  • Adding listeners
  • Start and stop behavior

Lifecycle

INITIALIZED -> STARTED, STOPPED, CLOSED
STARTED -> STOPPED
STOPPED -> STARTED, CLOSED
CLOSED ->
public enum State {
    INITIALIZED,
    STOPPED,
    STARTED,
    CLOSED
}
  • Definition:
    • The lifecycle entity class

    • A state machine
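The transition table above can be expressed as a small state machine. The sketch below is an illustration only: LifecycleSketch, allowedTargets, and moveTo are hypothetical names, and the real Lifecycle class may differ, for example in how it reports illegal transitions.

```java
import java.util.EnumSet;
import java.util.Set;

class LifecycleSketch {

    enum State { INITIALIZED, STOPPED, STARTED, CLOSED }

    // Allowed targets for each state, copied from the transition table.
    static Set<State> allowedTargets(State from) {
        switch (from) {
            case INITIALIZED:
                return EnumSet.of(State.STARTED, State.STOPPED, State.CLOSED);
            case STARTED:
                return EnumSet.of(State.STOPPED);
            case STOPPED:
                return EnumSet.of(State.STARTED, State.CLOSED);
            default:
                return EnumSet.noneOf(State.class); // CLOSED is terminal
        }
    }

    private State state = State.INITIALIZED;

    State state() {
        return state;
    }

    // Returns true and updates the state if the transition is allowed; otherwise leaves it unchanged.
    boolean moveTo(State target) {
        if (!allowedTargets(state).contains(target)) {
            return false;
        }
        state = target;
        return true;
    }

    public static void main(String[] args) {
        LifecycleSketch lc = new LifecycleSketch();
        System.out.println(lc.moveTo(State.STARTED)); // true
        System.out.println(lc.moveTo(State.CLOSED));  // false: STARTED can only go to STOPPED
        System.out.println(lc.moveTo(State.STOPPED)); // true
        System.out.println(lc.moveTo(State.CLOSED));  // true
        System.out.println(lc.moveTo(State.STARTED)); // false: CLOSED is terminal
    }
}
```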

LifecycleListener

public void beforeStart() {

}

public void afterStart() {

}

public void beforeStop() {

}

public void afterStop() {

}

public void beforeClose() {

}

public void afterClose() {

}

AbstractLifecycleComponent

  • Member variables

    • protected final Lifecycle lifecycle = new Lifecycle();
      
      private final List<LifecycleListener> listeners = new CopyOnWriteArrayList<>();
      
  • public void start() {
        // Guard against duplicate starts: return early if the component cannot move to STARTED
        if (!lifecycle.canMoveToStarted()) {
            return;
        }
        for (LifecycleListener listener : listeners) {
            listener.beforeStart();
        }
        // the actual start logic, implemented by the subclass
        doStart();
        lifecycle.moveToStarted();
        for (LifecycleListener listener : listeners) {
            listener.afterStart();
        }
    }
    
    protected abstract void doStart();
    
    
    
  • @Override
    public void stop() {
        if (!lifecycle.canMoveToStopped()) {
            return;
        }
        for (LifecycleListener listener : listeners) {
            listener.beforeStop();
        }
        lifecycle.moveToStopped();
        doStop();
        for (LifecycleListener listener : listeners) {
            listener.afterStop();
        }
    }
    
    protected abstract void doStop();
    
  • The following classes all use lifecycle management

    • AzureComputeServiceImpl
    • BlobStoreRepository
    • CircuitBreakerService
    • ClusterService
    • DelayedAllocationService
    • GatewayService
    • GceMetadataService
    • HdfsRepository
    • IndicesClusterStateService
    • IndicesService
    • IndicesTTLService
    • InternalAwsS3Service
    • JvmGcMonitorService
    • LocalDiscovery
    • LocalTransport
    • MockTcpTransport
    • MonitorService
    • Netty3HttpServerTransport
    • Netty3Transport
    • Netty4HttpServerTransport
    • Netty4Transport
    • NodeConnectionsService
    • NoneDiscovery
    • ResourceWatcherService
    • RoutingService
    • SearchService
    • SingleNodeDiscovery
    • SnapshotShardsService
    • SnapshotsService
    • TcpTransport
    • TransportService
    • TribeService
    • ZenDiscovery
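The start() method shown earlier in this section follows the template-method pattern: the base class fixes the order (guard, before-listeners, doStart, state change, after-listeners) and subclasses only supply doStart(). A stripped-down sketch of that pattern follows. Everything in it is hypothetical (TemplateStartSketch, Component, Listener), and a boolean flag stands in for the Lifecycle state check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class TemplateStartSketch {

    interface Listener {
        void beforeStart();
        void afterStart();
    }

    abstract static class Component {
        private boolean started = false; // stand-in for the Lifecycle state
        private final List<Listener> listeners = new CopyOnWriteArrayList<>();

        void addListener(Listener listener) {
            listeners.add(listener);
        }

        final void start() {
            if (started) {
                return; // same role as the !lifecycle.canMoveToStarted() guard
            }
            for (Listener listener : listeners) {
                listener.beforeStart();
            }
            doStart();
            started = true;
            for (Listener listener : listeners) {
                listener.afterStart();
            }
        }

        protected abstract void doStart();
    }

    // Records the order in which the hooks fire.
    static List<String> run() {
        List<String> events = new ArrayList<>();
        Component component = new Component() {
            @Override
            protected void doStart() {
                events.add("doStart");
            }
        };
        component.addListener(new Listener() {
            @Override
            public void beforeStart() {
                events.add("beforeStart");
            }

            @Override
            public void afterStart() {
                events.add("afterStart");
            }
        });
        component.start();
        component.start(); // second call is ignored by the guard
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [beforeStart, doStart, afterStart]
    }
}
```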

Inject (Lightweight Injector)

Example

public class FooApplication {
    public static void main(String[] args) {
        Injector injector = Guice.createInjector(
            new ModuleA(),
            new ModuleB(),
            . . .
            new FooApplicationFlagsModule(args)
        );

        // Now just bootstrap the application and you're done
        FooStarter starter = injector.getInstance(FooStarter.class);
        starter.runApplication();
    }
}
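To make the idea behind an injector concrete without depending on Guice, here is a hand-rolled toy. It is not Guice and does not reflect how Guice is implemented. TinyInjector, Greeter, and Starter are hypothetical names, and the sketch only shows the core idea: types are bound to factories, and getInstance resolves a type together with its dependencies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class TinyInjectorSketch {

    // Hypothetical stand-in for an injector: maps a type to a factory
    // that may ask the injector for its own dependencies.
    static final class TinyInjector {
        private final Map<Class<?>, Function<TinyInjector, ?>> bindings = new HashMap<>();

        <T> TinyInjector bind(Class<T> type, Function<TinyInjector, ? extends T> factory) {
            bindings.put(type, factory);
            return this;
        }

        <T> T getInstance(Class<T> type) {
            Function<TinyInjector, ?> factory = bindings.get(type);
            if (factory == null) {
                throw new IllegalStateException("no binding for " + type.getName());
            }
            return type.cast(factory.apply(this));
        }
    }

    static final class Greeter {
        String greet(String name) {
            return "hello " + name;
        }
    }

    static final class Starter {
        private final Greeter greeter;

        Starter(Greeter greeter) {
            this.greeter = greeter;
        }

        String runApplication() {
            return greeter.greet("es");
        }
    }

    static String demo() {
        TinyInjector injector = new TinyInjector()
            .bind(Greeter.class, i -> new Greeter())
            .bind(Starter.class, i -> new Starter(i.getInstance(Greeter.class)));
        return injector.getInstance(Starter.class).runApplication();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints hello es
    }
}
```

The calling code asks only for Starter; the wiring of Greeter into Starter lives in the bindings, which is the role the modules play in the Guice example.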

Settings

  • Member variables
    private final Map<String, String> settings;

    /** The secure settings storage associated with these settings. */
    private final SecureSettings secureSettings;

    /** The first level of setting names. This is constructed lazily in {@link #names()}. */
    private final SetOnce<Set<String>> firstLevelNames = new SetOnce<>();

    /**
     * Setting names found in this Settings for both string and secure settings.
     * This is constructed lazily in {@link #keySet()}.
     */
    private final SetOnce<Set<String>> keys = new SetOnce<>();
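The fields above suggest a flat map of dotted keys plus lazily computed views over it. The following sketch shows one way such a structure could behave. FlatSettingsSketch is a hypothetical class, a plain nullable field replaces SetOnce, and the rule used for names() (the part of each key before the first dot) is an assumption for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FlatSettingsSketch {
    private final Map<String, String> settings;
    private Set<String> firstLevelNames; // built lazily on the first call to names(); not thread-safe

    FlatSettingsSketch(Map<String, String> settings) {
        this.settings = Collections.unmodifiableMap(new HashMap<>(settings));
    }

    String get(String key, String defaultValue) {
        String value = settings.get(key);
        return value != null ? value : defaultValue;
    }

    // First-level names: the part of every key before the first dot.
    Set<String> names() {
        if (firstLevelNames == null) {
            Set<String> names = new TreeSet<>();
            for (String key : settings.keySet()) {
                int dot = key.indexOf('.');
                names.add(dot < 0 ? key : key.substring(0, dot));
            }
            firstLevelNames = Collections.unmodifiableSet(names);
        }
        return firstLevelNames;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("cluster.name", "demo");
        raw.put("node.name", "n1");
        raw.put("node.data", "true");
        FlatSettingsSketch s = new FlatSettingsSketch(raw);
        System.out.println(s.names());                    // prints [cluster, node]
        System.out.println(s.get("node.master", "true")); // prints true (the default)
    }
}
```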

ThreadPool

  • Constructor
public ThreadPool(final Settings settings, final ExecutorBuilder<?>... customBuilders) {
        super(settings);

        assert Node.NODE_NAME_SETTING.exists(settings);

        final Map<String, ExecutorBuilder> builders = new HashMap<>();
        final int availableProcessors = EsExecutors.boundedNumberOfProcessors(settings);
        final int halfProcMaxAt5 = halfNumberOfProcessorsMaxFive(availableProcessors);
        final int halfProcMaxAt10 = halfNumberOfProcessorsMaxTen(availableProcessors);
        final int genericThreadPoolMax = boundedBy(4 * availableProcessors, 128, 512);
        // initialize the built-in thread pools
        builders.put(Names.GENERIC, new ScalingExecutorBuilder(Names.GENERIC, 4, genericThreadPoolMax, TimeValue.timeValueSeconds(30)));
        builders.put(Names.INDEX, new FixedExecutorBuilder(settings, Names.INDEX, availableProcessors, 200));
        builders.put(Names.BULK, new FixedExecutorBuilder(settings, Names.BULK, availableProcessors, 200)); // now that we reuse bulk for index/delete ops
        builders.put(Names.GET, new FixedExecutorBuilder(settings, Names.GET, availableProcessors, 1000));
        builders.put(Names.SEARCH, new FixedExecutorBuilder(settings, Names.SEARCH, searchThreadPoolSize(availableProcessors), 1000));
        builders.put(Names.MANAGEMENT, new ScalingExecutorBuilder(Names.MANAGEMENT, 1, 5, TimeValue.timeValueMinutes(5)));
        // no queue as this means clients will need to handle rejections on listener queue even if the operation succeeded
        // the assumption here is that the listeners should be very lightweight on the listeners side
        builders.put(Names.LISTENER, new FixedExecutorBuilder(settings, Names.LISTENER, halfProcMaxAt10, -1));
        builders.put(Names.FLUSH, new ScalingExecutorBuilder(Names.FLUSH, 1, halfProcMaxAt5, TimeValue.timeValueMinutes(5)));
        builders.put(Names.REFRESH, new ScalingExecutorBuilder(Names.REFRESH, 1, halfProcMaxAt10, TimeValue.timeValueMinutes(5)));
        builders.put(Names.WARMER, new ScalingExecutorBuilder(Names.WARMER, 1, halfProcMaxAt5, TimeValue.timeValueMinutes(5)));
        builders.put(Names.SNAPSHOT, new ScalingExecutorBuilder(Names.SNAPSHOT, 1, halfProcMaxAt5, TimeValue.timeValueMinutes(5)));
        builders.put(Names.FETCH_SHARD_STARTED, new ScalingExecutorBuilder(Names.FETCH_SHARD_STARTED, 1, 2 * availableProcessors, TimeValue.timeValueMinutes(5)));
        builders.put(Names.FORCE_MERGE, new FixedExecutorBuilder(settings, Names.FORCE_MERGE, 1, -1));
        builders.put(Names.FETCH_SHARD_STORE, new ScalingExecutorBuilder(Names.FETCH_SHARD_STORE, 1, 2 * availableProcessors, TimeValue.timeValueMinutes(5)));
        for (final ExecutorBuilder<?> builder : customBuilders) {
            if (builders.containsKey(builder.name())) {
                throw new IllegalArgumentException("builder with name [" + builder.name() + "] already exists");
            }
            builders.put(builder.name(), builder);
        }
        this.builders = Collections.unmodifiableMap(builders);

        threadContext = new ThreadContext(settings);

        final Map<String, ExecutorHolder> executors = new HashMap<>();
        for (@SuppressWarnings("unchecked") final Map.Entry<String, ExecutorBuilder> entry : builders.entrySet()) {
            final ExecutorBuilder.ExecutorSettings executorSettings = entry.getValue().getSettings(settings);
            final ExecutorHolder executorHolder = entry.getValue().build(executorSettings, threadContext);
            if (executors.containsKey(executorHolder.info.getName())) {
                throw new IllegalStateException("duplicate executors with name [" + executorHolder.info.getName() + "] registered");
            }
            logger.debug("created thread pool: {}", entry.getValue().formatInfo(executorHolder.info));
            executors.put(entry.getKey(), executorHolder);
        }

        executors.put(Names.SAME, new ExecutorHolder(DIRECT_EXECUTOR, new Info(Names.SAME, ThreadPoolType.DIRECT)));
        this.executors = unmodifiableMap(executors);
        // initialize the scheduler
        this.scheduler = new ScheduledThreadPoolExecutor(1, EsExecutors.daemonThreadFactory(settings, "scheduler"), new EsAbortPolicy());
        this.scheduler.setExecuteExistingDelayedTasksAfterShutdownPolicy(false);
        this.scheduler.setContinueExistingPeriodicTasksAfterShutdownPolicy(false);
        this.scheduler.setRemoveOnCancelPolicy(true);

        TimeValue estimatedTimeInterval = ESTIMATED_TIME_INTERVAL_SETTING.get(settings);
        this.cachedTimeThread = new CachedTimeThread(EsExecutors.threadName(settings, "[timer]"), estimatedTimeInterval.millis());
        this.cachedTimeThread.start();
    }
  • EsExecutors
  • Thread isolation
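The pool sizes in the constructor are all derived from the processor count. The sketch below works through that arithmetic. The expression for the generic pool is taken from the constructor; the bodies of the two "half" helpers are an assumption based on their names (half the processors, rounded up, capped at 5 or 10), and PoolSizingSketch is a hypothetical class.

```java
class PoolSizingSketch {

    // Clamp value into the range [min, max].
    static int boundedBy(int value, int min, int max) {
        return Math.min(max, Math.max(min, value));
    }

    // Assumed meaning of halfNumberOfProcessorsMaxFive: half the processors (rounded up), between 1 and 5.
    static int halfProcMaxAt5(int processors) {
        return boundedBy((processors + 1) / 2, 1, 5);
    }

    // Assumed meaning of halfNumberOfProcessorsMaxTen: half the processors (rounded up), between 1 and 10.
    static int halfProcMaxAt10(int processors) {
        return boundedBy((processors + 1) / 2, 1, 10);
    }

    // Same expression as genericThreadPoolMax in the constructor.
    static int genericMax(int processors) {
        return boundedBy(4 * processors, 128, 512);
    }

    public static void main(String[] args) {
        for (int p : new int[] {2, 8, 32, 256}) {
            System.out.println(p + " processors: flush max=" + halfProcMaxAt5(p)
                + ", refresh max=" + halfProcMaxAt10(p)
                + ", generic max=" + genericMax(p));
        }
    }
}
```

Under these assumptions an 8-core node gets at most 4 flush threads and 4 refresh threads, and the generic pool never drops below 128 or exceeds 512 threads. Giving each kind of work its own bounded pool is what keeps a flood of one request type from starving the others.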

monitor

  • jvm
  • os

ToXContent

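  • XContentType: Elasticsearch supports four content formats: JSON, SMILE, YAML, and CBOR. XContentType is the enum that represents these four formats.
  • XContent: the abstraction over a data format. Because four formats are supported, XContent has four implementations: JsonXContent, SmileXContent, YamlXContent, and CborXContent.
  • XContentParser: the parser for the data. The parser implementations for the four formats are JsonXContentParser, SmileXContentParser, YamlXContentParser, and CborXContentParser.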
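An enum is a natural fit for a closed set of formats like this. The sketch below shows the general shape such an enum might take. ContentFormat and fromMediaType are hypothetical names, not the real XContentType API, and the media type strings are assumed for illustration.

```java
import java.util.Locale;

class ContentFormatSketch {

    enum ContentFormat {
        JSON("application/json"),
        SMILE("application/smile"),
        YAML("application/yaml"),
        CBOR("application/cbor");

        private final String mediaType;

        ContentFormat(String mediaType) {
            this.mediaType = mediaType;
        }

        String mediaType() {
            return mediaType;
        }

        // Resolves a Content-Type style header value to a format, or null if it is unknown.
        static ContentFormat fromMediaType(String value) {
            if (value == null) {
                return null;
            }
            String normalized = value.trim().toLowerCase(Locale.ROOT);
            int semicolon = normalized.indexOf(';'); // drop parameters such as "; charset=UTF-8"
            if (semicolon >= 0) {
                normalized = normalized.substring(0, semicolon).trim();
            }
            for (ContentFormat format : values()) {
                if (format.mediaType.equals(normalized)) {
                    return format;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(ContentFormat.fromMediaType("application/json; charset=UTF-8")); // prints JSON
        System.out.println(ContentFormat.fromMediaType("text/plain"));                      // prints null
    }
}
```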

Setting

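  • Located in the common.settings package
  • Encapsulates the typical aspects of a setting, such as its default value, parsing, and scope.
  • Member variables
    private final Key key; // the key of the setting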
    protected final Function<Settings, String> defaultValue;
    @Nullable
    private final Setting<T> fallbackSetting;
    private final Function<String, T> parser;
    private final EnumSet<Property> properties;

    private static final EnumSet<Property> EMPTY_PROPERTIES = EnumSet.noneOf(Property.class);
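How a key, a default-value function, and a parser cooperate can be sketched in a few lines. This is a simplified illustration, not the real Setting class: SimpleSetting is hypothetical, a plain Map stands in for Settings, and fallback settings and properties are left out. The setting key used in the example is made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class TypedSettingSketch {

    static final class SimpleSetting<T> {
        private final String key;
        private final Function<Map<String, String>, String> defaultValue;
        private final Function<String, T> parser;

        SimpleSetting(String key, Function<Map<String, String>, String> defaultValue, Function<String, T> parser) {
            this.key = key;
            this.defaultValue = defaultValue;
            this.parser = parser;
        }

        // Reads the raw string (or computes the default) and parses it into the typed value.
        T get(Map<String, String> settings) {
            String raw = settings.get(key);
            if (raw == null) {
                raw = defaultValue.apply(settings);
            }
            return parser.apply(raw);
        }
    }

    // Hypothetical setting used only for this example.
    static final SimpleSetting<Integer> QUEUE_SIZE =
        new SimpleSetting<>("demo.queue_size", settings -> "200", Integer::parseInt);

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        System.out.println(QUEUE_SIZE.get(settings)); // prints 200 (the default)
        settings.put("demo.queue_size", "1000");
        System.out.println(QUEUE_SIZE.get(settings)); // prints 1000
    }
}
```

Keeping the default as a function of the whole settings object, rather than a constant, lets one setting derive its default from another.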

Settings

Environment

  • Main responsibilities