1.Container Lifecycle Hooks


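1.Overview

Like many programming-language frameworks that have component lifecycle hooks, such as Angular, Kubernetes provides lifecycle hooks for Containers. The hooks let a Container be aware of events in its management lifecycle and run the code implemented in a handler when the corresponding lifecycle hook is executed.

2.Container hooks (the following is quoted from the official documentation)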

There are two hooks that are exposed to Containers:

PostStart

This hook executes immediately after a container is created. However, there is no guarantee that the hook will execute before the container ENTRYPOINT. No parameters are passed to the handler.

PreStop

This hook is called immediately before a container is terminated due to an API request or management event such as liveness probe failure, preemption, resource contention and others. A call to the preStop hook fails if the container is already in terminated or completed state. It is blocking, meaning it is synchronous, so it must complete before the call to delete the container can be sent. No parameters are passed to the handler.

A more detailed description of the termination behavior can be found in Termination of Pods.


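Key points to note:

  1. The PostStart hook is called immediately after the container is created. It runs asynchronously with respect to the container's ENTRYPOINT, so the order between the two is not guaranteed.
  2. The PreStop hook is called immediately before the container is terminated, not once it is already in the Terminated state. A Job's container that exits on its own and ends up in the Completed state therefore never triggers the PreStop hook. PreStop is also synchronous and blocking: the call to delete the container is sent only after the hook has completed.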
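
As a minimal sketch of how the two hooks are declared, the manifest below attaches both to a single container. The pod name, container name, and the commands are hypothetical and chosen only for illustration; both hooks sit under the container's lifecycle field.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hook-demo                 # hypothetical name
spec:
  containers:
  - name: hook-demo-container     # hypothetical name
    image: busybox
    command: ["/bin/sh", "-c", "sleep 600"]
    lifecycle:
      postStart:
        exec:
          # Runs right after the container is created; its order relative
          # to the ENTRYPOINT above is not guaranteed.
          command: ["/bin/sh", "-c", "echo started > /tmp/post-start"]
      preStop:
        exec:
          # Runs before the container is terminated by an API request or a
          # management event; not triggered if the container exits on its own.
          command: ["/bin/sh", "-c", "echo stopping > /tmp/pre-stop"]
```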

3.Hook handler execution

When a Container lifecycle management hook is called, the Kubernetes management system executes the handler in the Container registered for that hook.

Hook handler calls are synchronous within the context of the Pod containing the Container. This means that for a PostStart hook, the Container ENTRYPOINT and hook fire asynchronously. However, if the hook takes too long to run or hangs, the Container cannot reach a running state.

The behavior is similar for a PreStop hook. If the hook hangs during execution, the Pod phase stays in a Terminating state and is killed after terminationGracePeriodSeconds of pod ends. If a PostStart or PreStop hook fails, it kills the Container.

Users should make their hook handlers as lightweight as possible. There are cases, however, when long running commands make sense, such as when saving state prior to stopping a Container.


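Key points to note:

  1. PostStart blocks the container from reaching the Running state.
  2. PreStop blocks the deletion of the container, but once terminationGracePeriodSeconds has elapsed the container is forcibly killed.
  3. If a PreStop or PostStart hook fails, the container is killed.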
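
Point 3 can be tried out with a hook that always fails. The following is only a sketch: the pod and container names are hypothetical, and restartPolicy: Never is an assumption made here to avoid a restart loop while experimenting.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: failing-post-start        # hypothetical name
spec:
  restartPolicy: Never            # assumption: do not restart the killed container
  containers:
  - name: main                    # hypothetical name
    image: busybox
    command: ["/bin/sh", "-c", "sleep 600"]
    lifecycle:
      postStart:
        exec:
          # A non-zero exit status counts as a hook failure,
          # so the container should be killed.
          command: ["/bin/sh", "-c", "exit 1"]
```

With such a pod, kubectl describe pod failing-post-start should report a FailedPostStartHook event for the container.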

4.Example

apiVersion: v1
kind: Pod
metadata:
  name: test-post-start
spec:
  containers:
  - name: test-post-start-container1
    image: busybox
    command: ["/bin/sh", "-c", "sleep 600"]
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "sleep 20"]
  - name: test-post-start-container
    image: busybox
    command: ["/bin/sh", "-c", "echo $(date) 'written by entrypoint' >> log.log && sleep 600"]
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo $(date) 'written by post start' >> log.log && sleep 5"]

After deploying this pod, running kubectl describe pod test-post-start gives the following output:

Events:
  Type    Reason     Age   From                 Message
  ----    ------     ----  ----                 -------
  Normal  Scheduled  1m    default-scheduler    Successfully assigned default/test-post-start to node012060
  Normal  Pulling    59s   kubelet, node012060  pulling image "busybox"
  Normal  Pulled     54s   kubelet, node012060  Successfully pulled image "busybox"
  Normal  Created    54s   kubelet, node012060  Created container
  Normal  Started    54s   kubelet, node012060  Started container
  Normal  Pulling    34s   kubelet, node012060  pulling image "busybox"
  Normal  Pulled     29s   kubelet, node012060  Successfully pulled image "busybox"
  Normal  Created    29s   kubelet, node012060  Created container
  Normal  Started    29s   kubelet, node012060  Started container
  
# Enter the second container and view log.log
Thu Jun 13 07:44:57 UTC 2019 written by entrypoint
Thu Jun 13 07:44:57 UTC 2019 written by post start

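From this example we can draw the following conclusions:

  1. Containers in a Pod are started in order, with the ones listed first starting first. In this test, k8s created the second container only after the first container's ENTRYPOINT had been started and its PostStart hook had finished: the second Pulling event appears 20 seconds after the first Started event, which matches the sleep 20 in the first container's PostStart hook. (This makes it possible to delay the start of the second container by a chosen amount of time after the first one is created.)
  2. PostStart runs asynchronously with respect to the container's ENTRYPOINT, so the order between them is not guaranteed, and PostStart does not block the ENTRYPOINT from starting (both lines in log.log carry the same timestamp).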

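5.A concrete example of using hooks

Using a preStop hook to make a service shut down gracefully

Our production services are built on the Spring framework. During a service update, the service containers were terminated directly while some requests were still being routed to them, which produced 500 errors. These failed requests were only a small fraction of the traffic and could have been ignored. Still, we wanted a graceful termination that reduces the failed requests as far as possible, ideally to zero.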


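First, a short introduction to the service discovery component of Spring Cloud:

Eureka is a REST-based service that acts as a service registry. It is used to locate services for load balancing and failover of middle-tier servers. When a service starts, it registers its own information (IP, port, service information, and so on) with the Eureka Server, which stores it. Once running, a microservice periodically (every 30 seconds by default) sends a heartbeat to the Eureka Server to renew its "lease", and it can fetch the addresses of other microservices from Eureka to carry out its own logic.

The idea is to first change the status of the registered instance on the Eureka server so that it is taken out of service (InstanceStatus.OUT_OF_SERVICE), keep it in that state for a while, and only then remove the service.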

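Disable a service: curl -X PUT "http://admin:admin@192.168.101.100:8761/eureka/apps/{appName}/{instanceId}/status?value=OUT_OF_SERVICE"

Note: admin:admin is the Eureka user name and password; if authentication is not enabled, drop that part. instanceId is the label shown in the service list on the Eureka page (the link opened above), for example myapp:192.168.1.100:8080

The concrete setup in k8s: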

apiVersion: apps/v1
kind: Deployment
metadata:
  name: NAME-service-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: NAME-service
  template:
    metadata:
      labels:
        app: NAME-service
    spec:
      containers:
      - name: NAME-service
        lifecycle:
          preStop:
            exec:
              command:
                - "/bin/sh"
                - "-c"
                - " \
                  APPLICATION=NAME-service; \
                  APPLICATION_PORT=8016; \
                  curl -s -X PUT http://eureka01-server.domain.com/eureka/apps/${APPLICATION}/$(hostname):${APPLICATION}:${APPLICATION_PORT}/status?value=OUT_OF_SERVICE; \
                  sleep 30; \
                  "

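Irrelevant fields have been removed from the manifest; the part to focus on is lifecycle. The script first defines the service name and port as shell variables, kept separate so that they are easy to change for other services. It then uses curl to send a PUT to Eureka that sets the instance status to OUT_OF_SERVICE. Finally, it sleeps for a while, which serves as a buffer before the service is stopped.

The following questions still need to be confirmed:

  1. Who initiates a pod's hooks (in other words, who controls the execution of the hooks)?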
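
One thing to watch with this setup: the time spent in preStop counts against terminationGracePeriodSeconds, which defaults to 30 seconds. A 30-second sleep in the hook therefore uses up the whole default grace period and leaves the application essentially no time for its own shutdown. The fragment below is a sketch of one way to leave headroom; the value 60 is an assumption to be tuned per service, not something taken from the original setup.

```yaml
# Fragment of the Deployment's pod template; only the relevant fields are shown.
spec:
  template:
    spec:
      # Assumption: 60 seconds covers the 30-second sleep in preStop
      # plus the application's own shutdown time.
      terminationGracePeriodSeconds: 60
      containers:
      - name: NAME-service
        lifecycle:
          preStop:
            exec:
              # Stand-in for the curl + sleep script shown in the manifest above.
              command: ["/bin/sh", "-c", "sleep 30"]
```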