SpringBoot Web Graceful Shutdown: Source Code Analysis

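Following on from the previous article, SpringBoot Graceful Shutdown, we can work backwards to two important lifecycle points in SpringBoot's shutdown sequence:

  1. WebServerGracefulShutdownLifecycle#stop(): the point where the web server shuts down gracefully.
  2. WebServerStartStopLifecycle#stop(): the point where the web server stops immediately.

When the context is refreshed, ServletWebServerApplicationContext#onRefresh() calls createWebServer(), which registers two beans that manage the web server's lifecycle: WebServerGracefulShutdownLifecycle and WebServerStartStopLifecycle.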

public class ServletWebServerApplicationContext extends GenericWebApplicationContext
       implements ConfigurableWebServerApplicationContext {
       
    @Override
    protected void onRefresh() {
        super.onRefresh();
        try {
           createWebServer();
        }
        catch (Throwable ex) {
           throw new ApplicationContextException("Unable to start web server", ex);
        }
    }
    
    private void createWebServer() {
        // ...
        if (webServer == null && servletContext == null) {
           // ...
           getBeanFactory().registerSingleton("webServerGracefulShutdown",
                 new WebServerGracefulShutdownLifecycle(this.webServer));
           getBeanFactory().registerSingleton("webServerStartStop",
                 new WebServerStartStopLifecycle(this, this.webServer));
        }
        // ...
    }

Both WebServerGracefulShutdownLifecycle and WebServerStartStopLifecycle implement the SmartLifecycle interface. Reading the source and stepping through it in a debugger shows that SmartLifecycle#stop(Runnable callback) is ultimately invoked from DefaultLifecycleProcessor$LifecycleGroup#stop().

SmartLifecycle#stop(Runnable callback) may run asynchronously (Tomcat's graceful shutdown is one example), so LifecycleGroup#stop() needs a way to synchronize across threads. It creates a CountDownLatch whose initial count is the number of SmartLifecycle beans in this.members (smartMemberCount). It then iterates over this.members and, for each LifecycleGroupMember, calls DefaultLifecycleProcessor#doStop() to perform the actual stop.

public class DefaultLifecycleProcessor implements LifecycleProcessor, BeanFactoryAware {

    private class LifecycleGroup {
        
        private final List<LifecycleGroupMember> members = new ArrayList<>();
        private final long timeout;
        
        public void stop() {
            if (this.members.isEmpty()) {
               return;
            }
            this.members.sort(Collections.reverseOrder());
            CountDownLatch latch = new CountDownLatch(this.smartMemberCount);
            Set<String> countDownBeanNames = Collections.synchronizedSet(new LinkedHashSet<>());
            Set<String> lifecycleBeanNames = new HashSet<>(this.lifecycleBeans.keySet());
            for (LifecycleGroupMember member : this.members) {
               if (lifecycleBeanNames.contains(member.name)) {
                  doStop(this.lifecycleBeans, member.name, latch, countDownBeanNames);
               }
               else if (member.bean instanceof SmartLifecycle) {
                  // Already removed: must have been a dependent bean from another phase
                  latch.countDown();
               }
            }
            try {
               latch.await(this.timeout, TimeUnit.MILLISECONDS);
            }
            catch (InterruptedException ex) {
               Thread.currentThread().interrupt();
            }
        }
        
                
        public void add(String name, Lifecycle bean) {
            this.members.add(new LifecycleGroupMember(name, bean));
            if (bean instanceof SmartLifecycle) {
               this.smartMemberCount++;
            }
        }
        
    private class LifecycleGroupMember implements Comparable<LifecycleGroupMember> {

        private final String name;
        private final Lifecycle bean;

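DefaultLifecycleProcessor#doStop() first looks up the beans that depend on the current Lifecycle bean and stops those dependents before the bean itself, through a recursive call: doStop(lifecycleBeans, dependentBean, latch, countDownBeanNames).

If the bean is a SmartLifecycle, its SmartLifecycle#stop(Runnable callback) method is called. The callback does two things:

  1. latch.countDown() decrements the latch. Once the count reaches 0, latch.await(this.timeout, TimeUnit.MILLISECONDS) in LifecycleGroup#stop() returns.
  2. countDownBeanNames.remove(beanName) removes the bean from the set of beans that are still being waited on, marking it as handled.

If the bean is not a SmartLifecycle, the plain Lifecycle#stop() method is called instead.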

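In summary, the timed wait in latch.await(this.timeout, TimeUnit.MILLISECONDS) only takes effect for stops that run asynchronously, that is, when SmartLifecycle#stop(Runnable callback) hands the work to a new thread or a thread pool and returns. If SmartLifecycle#stop(Runnable callback) or Lifecycle#stop() does its work synchronously, the timeout has no effect: doStop() does not return until the synchronous work is finished, and only after that does the code reach latch.await(this.timeout, TimeUnit.MILLISECONDS).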
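To make the difference concrete, here is a minimal JDK-only sketch of one phase with a single member. It is a simplified model rather than Spring code: LatchTimeoutDemo, stopPhase and the millisecond values are hypothetical names and numbers chosen for illustration, and Thread.sleep stands in for slow shutdown work.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Simplified model of LifecycleGroup#stop() with one SmartLifecycle member.
// Class and method names are hypothetical, not Spring API.
class LatchTimeoutDemo {

    /** Runs the stop work, then waits on the latch; returns total elapsed milliseconds. */
    static long stopPhase(boolean async, long workMillis, long timeoutMillis) {
        CountDownLatch latch = new CountDownLatch(1);
        Runnable work = () -> {
            try {
                Thread.sleep(workMillis);      // stands in for slow shutdown work
            }
            catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
            latch.countDown();                 // what the stop callback does
        };
        long start = System.nanoTime();
        if (async) {
            Thread worker = new Thread(work, "async-stop");
            worker.setDaemon(true);            // do not keep the JVM alive after the demo
            worker.start();
        }
        else {
            work.run();                        // blocks the caller before await() is reached
        }
        try {
            latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("sync  stop took ~" + stopPhase(false, 400, 100) + " ms");
        System.out.println("async stop took ~" + stopPhase(true, 400, 100) + " ms");
    }
}
```

With 400 ms of work and a 100 ms timeout, the synchronous variant should take roughly the full work time, while the asynchronous variant should return after roughly the timeout. Exact figures depend on the machine.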

public class DefaultLifecycleProcessor implements LifecycleProcessor, BeanFactoryAware {

    private void doStop(Map<String, ? extends Lifecycle> lifecycleBeans, final String beanName,
           final CountDownLatch latch, final Set<String> countDownBeanNames) {

        Lifecycle bean = lifecycleBeans.remove(beanName);
        if (bean != null) {
           String[] dependentBeans = getBeanFactory().getDependentBeans(beanName);
           for (String dependentBean : dependentBeans) {
              doStop(lifecycleBeans, dependentBean, latch, countDownBeanNames);
           }
           try {
              if (bean.isRunning()) {
                 if (bean instanceof SmartLifecycle) {
                    countDownBeanNames.add(beanName);
                    ((SmartLifecycle) bean).stop(() -> {
                       latch.countDown();
                       countDownBeanNames.remove(beanName);
                    });
                 }
                 else {
                    bean.stop();
                 }
              }
              else if (bean instanceof SmartLifecycle) {
                 // Don't wait for beans that aren't running...
                 latch.countDown();
              }
           }
           catch (Throwable ex) {
              logger.warn("Failed to stop bean '" + beanName + "'", ex);
           }
        }
    }
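The dependents-first recursion in doStop() can be illustrated with a small JDK-only sketch. Everything in it is an assumption made for illustration: the bean names, the dependentsOf map (standing in for getBeanFactory().getDependentBeans(beanName)) and the class DependentFirstStopDemo are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of the recursion in doStop(); names are hypothetical.
class DependentFirstStopDemo {

    /** Stops beanName only after every bean that depends on it has been stopped. */
    static void doStop(Map<String, List<String>> dependentsOf, Set<String> remaining,
            String beanName, List<String> stopOrder) {
        // remove() doubles as the "already handled" check, like lifecycleBeans.remove(beanName)
        if (!remaining.remove(beanName)) {
            return;
        }
        for (String dependent : dependentsOf.getOrDefault(beanName, List.of())) {
            doStop(dependentsOf, remaining, dependent, stopOrder);
        }
        stopOrder.add(beanName);
    }

    /** Visits every bean once and returns the order in which they were stopped. */
    static List<String> stopAll(Map<String, List<String>> dependentsOf, List<String> beanNames) {
        Set<String> remaining = new LinkedHashSet<>(beanNames);
        List<String> stopOrder = new ArrayList<>();
        for (String name : beanNames) {
            doStop(dependentsOf, remaining, name, stopOrder);
        }
        return stopOrder;
    }

    public static void main(String[] args) {
        // "repository" depends on "dataSource"; "service" depends on "repository"
        Map<String, List<String>> dependentsOf = Map.of(
                "dataSource", List.of("repository"),
                "repository", List.of("service"));
        System.out.println(stopAll(dependentsOf, List.of("dataSource", "repository", "service")));
    }
}
```

Even though dataSource is visited first, it is stopped last, because the beans that rely on it have to go down before it does.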

So what triggers LifecycleGroup#stop()? The call chain is DefaultLifecycleProcessor#stop() --> DefaultLifecycleProcessor#stopBeans() --> LifecycleGroup#stop().

stopBeans() first collects every Lifecycle bean in the Spring container through getLifecycleBeans() and groups them by phase value. The LifecycleGroup with the larger phase value is stopped earlier.

Each phase group gets a shutdown timeout of timeoutPerShutdownPhase. If there are n phase groups, each one calls latch.await(this.timeout, TimeUnit.MILLISECONDS) with that timeout, so in theory the longest total time SpringBoot can spend waiting during graceful shutdown is n * timeoutPerShutdownPhase.

public class DefaultLifecycleProcessor implements LifecycleProcessor, BeanFactoryAware {

    private volatile long timeoutPerShutdownPhase = 30000;

    @Override
    public void stop() {
        stopBeans();
        this.running = false;
    }

    private void stopBeans() {
        Map<String, Lifecycle> lifecycleBeans = getLifecycleBeans();
        Map<Integer, LifecycleGroup> phases = new HashMap<>();
        lifecycleBeans.forEach((beanName, bean) -> {
           int shutdownPhase = getPhase(bean);
           LifecycleGroup group = phases.get(shutdownPhase);
           if (group == null) {
              group = new LifecycleGroup(shutdownPhase, this.timeoutPerShutdownPhase, lifecycleBeans, false);
              phases.put(shutdownPhase, group);
           }
           group.add(beanName, bean);
        });
        if (!phases.isEmpty()) {
           List<Integer> keys = new ArrayList<>(phases.keySet());
           keys.sort(Collections.reverseOrder());
           for (Integer key : keys) {
              phases.get(key).stop();
           }
        }
    }
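The grouping and ordering done by stopBeans(), together with the n * timeoutPerShutdownPhase bound, can be sketched with plain JDK collections. This is an illustrative model, not the Spring implementation: PhaseOrderDemo, its methods and the someOtherLifecycle bean are hypothetical, and a reverse-ordered TreeMap replaces the HashMap plus sorted key list used in the real code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of phase grouping in stopBeans(); names are hypothetical.
class PhaseOrderDemo {

    /** Groups bean names by phase; the reverse-ordered map iterates from the highest phase down. */
    static Map<Integer, List<String>> groupByPhase(Map<String, Integer> phaseOfBean) {
        Map<Integer, List<String>> groups = new TreeMap<>(Comparator.reverseOrder());
        phaseOfBean.forEach((name, phase) ->
                groups.computeIfAbsent(phase, key -> new ArrayList<>()).add(name));
        return groups;
    }

    /** Upper bound on the total time spent in latch.await(): one timeout per phase group. */
    static long worstCaseWaitMillis(int phaseCount, long timeoutPerShutdownPhaseMillis) {
        return phaseCount * timeoutPerShutdownPhaseMillis;
    }

    public static void main(String[] args) {
        Map<String, Integer> phaseOfBean = Map.of(
                "webServerGracefulShutdown", Integer.MAX_VALUE,
                "webServerStartStop", Integer.MAX_VALUE - 1,
                "someOtherLifecycle", 0);          // hypothetical third bean
        Map<Integer, List<String>> groups = groupByPhase(phaseOfBean);
        groups.forEach((phase, names) -> System.out.println(phase + " -> " + names));
        System.out.println("worst-case wait: " + worstCaseWaitMillis(groups.size(), 30_000) + " ms");
    }
}
```

The bound covers only the time spent waiting on the latch. As noted earlier, a stop method that blocks synchronously is not limited by it.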

And when does DefaultLifecycleProcessor#stop() itself run? AbstractApplicationContext triggers it when the container is closed:

  1. When DefaultLifecycleProcessor is initialized: if the container has no bean named lifecycleProcessor, a DefaultLifecycleProcessor is created; otherwise the bean with that name is fetched from the container.
  2. When DefaultLifecycleProcessor runs: closing the container calls AbstractApplicationContext#doClose() --> this.lifecycleProcessor.onClose() --> DefaultLifecycleProcessor#stop().

public abstract class AbstractApplicationContext extends DefaultResourceLoader
		implements ConfigurableApplicationContext {
    
    // Name of the LifecycleProcessor bean in the factory. If none is supplied, a DefaultLifecycleProcessor is used.
    public static final String LIFECYCLE_PROCESSOR_BEAN_NAME = "lifecycleProcessor";
    
    // LifecycleProcessor for managing the lifecycle of beans within this context.
    @Nullable
    private LifecycleProcessor lifecycleProcessor;
    // Initialize the LifecycleProcessor. Uses DefaultLifecycleProcessor if none defined in the context.
    protected void initLifecycleProcessor() {
        ConfigurableListableBeanFactory beanFactory = getBeanFactory();
        if (beanFactory.containsLocalBean(LIFECYCLE_PROCESSOR_BEAN_NAME)) {
           this.lifecycleProcessor =
                 beanFactory.getBean(LIFECYCLE_PROCESSOR_BEAN_NAME, LifecycleProcessor.class);
        }
        else {
           DefaultLifecycleProcessor defaultProcessor = new DefaultLifecycleProcessor();
           defaultProcessor.setBeanFactory(beanFactory);
           this.lifecycleProcessor = defaultProcessor;
           beanFactory.registerSingleton(LIFECYCLE_PROCESSOR_BEAN_NAME, this.lifecycleProcessor);
        }
    }

    // Actually performs context closing: publishes a ContextClosedEvent and destroys the singletons in the bean factory of this application context.
    protected void doClose() {
        // ...

       // Stop all Lifecycle beans, to avoid delays during individual destruction.
       if (this.lifecycleProcessor != null) {
          try {
             this.lifecycleProcessor.onClose();
          }
          catch (Throwable ex) {
             logger.warn("Exception thrown from LifecycleProcessor on context close", ex);
          }
       }
       // ...
   }

The default timeout of DefaultLifecycleProcessor: the LifecycleAutoConfiguration auto-configuration class creates the DefaultLifecycleProcessor bean and sets its timeout to properties.getTimeoutPerShutdownPhase().toMillis(), which defaults to 30s.

@AutoConfiguration
@EnableConfigurationProperties(LifecycleProperties.class)
public class LifecycleAutoConfiguration {

    @Bean(name = AbstractApplicationContext.LIFECYCLE_PROCESSOR_BEAN_NAME)
    @ConditionalOnMissingBean(name = AbstractApplicationContext.LIFECYCLE_PROCESSOR_BEAN_NAME,
          search = SearchStrategy.CURRENT)
    public DefaultLifecycleProcessor defaultLifecycleProcessor(LifecycleProperties properties) {
       DefaultLifecycleProcessor lifecycleProcessor = new DefaultLifecycleProcessor();
       lifecycleProcessor.setTimeoutPerShutdownPhase(properties.getTimeoutPerShutdownPhase().toMillis());
       return lifecycleProcessor;
    }


@ConfigurationProperties(prefix = "spring.lifecycle")
public class LifecycleProperties {

    /**
     * Timeout for the shutdown of any phase (group of SmartLifecycle beans with the same
     * 'phase' value).
     */
    private Duration timeoutPerShutdownPhase = Duration.ofSeconds(30);

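The graceful shutdown timeout can be configured in application.yml (the canonical kebab-case spelling of the key is timeout-per-shutdown-phase; the camelCase form below is also accepted):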

spring:
  lifecycle:
    timeoutPerShutdownPhase: 30s

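Now for the Tomcat graceful shutdown code. WebServerGracefulShutdownLifecycle#stop(Runnable callback) calls this.webServer.shutDownGracefully((result) -> callback.run()) to stop the Tomcat server gracefully.

  1. If graceful shutdown is not configured, the gracefulShutdown field is null and TomcatWebServer#shutDownGracefully() invokes the callback with GracefulShutdownResult.IMMEDIATE and returns straight away.
  2. If graceful shutdown is configured, TomcatWebServer#shutDownGracefully() waits for the Connectors to close gracefully instead of stopping the Tomcat server immediately. The benefit is that requests already in progress can finish, so users do not see unexpected errors in the front end.

Once the graceful shutdown started by WebServerGracefulShutdownLifecycle#stop(Runnable callback) completes, callback.run() is invoked, which executes latch.countDown(). When every SmartLifecycle bean in the phase has invoked its callback, latch.await(this.timeout, TimeUnit.MILLISECONDS) stops blocking.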

public final class WebServerGracefulShutdownLifecycle implements SmartLifecycle {

    public static final int SMART_LIFECYCLE_PHASE = SmartLifecycle.DEFAULT_PHASE;
    private final WebServer webServer;
    private volatile boolean running;

    @Override
    public void stop(Runnable callback) {
       this.running = false;
       this.webServer.shutDownGracefully((result) -> callback.run());
    }

public class TomcatWebServer implements WebServer {

    private final GracefulShutdown gracefulShutdown;

    @Override
    public void shutDownGracefully(GracefulShutdownCallback callback) {
        if (this.gracefulShutdown == null) {
           callback.shutdownComplete(GracefulShutdownResult.IMMEDIATE);
           return;
        }
        this.gracefulShutdown.shutDownGracefully(callback);
    }
   
// Excerpt from DefaultLifecycleProcessor#doStop(): the callback passed to stop(Runnable)
if (bean instanceof SmartLifecycle) {
    countDownBeanNames.add(beanName);
    ((SmartLifecycle) bean).stop(() -> {
       latch.countDown();
       countDownBeanNames.remove(beanName);
    });
}
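The interplay between an asynchronous graceful stop, the phase timeout and a later abort can be modelled in a few lines of plain Java. This is a hypothetical sketch and not Tomcat or Spring Boot code: GracefulDrainDemo, DrainingServer, the two-value Result enum, the 10 ms polling interval and the in-flight counter are all assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Simplified, hypothetical model of a graceful drain; not Tomcat or Spring Boot code.
class GracefulDrainDemo {

    enum Result { IDLE, REQUESTS_ACTIVE }

    static class DrainingServer {
        final AtomicInteger inFlight = new AtomicInteger();
        private volatile boolean aborted;

        /** Returns immediately; the callback fires later on a background thread. */
        void shutDownGracefully(Consumer<Result> callback) {
            Thread drainer = new Thread(() -> {
                while (inFlight.get() > 0 && !aborted) {
                    pause(10);                 // poll until requests drain or abort() is called
                }
                callback.accept(inFlight.get() > 0 ? Result.REQUESTS_ACTIVE : Result.IDLE);
            }, "graceful-drain");
            drainer.setDaemon(true);
            drainer.start();
        }

        void abort() {
            this.aborted = true;
        }
    }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        }
        catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    /** One graceful phase followed, on timeout, by an abort from the next phase. */
    static Result shutDown(int inFlightRequests, long requestMillis, long timeoutMillis) {
        DrainingServer server = new DrainingServer();
        server.inFlight.set(inFlightRequests);
        Thread requests = new Thread(() -> {
            pause(requestMillis);              // in-flight requests finish after this long
            server.inFlight.set(0);
        }, "requests");
        requests.setDaemon(true);
        requests.start();

        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<Result> result = new AtomicReference<>();
        server.shutDownGracefully(r -> {
            result.set(r);
            done.countDown();                  // plays the role of latch.countDown()
        });
        try {
            if (!done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                server.abort();                // phase timed out: the next phase stops the server
                done.await();
            }
        }
        catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return result.get();
    }

    public static void main(String[] args) {
        System.out.println(shutDown(1, 50, 2000));   // request finishes well within the timeout
        System.out.println(shutDown(1, 2000, 100));  // timeout expires first, so the drain is aborted
    }
}
```

The key design point the model tries to capture is that shutDownGracefully() returns without blocking, which is what allows the latch timeout to bound the wait.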

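WebServerStartStopLifecycle#stop() calls TomcatWebServer#stop() to stop the Tomcat server immediately. WebServerStartStopLifecycle does not override stop(Runnable callback), so the default SmartLifecycle implementation applies: it calls stop() synchronously and then runs the callback.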

public interface SmartLifecycle extends Lifecycle, Phased {
    default void stop(Runnable callback) {
        stop();
        callback.run();
    }
    
class WebServerStartStopLifecycle implements SmartLifecycle {

    @Override
    public void stop() {
        this.running = false;
        this.weServerManager.stop();
    }
    
public class TomcatWebServer implements WebServer {

    @Override
    public void stop() throws WebServerException {
        synchronized (this.monitor) {
           boolean wasStarted = this.started;
           try {
              this.started = false;
              try {
                 if (this.gracefulShutdown != null) {
                    this.gracefulShutdown.abort();
                 }
                 stopTomcat();
                 this.tomcat.destroy();
              }
              catch (LifecycleException ex) {
                 // swallow and continue
              }
           }
           catch (Exception ex) {
              throw new WebServerException("Unable to stop embedded Tomcat", ex);
           }
           finally {
              if (wasStarted) {
                 containerCounter.decrementAndGet();
              }
           }
        }
    }

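The getPhase() methods of WebServerGracefulShutdownLifecycle and WebServerStartStopLifecycle return different phase values. Before the stop(Runnable callback) methods of SmartLifecycle beans are executed, the beans are sorted by phase in descending order, so the higher a bean's phase value, the earlier its stop(Runnable callback) method runs.

As a result, when SpringBoot shuts down gracefully it first runs WebServerGracefulShutdownLifecycle#stop(Runnable callback) and waits up to timeoutPerShutdownPhase milliseconds for Tomcat to finish its graceful shutdown. Only then does it run WebServerStartStopLifecycle#stop(Runnable callback), which closes the Tomcat server.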

public final class WebServerGracefulShutdownLifecycle implements SmartLifecycle {
    // int DEFAULT_PHASE = Integer.MAX_VALUE;
    public static final int SMART_LIFECYCLE_PHASE = SmartLifecycle.DEFAULT_PHASE;
    
    @Override
    public int getPhase() {
        return SMART_LIFECYCLE_PHASE;
    }
    
class WebServerStartStopLifecycle implements SmartLifecycle {

    @Override
    public int getPhase() {
        return Integer.MAX_VALUE - 1;
    }
