A Look at Dubbo Cluster Fault Tolerance

The inheritance diagram shows the following strategies:

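  • Failover Cluster (failover): when a call to one provider fails, the next server is tried. Typically used for read operations. Retries add latency, and the number of retries is configurable.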

The implementation:

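public Result doInvoke(Invocation invocation, final List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    List<Invoker<T>> copyinvokers = invokers;
    checkInvokers(copyinvokers, invocation);
    // Total attempts = configured retries + the first call.
    int len = getUrl().getMethodParameter(invocation.getMethodName(), Constants.RETRIES_KEY, Constants.DEFAULT_RETRIES) + 1;
    if (len <= 0) {
        len = 1;
    }
    RpcException le = null; // last exception
    List<Invoker<T>> invoked = new ArrayList<Invoker<T>>(copyinvokers.size()); // providers already tried
    Set<String> providers = new HashSet<String>(len);
    for (int i = 0; i < len; i++) { // 5. bounded retry loop
        if (i > 0) {
            // Re-list the providers before a retry, in case the candidates changed.
            checkWhetherDestroyed();
            copyinvokers = list(invocation);
            checkInvokers(copyinvokers, invocation);
        }
        Invoker<T> invoker = select(loadbalance, invocation, copyinvokers, invoked); // 1. pick a provider
        invoked.add(invoker); // 2. remember that it was tried
        RpcContext.getContext().setInvokers((List) invoked);
        try {
            Result result = invoker.invoke(invocation); // 3. invoke
            if (le != null && logger.isWarnEnabled()) {
                // Log message abridged here; the original is longer.
                logger.warn("Retry of " + invocation.getMethodName() + " succeeded, last error: " + le.getMessage(), le);
            }
            return result;
        } catch (RpcException e) { // 4. inspect the failure
            if (e.isBiz()) { // business exceptions are not retried
                throw e;
            }
            le = e;
        } catch (Throwable e) {
            le = new RpcException(e.getMessage(), e);
        } finally {
            providers.add(invoker.getUrl().getAddress());
        }
    } // for-end
    // 6. every attempt failed (message abridged here; the original is longer)
    throw new RpcException(le != null ? le.getCode() : 0,
            "Failed to invoke the method " + invocation.getMethodName() + ", tried providers " + providers, le);
}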

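The idea: wrap the call in a for loop and return as soon as one attempt succeeds. That is what allows the call to be attempted several times.

Each attempt first selects a server and adds it to the list of servers already tried, then invokes it and takes the result. If the call fails, the exception is caught and its type is checked: a business exception is rethrown at once, anything else is remembered as the last error. Once the configured number of attempts is used up, an exception is thrown.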
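
To see the retry loop without the Dubbo plumbing, here is a minimal standalone sketch. It is not Dubbo code: `FailoverSketch`, `BizException`, and the use of `Supplier` as a stand-in for a provider are all assumptions made for illustration, and the selection rule (prefer a provider not yet tried) only approximates what a real load balancer does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class FailoverSketch {
    /** Hypothetical stand-in for a business error that must not be retried. */
    static final class BizException extends RuntimeException {
        BizException(String message) { super(message); }
    }

    /**
     * Makes at most retries + 1 attempts and returns the first successful result.
     * Business errors are rethrown immediately; other errors trigger the next attempt.
     */
    static <T> T invoke(List<Supplier<T>> providers, int retries) {
        if (providers.isEmpty()) {
            throw new IllegalStateException("no provider available");
        }
        int attempts = Math.max(retries + 1, 1);
        List<Supplier<T>> invoked = new ArrayList<>(); // providers already tried
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            Supplier<T> provider = select(providers, invoked);
            invoked.add(provider);
            try {
                return provider.get(); // success ends the loop
            } catch (BizException e) {
                throw e; // not retried
            } catch (RuntimeException e) {
                last = e; // remember and try again
            }
        }
        throw new IllegalStateException("failed after " + attempts + " attempts", last);
    }

    /** Prefers a provider that has not been tried yet; otherwise falls back to the first one. */
    private static <T> Supplier<T> select(List<Supplier<T>> providers, List<Supplier<T>> invoked) {
        for (Supplier<T> p : providers) {
            if (!invoked.contains(p)) {
                return p;
            }
        }
        return providers.get(0);
    }
}
```

With one failing provider and one healthy provider, the second attempt lands on the healthy one and its result is returned.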

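  • Failfast Cluster (fail fast): makes a single call and reports an error immediately if it fails. Usually chosen for non-idempotent writes, such as inserting a record.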
public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    checkInvokers(invokers, invocation);
    Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
    try {
        return invoker.invoke(invocation);
    } catch (Throwable e) {
        if (e instanceof RpcException && ((RpcException) e).isBiz()) { // biz exception.
            throw (RpcException) e;
        }
        // Wrap everything else, keeping the original error code (message abridged here).
        throw new RpcException(e instanceof RpcException ? ((RpcException) e).getCode() : 0,
                "Failfast invoke of " + invocation.getMethodName() + " failed: " + e.getMessage(),
                e.getCause() != null ? e.getCause() : e);
    }
}

The idea: invoke once and handle the exception. Simple.
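
The same pattern in a few standalone lines, as a rough sketch. `FailfastSketch` and `CallException` are hypothetical names; `CallException` only loosely imitates how `RpcException` carries an error code.

```java
import java.util.function.Supplier;

final class FailfastSketch {
    /** Hypothetical wrapper exception with an error code, loosely modelled on RpcException. */
    static final class CallException extends RuntimeException {
        final int code;

        CallException(int code, String message, Throwable cause) {
            super(message, cause);
            this.code = code;
        }
    }

    /** Invokes the provider exactly once; any failure surfaces immediately, wrapped. */
    static <T> T invoke(Supplier<T> provider) {
        try {
            return provider.get();
        } catch (CallException e) {
            // Preserve the error code of an exception that already carries one.
            throw new CallException(e.code, "failfast: " + e.getMessage(), e);
        } catch (RuntimeException e) {
            // Unknown failures get code 0.
            throw new CallException(0, "failfast: " + e.getMessage(), e);
        }
    }
}
```

There is no loop and no list of tried providers, which is why this strategy suits writes that must not be repeated.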

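  • Failback Cluster (automatic recovery): failed requests are recorded in the background and resent on a timer. Suited to things like message notifications.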
protected Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    try {
        checkInvokers(invokers, invocation);
        Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
        return invoker.invoke(invocation);
    } catch (Throwable e) {
        // The failure is swallowed and queued for retry (log message abridged here).
        logger.error("Failback invoke of " + invocation.getMethodName() + " failed, will retry in background: " + e.getMessage(), e);
        addFailed(invocation, this);
        return new RpcResult(); // ignore
    }
}

private void addFailed(Invocation invocation, AbstractClusterInvoker<?> router) {
    // Start the retry task lazily, exactly once (double-checked locking).
    if (retryFuture == null) {
        synchronized (this) {
            if (retryFuture == null) {
                retryFuture = scheduledExecutorService.scheduleWithFixedDelay(new Runnable() {

                    @Override
                    public void run() {
                        // Retry the recorded failed invocations.
                        try {
                            retryFailed();
                        } catch (Throwable t) { // Defensive fault tolerance
                            logger.error("Unexpected error occur at collect statistic", t);
                        }
                    }
                }, RETRY_FAILED_PERIOD, RETRY_FAILED_PERIOD, TimeUnit.MILLISECONDS);
            }
        }
    }
    failed.put(invocation, router);
}

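The idea: invoke directly first. On failure, save the invocation in a map, and let a scheduled thread pool periodically retry whatever is in that map.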
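
A standalone sketch of the record-and-retry idea follows. It is an illustration under assumptions, not the Dubbo class: `FailbackSketch`, the string task ids, and `pending()` are invented here, and the scheduler is started eagerly in the constructor rather than lazily on the first failure as in the code above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class FailbackSketch implements AutoCloseable {
    // Failed tasks waiting for a retry, keyed by a caller-chosen id.
    private final Map<String, Runnable> failed = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    FailbackSketch(long retryPeriodMillis) {
        scheduler.scheduleWithFixedDelay(this::retryFailed,
                retryPeriodMillis, retryPeriodMillis, TimeUnit.MILLISECONDS);
    }

    /** Runs the task once; on failure records it for background retry and returns normally. */
    void invoke(String id, Runnable task) {
        try {
            task.run();
        } catch (RuntimeException e) {
            failed.put(id, task);
        }
    }

    /** Retries every recorded task; tasks that now succeed are removed from the map. */
    void retryFailed() {
        for (Map.Entry<String, Runnable> entry : failed.entrySet()) {
            try {
                entry.getValue().run();
                failed.remove(entry.getKey());
            } catch (RuntimeException e) {
                // Still failing: keep it for the next round.
            }
        }
    }

    /** Number of tasks still waiting for a successful retry. */
    int pending() {
        return failed.size();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The caller never sees the failure, so this only fits operations where a delayed delivery is acceptable.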

  • Failsafe Cluster (fail-safe): a failure is simply ignored. Suited to things like writing logs.
 public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
            try {
                checkInvokers(invokers, invocation);
                Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
                return invoker.invoke(invocation);
            } catch (Throwable e) {
                logger.error("Failsafe ignore exception: " + e.getMessage(), e);
                return new RpcResult(); // ignore
            }
        }
  • Forking Cluster (parallel invocation): calls several servers at the same time and returns as soon as one succeeds. Used when low latency matters.
public Result doInvoke(final Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
            try {
                checkInvokers(invokers, invocation);
                final List<Invoker<T>> selected;
                final int forks = getUrl().getParameter(Constants.FORKS_KEY, Constants.DEFAULT_FORKS);
                final int timeout = getUrl().getParameter(Constants.TIMEOUT_KEY, Constants.DEFAULT_TIMEOUT);
                if (forks <= 0 || forks >= invokers.size()) {
                    selected = invokers;
                } else {
                    selected = new ArrayList<>();
                    for (int i = 0; i < forks; i++) {
                        // TODO. Add some comment here, refer chinese version for more details.
                        Invoker<T> invoker = select(loadbalance, invocation, invokers, selected);
                        if (!selected.contains(invoker)) {
                            //Avoid add the same invoker several times.
                            selected.add(invoker);
                        }
                    }
                }
                RpcContext.getContext().setInvokers((List) selected);
                final AtomicInteger count = new AtomicInteger();
                final BlockingQueue<Object> ref = new LinkedBlockingQueue<>();
                for (final Invoker<T> invoker : selected) {
                    executor.execute(new Runnable() {
                        @Override
                        public void run() {
                            try {
                                Result result = invoker.invoke(invocation);
                                ref.offer(result);
                            } catch (Throwable e) {
                                int value = count.incrementAndGet();
                                if (value >= selected.size()) {
                                    ref.offer(e);
                                }
                            }
                        }
                    });
                }
                try {
                    Object ret = ref.poll(timeout, TimeUnit.MILLISECONDS);
                    if (ret instanceof Throwable) {
                        Throwable e = (Throwable) ret;
                        throw new RpcException(e instanceof RpcException ? ((RpcException) e).getCode() : 0, "Failed to forking invoke provider " + selected + ", but no luck to perform the invocation. Last error is: " + e.getMessage(), e.getCause() != null ? e.getCause() : e);
                    }
                    return (Result) ret;
                } catch (InterruptedException e) {
                    throw new RpcException("Failed to forking invoke provider " + selected + ", but no luck to perform the invocation. Last error is: " + e.getMessage(), e);
                }
            } finally {
                // clear attachments which is binding to current thread.
                RpcContext.getContext().clearAttachments();
            }
        }

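The idea: keep the selected servers in a list and submit all the calls to a thread pool at once. Each successful result goes into a LinkedBlockingQueue, an exception goes in only once every call has failed, and the caller takes the first item from the queue.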
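
The queue-based race can be sketched on its own as below. `ForkingSketch` and `Outcome` are hypothetical names, providers are modelled as `Supplier`s, and the sketch creates a thread pool per call for simplicity. One deliberate difference, chosen for this sketch: a timeout is reported as an exception.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class ForkingSketch {
    /** Either a value or an error; needed because the queue cannot hold null. */
    private static final class Outcome<T> {
        final T value;
        final RuntimeException error;

        Outcome(T value, RuntimeException error) {
            this.value = value;
            this.error = error;
        }
    }

    /** Calls all providers in parallel and returns the first successful result. */
    static <T> T invoke(List<Supplier<T>> providers, long timeoutMillis) {
        if (providers.isEmpty()) {
            throw new IllegalStateException("no provider available");
        }
        ExecutorService executor = Executors.newFixedThreadPool(providers.size());
        BlockingQueue<Outcome<T>> ref = new LinkedBlockingQueue<>();
        AtomicInteger failures = new AtomicInteger();
        try {
            for (Supplier<T> provider : providers) {
                executor.execute(() -> {
                    try {
                        ref.offer(new Outcome<>(provider.get(), null));
                    } catch (RuntimeException e) {
                        // Report an error only once every fork has failed.
                        if (failures.incrementAndGet() >= providers.size()) {
                            ref.offer(new Outcome<>(null, e));
                        }
                    }
                });
            }
            Outcome<T> first = ref.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (first == null) {
                throw new IllegalStateException("timed out after " + timeoutMillis + " ms");
            }
            if (first.error != null) {
                throw new IllegalStateException("all forks failed", first.error);
            }
            return first.value;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for forks", e);
        } finally {
            executor.shutdownNow(); // slower forks are no longer needed
        }
    }
}
```

The price of the lower latency is that every call consumes resources on several providers at once.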

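  • Broadcast Cluster (broadcast invocation): calls every provider, one by one. If any of them fails, the call as a whole fails, although the exception is thrown only after all providers have been called. Used for things like telling every provider to refresh a resource.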
 public Result doInvoke(final Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
              checkInvokers(invokers, invocation);
              RpcContext.getContext().setInvokers((List) invokers);
              RpcException exception = null;
              Result result = null;
              for (Invoker<T> invoker : invokers) {
                  try {
                      result = invoker.invoke(invocation);
                  } catch (RpcException e) {
                      exception = e;
                      logger.warn(e.getMessage(), e);
                  } catch (Throwable e) {
                      exception = new RpcException(e.getMessage(), e);
                      logger.warn(e.getMessage(), e);
                  }
              }
              if (exception != null) {
                  throw exception;
              }
              return result;
          }

The idea: a for loop that calls each provider in turn.
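
A compact standalone sketch of that loop, with `BroadcastSketch` as a hypothetical name and `Supplier`s standing in for providers. It mirrors the behaviour of the code above: a failure does not stop the loop, and the last error is thrown at the end.

```java
import java.util.List;
import java.util.function.Supplier;

final class BroadcastSketch {
    /**
     * Calls every provider in order. Returns the last result if all succeed;
     * otherwise throws the last error, but only after every provider was called.
     */
    static <T> T invoke(List<Supplier<T>> providers) {
        RuntimeException last = null;
        T result = null;
        for (Supplier<T> provider : providers) {
            try {
                result = provider.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure, keep going
            }
        }
        if (last != null) {
            throw last;
        }
        return result;
    }
}
```

Because the loop keeps going after a failure, a healthy provider still receives the notification even when an earlier one is down.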