Analysis of the Nacos gRPC Communication Module

Nacos Communication Module Analysis

Server

Overview

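In version 2.0, Nacos adopted gRPC as its communication module. BaseRpcServer defines the basic interface for starting and stopping the server, and BaseGrpcServer implements the basic server functionality. GrpcClusterServer handles communication between cluster nodes, while GrpcSdkServer handles communication between clients and the server; each defines its own executor thread pool.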

image-20210726162723732.png

BaseGrpcServer

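BaseGrpcServer uses gRPC to implement the remote server endpoint. It first creates a server call interceptor, which reads the attributes of the incoming call and places them into the thread context (a gRPC Context). Note that getInternalChannel uses reflection to obtain gRPC's wrapper around the Netty Channel.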

ServerInterceptor serverInterceptor = new ServerInterceptor() {
    @Override
    public <T, S> ServerCall.Listener<T> interceptCall(ServerCall<T, S> call, Metadata headers,
            ServerCallHandler<T, S> next) {
        Context ctx = Context.current()
                .withValue(CONTEXT_KEY_CONN_ID, call.getAttributes().get(TRANS_KEY_CONN_ID))
                .withValue(CONTEXT_KEY_CONN_REMOTE_IP, call.getAttributes().get(TRANS_KEY_REMOTE_IP))
                .withValue(CONTEXT_KEY_CONN_REMOTE_PORT, call.getAttributes().get(TRANS_KEY_REMOTE_PORT))
                .withValue(CONTEXT_KEY_CONN_LOCAL_PORT, call.getAttributes().get(TRANS_KEY_LOCAL_PORT));
        if (REQUEST_BI_STREAM_SERVICE_NAME.equals(call.getMethodDescriptor().getServiceName())) {
            Channel internalChannel = getInternalChannel(call);
            ctx = ctx.withValue(CONTEXT_KEY_CHANNEL, internalChannel);
        }
        return Contexts.interceptCall(ctx, call, headers, next);
    }
};

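Next comes the service registration. Rather than the usual approach of defining the service in a proto file, the methods are registered by hand, supporting unary calls (one request, one response) and bidirectional streaming calls. The request and response bodies are both of type Payload, a class generated from a protobuf definition that uses the Any type to carry multiple kinds of data.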

private void addServices(MutableHandlerRegistry handlerRegistry, ServerInterceptor... serverInterceptor) {
    
    // unary common call register.
    // Define the method descriptor.
    final MethodDescriptor<Payload, Payload> unaryPayloadMethod = MethodDescriptor.<Payload, Payload>newBuilder()
            .setType(MethodDescriptor.MethodType.UNARY)
            .setFullMethodName(MethodDescriptor.generateFullMethodName(REQUEST_SERVICE_NAME, REQUEST_METHOD_NAME))
            .setRequestMarshaller(ProtoUtils.marshaller(Payload.getDefaultInstance()))
            .setResponseMarshaller(ProtoUtils.marshaller(Payload.getDefaultInstance())).build();
    
    // Define the callback that handles the call.
    final ServerCallHandler<Payload, Payload> payloadHandler = ServerCalls
            .asyncUnaryCall((request, responseObserver) -> {
                grpcCommonRequestAcceptor.request(request, responseObserver);
            });
    
    // Define the service.
    final ServerServiceDefinition serviceDefOfUnaryPayload = ServerServiceDefinition.builder(REQUEST_SERVICE_NAME)
            .addMethod(unaryPayloadMethod, payloadHandler).build();
    handlerRegistry.addService(ServerInterceptors.intercept(serviceDefOfUnaryPayload, serverInterceptor));
    
    // bi stream register.
    final ServerCallHandler<Payload, Payload> biStreamHandler = ServerCalls.asyncBidiStreamingCall(
            (responseObserver) -> grpcBiStreamRequestAcceptor.requestBiStream(responseObserver));
    
    final MethodDescriptor<Payload, Payload> biStreamMethod = MethodDescriptor.<Payload, Payload>newBuilder()
            .setType(MethodDescriptor.MethodType.BIDI_STREAMING).setFullMethodName(MethodDescriptor
                    .generateFullMethodName(REQUEST_BI_STREAM_SERVICE_NAME, REQUEST_BI_STREAM_METHOD_NAME))
            .setRequestMarshaller(ProtoUtils.marshaller(Payload.newBuilder().build()))
            .setResponseMarshaller(ProtoUtils.marshaller(Payload.getDefaultInstance())).build();
    
    final ServerServiceDefinition serviceDefOfBiStream = ServerServiceDefinition
            .builder(REQUEST_BI_STREAM_SERVICE_NAME).addMethod(biStreamMethod, biStreamHandler).build();
    handlerRegistry.addService(ServerInterceptors.intercept(serviceDefOfBiStream, serverInterceptor));
    
}

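Finally, the server properties are set and connection management is wired in. When a connection is established, transportReady builds a new Attributes object from the connection's transport attributes. Its main job is to assign a connectionId to each connection, which the interceptor described earlier then reads. When the connection is closed, it is removed from connectionManager.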

server = ServerBuilder.forPort(getServicePort()).executor(getRpcExecutor())
        .maxInboundMessageSize(getInboundMessageSize()).fallbackHandlerRegistry(handlerRegistry)
        .compressorRegistry(CompressorRegistry.getDefaultInstance())
        .decompressorRegistry(DecompressorRegistry.getDefaultInstance())
        .addTransportFilter(new ServerTransportFilter() {
            @Override
            public Attributes transportReady(Attributes transportAttrs) {
                InetSocketAddress remoteAddress = (InetSocketAddress) transportAttrs
                        .get(Grpc.TRANSPORT_ATTR_REMOTE_ADDR);
                InetSocketAddress localAddress = (InetSocketAddress) transportAttrs
                        .get(Grpc.TRANSPORT_ATTR_LOCAL_ADDR);
                int remotePort = remoteAddress.getPort();
                int localPort = localAddress.getPort();
                String remoteIp = remoteAddress.getAddress().getHostAddress();
                Attributes attrWrapper = transportAttrs.toBuilder()
                        .set(TRANS_KEY_CONN_ID, System.currentTimeMillis() + "_" + remoteIp + "_" + remotePort)
                        .set(TRANS_KEY_REMOTE_IP, remoteIp).set(TRANS_KEY_REMOTE_PORT, remotePort)
                        .set(TRANS_KEY_LOCAL_PORT, localPort).build();
                String connectionId = attrWrapper.get(TRANS_KEY_CONN_ID);
                Loggers.REMOTE_DIGEST.info("Connection transportReady,connectionId = {} ", connectionId);
                return attrWrapper;
                
            }
            
            @Override
            public void transportTerminated(Attributes transportAttrs) {
                String connectionId = null;
                try {
                    connectionId = transportAttrs.get(TRANS_KEY_CONN_ID);
                } catch (Exception e) {
                    // Ignore
                }
                if (StringUtils.isNotBlank(connectionId)) {
                    Loggers.REMOTE_DIGEST
                            .info("Connection transportTerminated,connectionId = {} ", connectionId);
                    connectionManager.unregister(connectionId);
                }
            }
        }).build();

ConnectionManager

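ConnectionManager manages the long-lived connections established with the server. In Nacos, only connections that have set up a bidirectional stream are managed, and Nacos uses that bidirectional stream for server-initiated pushes. As in Sentinel and Seata, a ConcurrentHashMap stores the Connection objects. When a connection is registered, the manager also checks the connection limits, updates a per-client-IP counter, and invokes the event callbacks to notify listeners that a connection has been established. Note the synchronized keyword here: it means the callbacks effectively run serially. (For example, RpcAckCallbackInitorOrCleaner initializes or cleans up the connection's map entry in RpcAckCallbackSynchronizer when a connection is established or closed.)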

public synchronized boolean register(String connectionId, Connection connection) {
    
    if (connection.isConnected()) {
        if (connections.containsKey(connectionId)) {
            return true;
        }
        if (!checkLimit(connection)) {
            return false;
        }
        if (traced(connection.getMetaInfo().clientIp)) {
            connection.setTraced(true);
        }
        connections.put(connectionId, connection);
        connectionForClientIp.get(connection.getMetaInfo().clientIp).getAndIncrement();
        
        clientConnectionEventListenerRegistry.notifyClientConnected(connection);
        Loggers.REMOTE_DIGEST
                .info("new connection registered successfully, connectionId = {},connection={} ", connectionId,
                        connection);
        return true;
        
    }
    return false;
    
}
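
To make the register flow above concrete, here is a minimal sketch using only the JDK. It is not the Nacos implementation: the class name ConnectionRegistrySketch, the per-IP limit rule, and the use of plain strings instead of Connection objects are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for a connection manager; not the Nacos class.
class ConnectionRegistrySketch {
    // connectionId -> clientIp (the real manager stores Connection objects)
    private final Map<String, String> connections = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> countPerIp = new ConcurrentHashMap<>();
    private final List<Consumer<String>> connectedListeners = new CopyOnWriteArrayList<>();
    private final int maxPerIp; // assumed limit rule, for illustration only

    ConnectionRegistrySketch(int maxPerIp) {
        this.maxPerIp = maxPerIp;
    }

    void onConnected(Consumer<String> listener) {
        connectedListeners.add(listener);
    }

    // synchronized: limit check, insert and listener callbacks run as one serial step.
    synchronized boolean register(String connectionId, String clientIp) {
        if (connections.containsKey(connectionId)) {
            return true; // already registered, nothing to do
        }
        AtomicInteger count = countPerIp.computeIfAbsent(clientIp, ip -> new AtomicInteger());
        if (count.get() >= maxPerIp) {
            return false; // over the limit, the caller should reset the connection
        }
        connections.put(connectionId, clientIp);
        count.incrementAndGet();
        for (Consumer<String> listener : connectedListeners) {
            listener.accept(connectionId);
        }
        return true;
    }

    synchronized void unregister(String connectionId) {
        String clientIp = connections.remove(connectionId);
        if (clientIp != null) {
            countPerIp.get(clientIp).decrementAndGet();
        }
    }

    public static void main(String[] args) {
        ConnectionRegistrySketch registry = new ConnectionRegistrySketch(1);
        List<String> seen = new ArrayList<>();
        registry.onConnected(seen::add);
        System.out.println(registry.register("c1", "10.0.0.1"));
        System.out.println(registry.register("c2", "10.0.0.1"));
        System.out.println(seen);
    }
}
```

Because register is synchronized, a listener never observes two registrations interleaved, at the cost of holding the lock while listeners run.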

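Similarly, ConnectionManager also starts a scheduled task. On one hand it checks the connection limits and uses a connection reset request to tell clients to close connections that exceed them; on the other it periodically scans for idle connections and probes them. The idle-connection probe deserves a closer look. It uses a CountDownLatch: client detection requests are sent asynchronously, and countDown is called when each response arrives. The checking thread blocks for a bounded time waiting for the asynchronous requests to finish, and once that wait ends it closes the connections that did not respond.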

RpcScheduledExecutor.COMMON_SERVER_EXECUTOR.scheduleWithFixedDelay(new Runnable() {
    @Override
    public void run() {
        try {
            //.....  
          
            //4.client active detection.
            Loggers.REMOTE_DIGEST.info("Out dated connection ,size={}", outDatedConnections.size());
            if (CollectionUtils.isNotEmpty(outDatedConnections)) {
                Set<String> successConnections = new HashSet<>();
                final CountDownLatch latch = new CountDownLatch(outDatedConnections.size());
                for (String outDateConnectionId : outDatedConnections) {
                    try {
                        Connection connection = getConnection(outDateConnectionId);
                        if (connection != null) {
                            ClientDetectionRequest clientDetectionRequest = new ClientDetectionRequest();
                            connection.asyncRequest(clientDetectionRequest, new RequestCallBack() {
                                @Override
                                public Executor getExecutor() {
                                    return null;
                                }
                                
                                @Override
                                public long getTimeout() {
                                    return 1000L;
                                }
                                
                                @Override
                                public void onResponse(Response response) {
                                    latch.countDown();
                                    if (response != null && response.isSuccess()) {
                                        connection.freshActiveTime();
                                        successConnections.add(outDateConnectionId);
                                    }
                                }
                                
                                @Override
                                public void onException(Throwable e) {
                                    latch.countDown();
                                }
                            });
                            
                            Loggers.REMOTE_DIGEST
                                    .info("[{}]send connection active request ", outDateConnectionId);
                        } else {
                            latch.countDown();
                        }
                        
                    } catch (ConnectionAlreadyClosedException e) {
                        latch.countDown();
                    } catch (Exception e) {
                        Loggers.REMOTE_DIGEST
                                .error("[{}]Error occurs when check client active detection ,error={}",
                                        outDateConnectionId, e);
                        latch.countDown();
                    }
                }
                
                latch.await(3000L, TimeUnit.MILLISECONDS);
                Loggers.REMOTE_DIGEST
                        .info("Out dated connection check successCount={}", successConnections.size());
                
                for (String outDateConnectionId : outDatedConnections) {
                    if (!successConnections.contains(outDateConnectionId)) {
                        Loggers.REMOTE_DIGEST
                                .info("[{}]Unregister Out dated connection....", outDateConnectionId);
                        unregister(outDateConnectionId);
                    }
                }
            }
            
            //reset loader client
            
            if (isLoaderClient) {
                loadClient = -1;
                redirectAddress = null;
            }
            
            Loggers.REMOTE_DIGEST.info("Connection check task end");
            
        } catch (Throwable e) {
            Loggers.REMOTE.error("Error occurs during connection check... ", e);
        }
    }
}, 1000L, 3000L, TimeUnit.MILLISECONDS);
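
The probing pattern above (fire asynchronous probes, wait on a latch for a bounded time, then drop whatever did not answer) can be sketched with the JDK alone. The class IdleProbeSketch and its probe function are hypothetical stand-ins for the detection request, not Nacos APIs. Unlike the listing, this sketch records a success before counting down and uses a concurrent set, so the waiting thread cannot see the latch reach zero before the result is stored.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class IdleProbeSketch {

    // probe is a hypothetical async "are you alive?" call, one per connection id.
    static Set<String> findDead(Set<String> idleIds, Function<String, CompletableFuture<Boolean>> probe,
            long waitMillis) {
        Set<String> alive = ConcurrentHashMap.newKeySet(); // written from callback threads
        CountDownLatch latch = new CountDownLatch(idleIds.size());
        for (String id : idleIds) {
            probe.apply(id).whenComplete((ok, error) -> {
                if (error == null && Boolean.TRUE.equals(ok)) {
                    alive.add(id); // record the result first ...
                }
                latch.countDown(); // ... then release the waiting thread
            });
        }
        try {
            // Bounded wait: returns early once every probe has answered or failed.
            latch.await(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Set<String> dead = new TreeSet<>(idleIds);
        dead.removeAll(alive);
        return dead; // the caller would unregister these
    }

    public static void main(String[] args) {
        Function<String, CompletableFuture<Boolean>> probe = id -> {
            CompletableFuture<Boolean> reply = new CompletableFuture<>();
            if (id.equals("a")) {
                reply.complete(true); // "a" answers; "b" never does
            }
            return reply;
        };
        System.out.println(findDead(new HashSet<>(Arrays.asList("a", "b")), probe, 200L));
    }
}
```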

GrpcBiStreamRequestAcceptor

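GrpcBiStreamRequestAcceptor is the message handler for the bidirectional stream mentioned earlier. The client and server first establish a connection through the bidirectional-stream service, and only a client that has established a connection may issue unary requests. From this handler it is easy to see that the only request the client sends over this stream is the connection setup request; after that, the stream is used for server pushes (and the client's replies to them). All other requests go through the separate unary call. When the connection is registered, the current connection's responseObserver is injected into GrpcConnection, so the server can look up the connection in ConnectionManager and push data proactively.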

@Override
public void onNext(Payload payload) {
    
    clientIp = payload.getMetadata().getClientIp();
    traceDetailIfNecessary(payload);
    
    Object parseObj;
    try {
        parseObj = GrpcUtils.parse(payload);
    } catch (Throwable throwable) {
        Loggers.REMOTE_DIGEST
                .warn("[{}]Grpc request bi stream,payload parse error={}", connectionId, throwable);
        return;
    }
    
    if (parseObj == null) {
        Loggers.REMOTE_DIGEST
                .warn("[{}]Grpc request bi stream,payload parse null ,body={},meta={}", connectionId,
                        payload.getBody().getValue().toStringUtf8(), payload.getMetadata());
        return;
    }
    if (parseObj instanceof ConnectionSetupRequest) {
        ConnectionSetupRequest setUpRequest = (ConnectionSetupRequest) parseObj;
        Map<String, String> labels = setUpRequest.getLabels();
        String appName = "-";
        if (labels != null && labels.containsKey(Constants.APPNAME)) {
            appName = labels.get(Constants.APPNAME);
        }
        
        ConnectionMeta metaInfo = new ConnectionMeta(connectionId, payload.getMetadata().getClientIp(),
                remoteIp, remotePort, localPort, ConnectionType.GRPC.getType(),
                setUpRequest.getClientVersion(), appName, setUpRequest.getLabels());
        metaInfo.setTenant(setUpRequest.getTenant());
        Connection connection = new GrpcConnection(metaInfo, responseObserver, CONTEXT_KEY_CHANNEL.get());
        connection.setAbilities(setUpRequest.getAbilities());
        boolean rejectSdkOnStarting = metaInfo.isSdkSource() && !ApplicationUtils.isStarted();
        
        if (rejectSdkOnStarting || !connectionManager.register(connectionId, connection)) {
            //Not register to the connection manager if current server is over limit or server is starting.
            try {
                Loggers.REMOTE_DIGEST.warn("[{}]Connection register fail,reason:{}", connectionId,
                        rejectSdkOnStarting ? " server is not started" : " server is over limited.");
                connection.request(new ConnectResetRequest(), 3000L);
                connection.close();
            } catch (Exception e) {
                //Do nothing.
                if (connectionManager.traced(clientIp)) {
                    Loggers.REMOTE_DIGEST
                            .warn("[{}]Send connect reset request error,error={}", connectionId, e);
                }
            }
        }
        
    } else if (parseObj instanceof Response) {
        Response response = (Response) parseObj;
        if (connectionManager.traced(clientIp)) {
            Loggers.REMOTE_DIGEST
                    .warn("[{}]Receive response of server request  ,response={}", connectionId, response);
        }
        RpcAckCallbackSynchronizer.ackNotify(connectionId, response);
        connectionManager.refreshActiveTime(connectionId);
    } else {
        Loggers.REMOTE_DIGEST
                .warn("[{}]Grpc request bi stream,unknown payload receive ,parseObj={}", connectionId,
                        parseObj);
    }
    
}

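If the received data is a response from the client, the last-active time of the corresponding connection is refreshed. Note that RpcAckCallbackSynchronizer handles the client-side responses. The principle is very similar to the mechanism used for synchronous sends in Sentinel and Seata: a map holds the outstanding requests that are waiting for a result, and when a response arrives the matching Future is found and completed. The difference is that in Nacos the map is a ConcurrentLinkedHashMap, whose EvictionListener is used to handle expired tasks.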

public static void ackNotify(String connectionId, Response response) {
    
    Map<String, DefaultRequestFuture> stringDefaultPushFutureMap = CALLBACK_CONTEXT.get(connectionId);
    if (stringDefaultPushFutureMap == null) {
        
        Loggers.REMOTE_DIGEST
                .warn("Ack receive on a outdated connection ,connection id={},requestId={} ", connectionId,
                        response.getRequestId());
        return;
    }
    
    DefaultRequestFuture currentCallback = stringDefaultPushFutureMap.remove(response.getRequestId());
    if (currentCallback == null) {
        
        Loggers.REMOTE_DIGEST
                .warn("Ack receive on a outdated request ,connection id={},requestId={} ", connectionId,
                        response.getRequestId());
        return;
    }
    
    if (response.isSuccess()) {
        currentCallback.setResponse(response);
    } else {
        currentCallback.setFailResult(new NacosException(response.getErrorCode(), response.getMessage()));
    }
}
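
The request-id correlation described above can be illustrated with a small JDK-only sketch. AckRegistrySketch is a hypothetical name; it uses a plain ConcurrentHashMap and CompletableFuture, and deliberately leaves out the eviction of expired entries that the original handles through an EvictionListener.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of ack correlation; not the Nacos RpcAckCallbackSynchronizer.
class AckRegistrySketch {
    // connectionId -> (requestId -> future waiting for the client's reply)
    private final Map<String, Map<String, CompletableFuture<String>>> pending = new ConcurrentHashMap<>();

    void initConnection(String connectionId) {
        pending.putIfAbsent(connectionId, new ConcurrentHashMap<>());
    }

    void clearConnection(String connectionId) {
        pending.remove(connectionId);
    }

    // Called before the request is pushed, so a fast reply always finds its future.
    CompletableFuture<String> expectAck(String connectionId, String requestId) {
        Map<String, CompletableFuture<String>> byRequest = pending.get(connectionId);
        if (byRequest == null) {
            throw new IllegalStateException("unknown connection " + connectionId);
        }
        CompletableFuture<String> future = new CompletableFuture<>();
        if (byRequest.putIfAbsent(requestId, future) != null) {
            throw new IllegalStateException("duplicate request id " + requestId);
        }
        return future;
    }

    // Returns false when the reply belongs to an outdated connection or request.
    boolean ackNotify(String connectionId, String requestId, boolean success, String body) {
        Map<String, CompletableFuture<String>> byRequest = pending.get(connectionId);
        if (byRequest == null) {
            return false;
        }
        CompletableFuture<String> future = byRequest.remove(requestId);
        if (future == null) {
            return false;
        }
        if (success) {
            future.complete(body);
        } else {
            future.completeExceptionally(new RuntimeException(body));
        }
        return true;
    }

    public static void main(String[] args) {
        AckRegistrySketch acks = new AckRegistrySketch();
        acks.initConnection("c1");
        CompletableFuture<String> reply = acks.expectAck("c1", "1");
        acks.ackNotify("c1", "1", true, "pong");
        System.out.println(reply.getNow(null));
    }
}
```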

GrpcRequestAcceptor

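As described above, requests initiated by the client are all handled by the unary method. The core logic looks up a RequestHandler by the type field in the request's Metadata. Note that once the RequestHandler has finished, responseObserver.onCompleted() is called, because this method serves a unary request.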

@Override
public void request(Payload grpcRequest, StreamObserver<Payload> responseObserver) {
    
    traceIfNecessary(grpcRequest, true);
    String type = grpcRequest.getMetadata().getType();
    //.....
    
    RequestHandler requestHandler = requestHandlerRegistry.getByRequestType(type);
    //...
    
    //check connection status.
    String connectionId = CONTEXT_KEY_CONN_ID.get();
    boolean requestValid = connectionManager.checkValid(connectionId);
    if (!requestValid) {
        //.....
        return;
    }
    
    //...
    
    Request request = (Request) parseObj;
    try {
        Connection connection = connectionManager.getConnection(CONTEXT_KEY_CONN_ID.get());
        RequestMeta requestMeta = new RequestMeta();
        requestMeta.setClientIp(connection.getMetaInfo().getClientIp());
        requestMeta.setConnectionId(CONTEXT_KEY_CONN_ID.get());
        requestMeta.setClientVersion(connection.getMetaInfo().getVersion());
        requestMeta.setLabels(connection.getMetaInfo().getLabels());
        connectionManager.refreshActiveTime(requestMeta.getConnectionId());
        Response response = requestHandler.handleRequest(request, requestMeta);
        Payload payloadResponse = GrpcUtils.convert(response);
        traceIfNecessary(payloadResponse, false);
        responseObserver.onNext(payloadResponse);
        responseObserver.onCompleted();
    } catch (Throwable e) {
        Loggers.REMOTE_DIGEST
                .error("[{}] Fail to handle request from connection [{}] ,error message :{}", "grpc", connectionId,
                        e);
        Payload payloadResponse = GrpcUtils.convert(buildErrorResponse(
                (e instanceof NacosException) ? ((NacosException) e).getErrCode() : ResponseCode.FAIL.getCode(),
                e.getMessage()));
        traceIfNecessary(payloadResponse, false);
        responseObserver.onNext(payloadResponse);
        responseObserver.onCompleted();
    }
    
}
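
As a rough illustration of the type-based dispatch above, the following JDK-only sketch maps a type string to a handler and always produces exactly one reply, which mirrors the single onNext followed by onCompleted of a unary call. RequestDispatchSketch and its string-based requests are assumptions for the example, not the Nacos types.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of dispatch by request type; not the Nacos RequestHandlerRegistry.
class RequestDispatchSketch {
    private final Map<String, Function<String, String>> handlers = new ConcurrentHashMap<>();

    void register(String type, Function<String, String> handler) {
        handlers.put(type, handler);
    }

    // Every path returns one reply: a unary call answers once, success or error.
    String handle(String type, String body) {
        Function<String, String> handler = handlers.get(type);
        if (handler == null) {
            return "ERROR: no handler for " + type;
        }
        try {
            return handler.apply(body);
        } catch (RuntimeException e) {
            return "ERROR: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        RequestDispatchSketch dispatcher = new RequestDispatchSketch();
        dispatcher.register("EchoRequest", body -> "echo:" + body);
        System.out.println(dispatcher.handle("EchoRequest", "hi"));
        System.out.println(dispatcher.handle("Unknown", "hi"));
    }
}
```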

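RequestHandler is an abstract class. Before invoking the actual handler method it iterates over all the request filters; for example, Nacos adds a TpsControlRequestFilter to rate-limit requests.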

public Response handleRequest(T request, RequestMeta meta) throws NacosException {
    for (AbstractRequestFilter filter : requestFilters.filters) {
        try {
            Response filterResult = filter.filter(request, meta, this.getClass());
            if (filterResult != null && !filterResult.isSuccess()) {
                return filterResult;
            }
        } catch (Throwable throwable) {
            Loggers.REMOTE.error("filter error", throwable);
        }
        
    }
    return handle(request, meta);
}
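
The filter loop above can be sketched as follows. The Filter interface, the counting filter, and FilterChainSketch are hypothetical; the counter is only a toy stand-in for rate limiting and is not how TpsControlRequestFilter works. As in the listing, a filter that throws is skipped rather than failing the request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of a filter chain in front of a handler.
class FilterChainSketch {

    interface Filter {
        // An empty result lets the request through; a value is returned as the rejection.
        Optional<String> reject(String request);
    }

    private final List<Filter> filters = new ArrayList<>();
    private final Function<String, String> handler;

    FilterChainSketch(Function<String, String> handler) {
        this.handler = handler;
    }

    void addFilter(Filter filter) {
        filters.add(filter);
    }

    // Toy limiter: allows at most maxCalls requests in total (assumption for the demo).
    static Filter maxCalls(int maxCalls) {
        AtomicInteger seen = new AtomicInteger();
        return request -> seen.incrementAndGet() > maxCalls
                ? Optional.of("REJECTED: over limit")
                : Optional.empty();
    }

    String handleRequest(String request) {
        for (Filter filter : filters) {
            try {
                Optional<String> rejection = filter.reject(request);
                if (rejection.isPresent()) {
                    return rejection.get(); // short-circuit, the handler is not called
                }
            } catch (RuntimeException e) {
                // A broken filter is ignored so that it cannot block requests.
            }
        }
        return handler.apply(request);
    }

    public static void main(String[] args) {
        FilterChainSketch chain = new FilterChainSketch(request -> "handled:" + request);
        chain.addFilter(maxCalls(1));
        System.out.println(chain.handleRequest("a"));
        System.out.println(chain.handleRequest("b"));
    }
}
```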

RpcPushService

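RpcPushService is used by the server to push messages to clients. Combined with the ConnectionManager and GrpcBiStreamRequestAcceptor described above, the server finds the GrpcConnection by connectionId and calls its send method. When the response arrives, RpcAckCallbackSynchronizer sets the result, and DefaultRequestFuture invokes the registered callback.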

public void pushWithCallback(String connectionId, ServerRequest request, PushCallBack requestCallBack,
        Executor executor) {
    Connection connection = connectionManager.getConnection(connectionId);
    if (connection != null) {
        try {
            connection.asyncRequest(request, new AbstractRequestCallBack(requestCallBack.getTimeout()) {
                
                @Override
                public Executor getExecutor() {
                    return executor;
                }
                
                @Override
                public void onResponse(Response response) {
                    if (response.isSuccess()) {
                        requestCallBack.onSuccess();
                    } else {
                        requestCallBack.onFail(new NacosException(response.getErrorCode(), response.getMessage()));
                    }
                }
                
                @Override
                public void onException(Throwable e) {
                    requestCallBack.onFail(e);
                }
            });
        } catch (ConnectionAlreadyClosedException e) {
            connectionManager.unregister(connectionId);
            requestCallBack.onSuccess();
        } catch (Exception e) {
            Loggers.REMOTE_DIGEST
                    .error("error to send push response to connectionId ={},push response={}", connectionId,
                            request, e);
            requestCallBack.onFail(e);
        }
    } else {
        requestCallBack.onSuccess();
    }
}

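Both synchronous and asynchronous sends end up calling sendRequestInner. It first constructs a DefaultRequestFuture, which carries two callbacks: the business callback and a timeout callback. By default, a timeout clears this DefaultRequestFuture. It then calls sendRequestNoAck, which pushes the request over the bidirectional stream through the streamObserver.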

private DefaultRequestFuture sendRequestInner(Request request, RequestCallBack callBack) throws NacosException {
    final String requestId = String.valueOf(PushAckIdGenerator.getNextId());
    request.setRequestId(requestId);
    
    DefaultRequestFuture defaultPushFuture = new DefaultRequestFuture(getMetaInfo().getConnectionId(), requestId,
            callBack, () -> RpcAckCallbackSynchronizer.clearFuture(getMetaInfo().getConnectionId(), requestId));
    
    RpcAckCallbackSynchronizer.syncCallback(getMetaInfo().getConnectionId(), requestId, defaultPushFuture);
    sendRequestNoAck(request);
    return defaultPushFuture;
}

Client

Overview

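On the client side, the overall layering is similar to the server side. The difference is that RpcClient not only defines the interface but also implements much of the functionality, such as sending messages and switching servers. GrpcClient uses gRPC to implement the server connection, message handling, and so on.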

image-20210726181535929.png

RpcClient

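RpcClient holds a clientEventExecutor for running client tasks. On startup it submits a connection-event thread that triggers the connection event listeners, and a reconnect-event thread that periodically checks connectivity to the remote server and handles reconnect events. It then connects to the remote server according to the configuration; connectToServer is a template method implemented by the concrete class GrpcClient. Finally, it registers two handlers for requests sent by the server.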

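Before connecting, nextRpcServer is called to pick a server from the server list. The list is maintained in ServerListManager, which keeps the servers in a List and tracks the current position with an atomic currentIndex. If an endpoint for fetching the server list remotely is provided, ServerListManager also refreshes the list periodically.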

public final void start() throws NacosException {
    
    boolean success = rpcClientStatus.compareAndSet(RpcClientStatus.INITIALIZED, RpcClientStatus.STARTING);
    if (!success) {
        return;
    }
    
    clientEventExecutor = new ScheduledThreadPoolExecutor(2, new ThreadFactory() {
        @Override
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r);
            t.setName("com.alibaba.nacos.client.remote.worker");
            t.setDaemon(true);
            return t;
        }
    });
    
    // connection event consumer.
    // Handle connection events.
    clientEventExecutor.submit(new Runnable() {
        @Override
        public void run() {
            while (!clientEventExecutor.isTerminated() && !clientEventExecutor.isShutdown()) {
                ConnectionEvent take = null;
                try {
                    take = eventLinkedBlockingQueue.take();
                    if (take.isConnected()) {
                        notifyConnected();
                    } else if (take.isDisConnected()) {
                        notifyDisConnected();
                    }
                } catch (Throwable e) {
                    //Do nothing
                }
            }
        }
    });
    
    // Health check.
    clientEventExecutor.submit(new Runnable() {
        @Override
        public void run() {
            while (true) {
                try {
                    if (isShutdown()) {
                        break;
                    }
                    ReconnectContext reconnectContext = reconnectionSignal
                            .poll(keepAliveTime, TimeUnit.MILLISECONDS);
                    if (reconnectContext == null) {
                        //check alive time.
                        if (System.currentTimeMillis() - lastActiveTimeStamp >= keepAliveTime) {
                            boolean isHealthy = healthCheck();
                            if (!isHealthy) {
                                if (currentConnection == null) {
                                    continue;
                                }
                                LoggerUtils.printIfInfoEnabled(LOGGER,
                                        "[{}]Server healthy check fail,currentConnection={}", name,
                                        currentConnection.getConnectionId());
                                
                                RpcClientStatus rpcClientStatus = RpcClient.this.rpcClientStatus.get();
                                if (RpcClientStatus.SHUTDOWN.equals(rpcClientStatus)) {
                                    break;
                                }
                                
                                boolean success = RpcClient.this.rpcClientStatus
                                        .compareAndSet(rpcClientStatus, RpcClientStatus.UNHEALTHY);
                                if (success) {
                                    reconnectContext = new ReconnectContext(null, false);
                                } else {
                                    continue;
                                }
                                
                            } else {
                                lastActiveTimeStamp = System.currentTimeMillis();
                                continue;
                            }
                        } else {
                            continue;
                        }
                        
                    }
                    
                    if (reconnectContext.serverInfo != null) {
                        //clear recommend server if server is not in server list.
                        boolean serverExist = false;
                        for (String server : getServerListFactory().getServerList()) {
                            ServerInfo serverInfo = resolveServerInfo(server);
                            if (serverInfo.getServerIp().equals(reconnectContext.serverInfo.getServerIp())) {
                                serverExist = true;
                                reconnectContext.serverInfo.serverPort = serverInfo.serverPort;
                                break;
                            }
                        }
                        if (!serverExist) {
                            LoggerUtils.printIfInfoEnabled(LOGGER,
                                    "[{}] Recommend server is not in server list ,ignore recommend server {}", name,
                                    reconnectContext.serverInfo.getAddress());
                            
                            reconnectContext.serverInfo = null;
                            
                        }
                    }
                    reconnect(reconnectContext.serverInfo, reconnectContext.onRequestFail);
                } catch (Throwable throwable) {
                    //Do nothing
                }
            }
        }
    });
    
    //connect to server ,try to connect to server sync once, async starting if fail.
    Connection connectToServer = null;
    rpcClientStatus.set(RpcClientStatus.STARTING);
    
    int startUpRetryTimes = RETRY_TIMES;
    while (startUpRetryTimes > 0 && connectToServer == null) {
        try {
            startUpRetryTimes--;
            ServerInfo serverInfo = nextRpcServer();
            
            LoggerUtils.printIfInfoEnabled(LOGGER, "[{}] Try to connect to server on start up, server: {}", name,
                    serverInfo);
            // Connect to the remote server.
            connectToServer = connectToServer(serverInfo);
        } catch (Throwable e) {
            LoggerUtils.printIfWarnEnabled(LOGGER,
                    "[{}]Fail to connect to server on start up, error message={}, start up retry times left: {}",
                    name, e.getMessage(), startUpRetryTimes);
        }
        
    }
    
    if (connectToServer != null) {
        LoggerUtils.printIfInfoEnabled(LOGGER, "[{}] Success to connect to server [{}] on start up,connectionId={}",
                name, connectToServer.serverInfo.getAddress(), connectToServer.getConnectionId());
        this.currentConnection = connectToServer;
        rpcClientStatus.set(RpcClientStatus.RUNNING);
        eventLinkedBlockingQueue.offer(new ConnectionEvent(ConnectionEvent.CONNECTED));
    } else {
        switchServerAsync();
    }
    
    registerServerRequestHandler(new ConnectResetRequestHandler());
    
    //register client detection request.
    registerServerRequestHandler(new ServerRequestHandler() {
        @Override
        public Response requestReply(Request request) {
            if (request instanceof ClientDetectionRequest) {
                return new ClientDetectionResponse();
            }
            
            return null;
        }
    });
    
}
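
The server rotation that nextRpcServer relies on (a list plus an atomic index) might look roughly like the sketch below. ServerListSketch is a hypothetical class; the starting index, the wrap-around rule, and the refresh method are assumptions, and the real ServerListManager may differ in all three.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin server list; not the Nacos ServerListManager.
class ServerListSketch {
    private volatile List<String> servers;
    private final AtomicInteger currentIndex = new AtomicInteger();

    ServerListSketch(List<String> servers) {
        this.servers = new ArrayList<>(servers);
    }

    // Returns the next server, wrapping around at the end of the list.
    String next() {
        List<String> snapshot = servers; // read once so a refresh cannot change the size mid-call
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("server list is empty");
        }
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(currentIndex.getAndIncrement(), snapshot.size());
        return snapshot.get(index);
    }

    // Would be called by a periodic task when the list is fetched remotely.
    void refresh(List<String> latest) {
        servers = new ArrayList<>(latest);
    }

    public static void main(String[] args) {
        ServerListSketch list = new ServerListSketch(Arrays.asList("10.0.0.1:8848", "10.0.0.2:8848"));
        for (int i = 0; i < 3; i++) {
            System.out.println(list.next());
        }
    }
}
```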

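During startup we can see that the health-check thread calls healthCheck, which in turn uses the current Connection object to send a probe request to the server. Connection is an abstract class that implements the Requester interface but provides no concrete implementation; all the functionality lives in the concrete class GrpcConnection. The request method of RpcClient follows a similar design: it exists to add fault tolerance and retries around remote requests, while the Connection implementation only needs to deal with the actual communication.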

public Response request(Request request, long timeoutMills) throws NacosException {
    int retryTimes = 0;
    Response response = null;
    Exception exceptionThrow = null;
    long start = System.currentTimeMillis();
    while (retryTimes < RETRY_TIMES && System.currentTimeMillis() < timeoutMills + start) {
        boolean waitReconnect = false;
        try {
            if (this.currentConnection == null || !isRunning()) {
                waitReconnect = true;
                throw new NacosException(NacosException.CLIENT_DISCONNECT,
                        "Client not connected,current status:" + rpcClientStatus.get());
            }
            response = this.currentConnection.request(request, timeoutMills);
            if (response == null) {
                throw new NacosException(SERVER_ERROR, "Unknown Exception.");
            }
            if (response instanceof ErrorResponse) {
                if (response.getErrorCode() == NacosException.UN_REGISTER) {
                    synchronized (this) {
                        waitReconnect = true;
                        if (rpcClientStatus.compareAndSet(RpcClientStatus.RUNNING, RpcClientStatus.UNHEALTHY)) {
                            LoggerUtils.printIfErrorEnabled(LOGGER,
                                    "Connection is unregistered, switch server,connectionId={},request={}",
                                    currentConnection.getConnectionId(), request.getClass().getSimpleName());
                            switchServerAsync();
                        }
                    }
                    
                }
                throw new NacosException(response.getErrorCode(), response.getMessage());
            }
            // return response.
            lastActiveTimeStamp = System.currentTimeMillis();
            return response;
            
        } catch (Exception e) {
            if (waitReconnect) {
                try {
                    //wait client to re connect.
                    Thread.sleep(Math.min(100, timeoutMills / 3));
                } catch (Exception exception) {
                    //Do nothing.
                }
            }
            
            LoggerUtils.printIfErrorEnabled(LOGGER, "Send request fail, request={}, retryTimes={},errorMessage={}",
                    request, retryTimes, e.getMessage());
            
            exceptionThrow = e;
            
        }
        retryTimes++;
        
    }
    
    if (rpcClientStatus.compareAndSet(RpcClientStatus.RUNNING, RpcClientStatus.UNHEALTHY)) {
        switchServerAsyncOnRequestFail();
    }
    
    if (exceptionThrow != null) {
        throw (exceptionThrow instanceof NacosException) ? (NacosException) exceptionThrow
                : new NacosException(SERVER_ERROR, exceptionThrow);
    } else {
        throw new NacosException(SERVER_ERROR, "Request fail,Unknown Error");
    }
}
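The retry logic above can be reduced to a small standalone sketch. This is not Nacos code: `RetrySketch` and `callWithRetry` are hypothetical names, and the loop condition (a retry limit combined with an overall deadline) is an assumption about how such a loop is typically bounded. It only illustrates the pattern of remembering the last exception and rethrowing it once the attempts are exhausted.

```java
import java.util.function.Supplier;

final class RetrySketch {

    /**
     * Hypothetical helper: calls the supplier until it succeeds, the retry
     * limit is reached, or the overall deadline has passed.
     */
    static <T> T callWithRetry(Supplier<T> call, int maxRetries, long timeoutMillis) {
        long start = System.currentTimeMillis();
        RuntimeException lastError = null;
        int retryTimes = 0;
        while (retryTimes < maxRetries && System.currentTimeMillis() < start + timeoutMillis) {
            try {
                // Success ends the loop immediately.
                return call.get();
            } catch (RuntimeException e) {
                // Remember the failure and try again.
                lastError = e;
            }
            retryTimes++;
        }
        // All attempts failed (or none could start): surface the last error if there is one.
        if (lastError != null) {
            throw lastError;
        }
        throw new IllegalStateException("Request failed, unknown error");
    }
}
```

A caller would pass the actual request as the supplier, for example `RetrySketch.callWithRetry(() -> send(request), 3, 3000L)`, where `send` stands in for whatever performs the call.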

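Finally, let's look at how RpcClient handles requests pushed from the server. As mentioned in the initialization section, RpcClient registers two request handlers, and all handlers are stored in a List. Unlike the server side, which looks up the matching handler by the Request type and then processes the message, the client iterates over the handlers and stops as soon as one of them returns a non-null response.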

protected Response handleServerRequest(final Request request) {
    
    LoggerUtils.printIfInfoEnabled(LOGGER, "[{}]receive server push request,request={},requestId={}", name,
            request.getClass().getSimpleName(), request.getRequestId());
    lastActiveTimeStamp = System.currentTimeMillis();
    for (ServerRequestHandler serverRequestHandler : serverRequestHandlers) {
        try {
            Response response = serverRequestHandler.requestReply(request);
            
            if (response != null) {
                LoggerUtils.printIfInfoEnabled(LOGGER, "[{}]ack server push request,request={},requestId={}", name,
                        request.getClass().getSimpleName(), request.getRequestId());
                return response;
            }
        } catch (Exception e) {
            LoggerUtils.printIfInfoEnabled(LOGGER, "[{}]handleServerRequest:{}, errorMessage={}", name,
                    serverRequestHandler.getClass().getName(), e.getMessage());
        }
        
    }
    return null;
}
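The traversal above is essentially a chain of handlers in which the first non-null answer wins. The following is a minimal sketch of that idea, independent of Nacos: `HandlerChainSketch`, `Handler`, and `dispatch` are hypothetical names, and responses are plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.List;

final class HandlerChainSketch {

    /** Hypothetical handler: returns null when it does not handle the request. */
    interface Handler {
        String handle(Object request);
    }

    private final List<Handler> handlers = new ArrayList<>();

    void register(Handler handler) {
        handlers.add(handler);
    }

    /** Asks each handler in registration order; the first non-null response wins. */
    String dispatch(Object request) {
        for (Handler handler : handlers) {
            try {
                String response = handler.handle(request);
                if (response != null) {
                    return response;
                }
            } catch (RuntimeException e) {
                // A failing handler is skipped so the remaining handlers still get a chance.
            }
        }
        // No handler recognized the request.
        return null;
    }
}
```

The trade-off compared with a type-keyed lookup is that every handler must check the request type itself and return null for requests it does not own. That is cheap when only a handful of handlers are registered.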

GrpcClient/GrpcConnection

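RpcClient defines the abstract functionality for communication between a client and a remote server, while the actual communication is implemented by its concrete subclasses.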

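GrpcClient is responsible for establishing the connection to the remote server. It initializes the gRPC stub used for unary calls and the StreamObserver used for the bidirectional stream, creates a GrpcConnection object, and injects the initialized Channel into it. It then sends a connection setup request to register its connection on the server side. Notably, the bidirectional-stream BiRequestStreamStub is built on the same Channel as the RequestFutureStub and relies on HTTP/2 multiplexing, so only one connection is established between the client and the server.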

@Override
public Connection connectToServer(ServerInfo serverInfo) {
    try {
        if (grpcExecutor == null) {
            int threadNumber = ThreadUtils.getSuitableThreadCount(8);
            grpcExecutor = new ThreadPoolExecutor(threadNumber, threadNumber, 10L, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<>(10000),
                    new ThreadFactoryBuilder().setDaemon(true).setNameFormat("nacos-grpc-client-executor-%d")
                            .build());
            grpcExecutor.allowCoreThreadTimeOut(true);
            
        }
        int port = serverInfo.getServerPort() + rpcPortOffset();
        RequestGrpc.RequestFutureStub newChannelStubTemp = createNewChannelStub(serverInfo.getServerIp(), port);
        if (newChannelStubTemp != null) {
            
            Response response = serverCheck(serverInfo.getServerIp(), port, newChannelStubTemp);
            if (response == null || !(response instanceof ServerCheckResponse)) {
                shuntDownChannel((ManagedChannel) newChannelStubTemp.getChannel());
                return null;
            }
            
            BiRequestStreamGrpc.BiRequestStreamStub biRequestStreamStub = BiRequestStreamGrpc
                    .newStub(newChannelStubTemp.getChannel());
            GrpcConnection grpcConn = new GrpcConnection(serverInfo, grpcExecutor);
            grpcConn.setConnectionId(((ServerCheckResponse) response).getConnectionId());
            
            //create stream request and bind connection event to this connection.
            StreamObserver<Payload> payloadStreamObserver = bindRequestStream(biRequestStreamStub, grpcConn);
            
            // stream observer to send response to server
            grpcConn.setPayloadStreamObserver(payloadStreamObserver);
            grpcConn.setGrpcFutureServiceStub(newChannelStubTemp);
            grpcConn.setChannel((ManagedChannel) newChannelStubTemp.getChannel());
            //send a  setup request.
            ConnectionSetupRequest conSetupRequest = new ConnectionSetupRequest();
            conSetupRequest.setClientVersion(VersionUtils.getFullClientVersion());
            conSetupRequest.setLabels(super.getLabels());
            conSetupRequest.setAbilities(super.clientAbilities);
            conSetupRequest.setTenant(super.getTenant());
            grpcConn.sendRequest(conSetupRequest);
            //wait to register connection setup
            Thread.sleep(100L);
            return grpcConn;
        }
        return null;
    } catch (Exception e) {
        LOGGER.error("[{}]Fail to connect to server!,error={}", GrpcClient.this.getName(), e);
    }
    return null;
}

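GrpcConnection mainly uses the gRPC stub for unary requests and the StreamObserver for bidirectional-stream interaction, and most of it is fairly basic gRPC code. Let's take a brief look at how GrpcConnection wraps a request as an asynchronous call: it uses Futures to wrap the user-defined callback, and it supports running the callback on a thread pool as well as setting a timeout.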

@Override
public void asyncRequest(Request request, final RequestCallBack requestCallBack) throws NacosException {
    Payload grpcRequest = GrpcUtils.convert(request);
    ListenableFuture<Payload> requestFuture = grpcFutureServiceStub.request(grpcRequest);
    
    //set callback .
    Futures.addCallback(requestFuture, new FutureCallback<Payload>() {
        @Override
        public void onSuccess(@Nullable Payload grpcResponse) {
            Response response = (Response) GrpcUtils.parse(grpcResponse);
            
            if (response != null) {
                if (response instanceof ErrorResponse) {
                    requestCallBack.onException(new NacosException(response.getErrorCode(), response.getMessage()));
                } else {
                    requestCallBack.onResponse(response);
                }
            } else {
                requestCallBack.onException(new NacosException(ResponseCode.FAIL.getCode(), "response is null"));
            }
        }
        
        @Override
        public void onFailure(Throwable throwable) {
            if (throwable instanceof CancellationException) {
                requestCallBack.onException(
                        new TimeoutException("Timeout after " + requestCallBack.getTimeout() + " milliseconds."));
            } else {
                requestCallBack.onException(throwable);
            }
        }
    }, requestCallBack.getExecutor() != null ? requestCallBack.getExecutor() : this.executor);
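    // set timeout future: on timeout, withTimeout cancels requestFuture,
    // which reaches onFailure above as a CancellationException.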
    ListenableFuture<Payload> payloadListenableFuture = Futures
            .withTimeout(requestFuture, requestCallBack.getTimeout(), TimeUnit.MILLISECONDS,
                    RpcScheduledExecutor.TIMEOUT_SCHEDULER);
    
}
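The same callback-plus-timeout pattern can be sketched with only the JDK. Note the substitution: this sketch uses `CompletableFuture` and a scheduled cancel instead of Guava's `Futures.addCallback` and `Futures.withTimeout`. The names `TimeoutCallbackSketch` and `withCallback` are hypothetical. As in the code above, a timeout is realized by cancelling the pending future, and the callback translates the resulting `CancellationException` into a `TimeoutException`.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

final class TimeoutCallbackSketch {

    // Single daemon thread whose only job is to cancel overdue futures.
    private static final ScheduledExecutorService TIMEOUT_SCHEDULER =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "timeout-scheduler");
                thread.setDaemon(true);
                return thread;
            });

    /** Hypothetical helper: attaches callbacks and a timeout to a pending future. */
    static <T> void withCallback(CompletableFuture<T> future, long timeoutMillis,
            Consumer<T> onResponse, Consumer<Throwable> onException) {
        future.whenComplete((value, error) -> {
            if (error == null) {
                onResponse.accept(value);
            } else if (error instanceof CancellationException) {
                // Cancellation is how the timeout shows up; report it as a timeout.
                onException.accept(
                        new TimeoutException("Timeout after " + timeoutMillis + " milliseconds."));
            } else {
                onException.accept(error);
            }
        });
        // Cancelling an already completed future is a no-op, so no cleanup is needed.
        TIMEOUT_SCHEDULER.schedule(() -> {
            future.cancel(true);
        }, timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

In this sketch the callbacks run on whichever thread completes or cancels the future. The code above instead passes an explicit executor to `Futures.addCallback`, which keeps user callbacks off the gRPC threads.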