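Preface

I recently worked my way from sofarpc down to sofabolt, the cornerstone of the whole sofa ecosystem. The bolt protocol in sofarpc, as well as the ping-pong heartbeat and registration mechanisms in sofaregistry, are all built on the sofabolt communication framework, so it is fair to call it the foundation underneath the foundational frameworks.

Let's walk through the source code to understand how sofabolt is put together.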

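Overview

The sofa stack defines its own protocol and uses netty for the underlying transport. An rpc framework is in a different position: it cannot require every caller to use the bolt protocol, so it also has to support protocols such as http and tcp.

sofabolt covers the points above. It plugs into netty at the bottom and provides the ping-pong heartbeat that middleware needs. Communication can be synchronous or asynchronous, and requests will inevitably time out, so there has to be a safety net, for example failfast and failover. Think of circuit breaking: you normally write a fallback method instead of returning the system's default timeout message as-is.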

Request flow

BaseRemoting

com.alipay.remoting.BaseRemoting#invokeSync

protected RemotingCommand invokeSync(final Connection conn, final RemotingCommand request,
                                     final int timeoutMillis) throws RemotingException,
                                                             InterruptedException {
    int remainingTime = remainingTime(request, timeoutMillis);
    if (remainingTime <= ABANDONING_REQUEST_THRESHOLD) {
        // already timeout
        LOGGER
            .warn(
                "already timeout before writing to the network, requestId: {}, remoting address: {}",
                request.getId(),
                conn.getUrl() != null ? conn.getUrl() : RemotingUtil.parseRemoteAddress(conn
                    .getChannel()));
        return this.commandFactory.createTimeoutResponse(conn.getRemoteAddress());
    }

    final InvokeFuture future = createInvokeFuture(request, request.getInvokeContext());
    conn.addInvokeFuture(future);
    final int requestId = request.getId();
    InvokeContext invokeContext = request.getInvokeContext();
    if (null != invokeContext) {
        invokeContext.put(InvokeContext.BOLT_PROCESS_CLIENT_BEFORE_SEND, System.nanoTime());
    }
    try {
        conn.getChannel().writeAndFlush(request).addListener(new ChannelFutureListener() {

            @Override
            public void operationComplete(ChannelFuture f) throws Exception {
                if (!f.isSuccess()) {
                    conn.removeInvokeFuture(requestId);
                    future.putResponse(commandFactory.createSendFailedResponse(
                        conn.getRemoteAddress(), f.cause()));
                    LOGGER.error("Invoke send failed, id={}", requestId, f.cause());
                }
            }

        });
        if (null != invokeContext) {
            invokeContext.put(InvokeContext.BOLT_PROCESS_CLIENT_AFTER_SEND, System.nanoTime());
        }
    } catch (Exception e) {
        conn.removeInvokeFuture(requestId);
        future.putResponse(commandFactory.createSendFailedResponse(conn.getRemoteAddress(), e));
        LOGGER.error("Exception caught when sending invocation, id={}", requestId, e);
    }
    RemotingCommand response = future.waitResponse(remainingTime);

    if (null != invokeContext) {
        invokeContext.put(InvokeContext.BOLT_PROCESS_CLIENT_RECEIVED, System.nanoTime());
    }

    if (response == null) {
        conn.removeInvokeFuture(requestId);
        response = this.commandFactory.createTimeoutResponse(conn.getRemoteAddress());
        LOGGER.warn("Wait response, request id={} timeout!", requestId);
    }

    return response;
}

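The method starts with timeout handling: if the remaining time is already at or below ABANDONING_REQUEST_THRESHOLD before anything is written to the network, it returns a timeout response directly. Stepping through the code in a debugger, for example, easily uses up the whole timeout.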
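
The early-exit check can be sketched without any sofabolt classes. This is only an illustration: the class name DeadlineCheck, the threshold value, and the way the remaining time is derived from a creation timestamp are assumptions, since the body of remainingTime(request, timeoutMillis) is not shown in this article.

```java
import java.util.concurrent.TimeUnit;

class DeadlineCheck {
    // Hypothetical threshold: at or below this many milliseconds, do not send at all.
    static final long ABANDON_THRESHOLD_MILLIS = 0;

    /** Time left out of the total budget, given when the request was created. */
    static long remainingMillis(long createdNanos, long nowNanos, long timeoutMillis) {
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(nowNanos - createdNanos);
        return timeoutMillis - elapsedMillis;
    }

    /** True when the caller should give up before touching the network. */
    static boolean shouldAbandon(long createdNanos, long nowNanos, long timeoutMillis) {
        return remainingMillis(createdNanos, nowNanos, timeoutMillis) <= ABANDON_THRESHOLD_MILLIS;
    }

    public static void main(String[] args) {
        long created = System.nanoTime();
        // Simulate a thread that sat at a breakpoint for 1.5 s with a 1 s budget.
        long afterPause = created + TimeUnit.MILLISECONDS.toNanos(1500);
        System.out.println("abandon after pause: " + shouldAbandon(created, afterPause, 1000));
        System.out.println("abandon immediately: " + shouldAbandon(created, created, 1000));
    }
}
```

Checking the budget before the write avoids sending a request whose caller has, by definition, already stopped waiting for the answer.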

InvokeFuture

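This is the future that the client keeps for each request; the result of the request is read from the response stored in it. With so many futures in flight, how does an incoming response find the request it belongs to?

The answer is the requestId. Every request carries an id (request.getId() above), and its future is registered on the connection with conn.addInvokeFuture(future). When netty delivers the response, the requestId is used to look up the matching InvokeFuture, and the data is put into it.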
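
The correlation idea can be shown with a small stand-alone sketch. It is not sofabolt's implementation: PendingRequests, register, and complete are hypothetical names, and a CompletableFuture of String stands in for InvokeFuture and its response command.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class PendingRequests {
    private final AtomicInteger idGenerator = new AtomicInteger();
    private final ConcurrentHashMap<Integer, CompletableFuture<String>> pending =
        new ConcurrentHashMap<>();

    /** Assigns a fresh request id and remembers the future under it. */
    int register(CompletableFuture<String> future) {
        int requestId = idGenerator.incrementAndGet();
        pending.put(requestId, future);
        return requestId;
    }

    /** Called on the receiving side: finds the future by id and hands it the payload. */
    boolean complete(int requestId, String payload) {
        CompletableFuture<String> future = pending.remove(requestId);
        if (future == null) {
            return false; // unknown id, or already removed (for example after a timeout)
        }
        return future.complete(payload);
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingRequests requests = new PendingRequests();
        CompletableFuture<String> first = new CompletableFuture<>();
        CompletableFuture<String> second = new CompletableFuture<>();
        int firstId = requests.register(first);
        int secondId = requests.register(second);

        // Responses may arrive in any order; the id decides which future is completed.
        requests.complete(secondId, "response for " + secondId);
        System.out.println("first done: " + first.isDone() + ", second: " + second.getNow(null));
        requests.complete(firstId, "response for " + firstId);
        System.out.println("still pending: " + requests.pendingCount());
    }
}
```

Removing the entry on completion matters: it keeps the map from growing, and it makes a late response for a timed-out request a harmless no-op.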

conn.getChannel().writeAndFlush(request).addListener(new ChannelFutureListener() {

    @Override
    public void operationComplete(ChannelFuture f) throws Exception {
        if (!f.isSuccess()) {
            conn.removeInvokeFuture(requestId);
            future.putResponse(commandFactory.createSendFailedResponse(
                conn.getRemoteAddress(), f.cause()));
            LOGGER.error("Invoke send failed, id={}", requestId, f.cause());
        }
    }

});

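This part is plain netty code: get the channel and write the request to it. If the send fails, the future is removed from the connection and a send-failed response is put into it right away.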

RemotingCommand response = future.waitResponse(remainingTime);

The line above waits for the response. The method is implemented as follows:

@Override
public ResponseCommand waitResponse(long timeoutMillis) throws InterruptedException {
    this.countDownLatch.await(timeoutMillis, TimeUnit.MILLISECONDS);
    return this.responseCommand;
}

DefaultInvokeFuture

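That covers the client-side request path, and it has some interesting details. A CountDownLatch blocks the calling thread while the response has not arrived. The thread is released either when the wait times out or when a response is put into the future and the latch is counted down.

This is a pattern worth learning: block on a CountDownLatch with a timeout, and if the response field is still null after the wait, invokeSync substitutes a timeout response (createTimeoutResponse) instead of returning null.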
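
A stripped-down version of the pattern looks like this. It is a sketch under assumptions, not the DefaultInvokeFuture source: LatchFuture is a hypothetical class, a String stands in for the response command, and the "TIMEOUT" marker stands in for the timeout response that invokeSync creates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchFuture {
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile String response;

    /** Called by the receiving thread: store the response, then release the waiter. */
    void putResponse(String response) {
        this.response = response;
        latch.countDown();
    }

    /** Blocks until a response arrives or the timeout elapses; null means timed out. */
    String waitResponse(long timeoutMillis) throws InterruptedException {
        latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        return response;
    }

    /** Mirrors the null check in invokeSync: never hand null back to the caller. */
    static String invokeSync(LatchFuture future, long timeoutMillis) throws InterruptedException {
        String result = future.waitResponse(timeoutMillis);
        return result != null ? result : "TIMEOUT";
    }

    public static void main(String[] args) throws InterruptedException {
        LatchFuture answered = new LatchFuture();
        // Another thread plays the role of the netty IO thread delivering the response.
        Thread io = new Thread(() -> answered.putResponse("pong"));
        io.start();
        System.out.println(invokeSync(answered, 2000));
        io.join();

        LatchFuture unanswered = new LatchFuture();
        System.out.println(invokeSync(unanswered, 50));
    }
}
```

The return value of await is deliberately ignored, as in the waitResponse shown above; the null check on the response field is what distinguishes a timeout from a real answer.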

Return path

RpcHandler

com.alipay.remoting.rpc.RpcHandler#channelRead

@ChannelHandler.Sharable
public class RpcHandler extends ChannelInboundHandlerAdapter {
    private boolean                                     serverSide;

    private ConcurrentHashMap<String, UserProcessor<?>> userProcessors;

    public RpcHandler(ConcurrentHashMap<String, UserProcessor<?>> userProcessors) {
        serverSide = false;
        this.userProcessors = userProcessors;
    }

    public RpcHandler(boolean serverSide, ConcurrentHashMap<String, UserProcessor<?>> userProcessors) {
        this.serverSide = serverSide;
        this.userProcessors = userProcessors;
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        ProtocolCode protocolCode = ctx.channel().attr(Connection.PROTOCOL).get();
        Protocol protocol = ProtocolManager.getProtocol(protocolCode);
        protocol.getCommandHandler().handleCommand(
            new RemotingContext(ctx, new InvokeContext(), serverSide, userProcessors), msg);
        ctx.fireChannelRead(msg);
    }
}

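To see how the command is handled, look at com.alipay.remoting.rpc.protocol.RpcCommandHandler#handle. It checks the type of msg: a List is processed as a batch, and a single command is processed on its own.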
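
The batch-or-single branch boils down to an instanceof check. The sketch below is a simplification with hypothetical names (CommandDispatch, handle, process); it handles every item sequentially on the calling thread, which the real handler does not necessarily do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CommandDispatch {
    /** Accepts either one command or a List of commands; returns how many were processed. */
    static int handle(Object msg, List<String> handled) {
        if (msg instanceof List) {
            int count = 0;
            // A batch is unpacked and each element goes through the same single-command path.
            for (Object item : (List<?>) msg) {
                process(item, handled);
                count++;
            }
            return count;
        }
        process(msg, handled);
        return 1;
    }

    /** Stand-in for the per-command processing; here it only records the command. */
    private static void process(Object command, List<String> handled) {
        handled.add(String.valueOf(command));
    }

    public static void main(String[] args) {
        List<String> handled = new ArrayList<>();
        handle("single", handled);
        handle(Arrays.asList("batch-1", "batch-2"), handled);
        System.out.println(handled);
    }
}
```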

com.alipay.remoting.rpc.protocol.RpcRequestProcessor#process

@Override
public void process(RemotingContext ctx, RpcRequestCommand cmd, ExecutorService defaultExecutor)
                                                                                                throws Exception {
    if (!deserializeRequestCommand(ctx, cmd, RpcDeserializeLevel.DESERIALIZE_CLAZZ)) {
        return;
    }
    UserProcessor userProcessor = ctx.getUserProcessor(cmd.getRequestClass());
    if (userProcessor == null) {
        String errMsg = "No user processor found for request: " + cmd.getRequestClass();
        logger.error(errMsg);
        sendResponseIfNecessary(ctx, cmd.getType(), this.getCommandFactory()
            .createExceptionResponse(cmd.getId(), errMsg));
        return;// must end process
    }

    // set timeout check state from user's processor
    ctx.setTimeoutDiscard(userProcessor.timeoutDiscard());

    // to check whether to process in io thread
    if (userProcessor.processInIOThread()) {
        if (!deserializeRequestCommand(ctx, cmd, RpcDeserializeLevel.DESERIALIZE_ALL)) {
            return;
        }
        // process in io thread
        new ProcessTask(ctx, cmd).run();
        return;// end
    }

    Executor executor;
    // to check whether get executor using executor selector
    if (null == userProcessor.getExecutorSelector()) {
        executor = userProcessor.getExecutor();
    } else {
        // in case haven't deserialized in io thread
        // it need to deserialize clazz and header before using executor dispath strategy
        if (!deserializeRequestCommand(ctx, cmd, RpcDeserializeLevel.DESERIALIZE_HEADER)) {
            return;
        }
        //try get executor with strategy
        executor = userProcessor.getExecutorSelector().select(cmd.getRequestClass(),
            cmd.getRequestHeader());
    }

    // Till now, if executor still null, then try default
    if (executor == null) {
        executor = (this.getExecutor() == null ? defaultExecutor : this.getExecutor());
    }

    cmd.setBeforeEnterQueueTime(System.nanoTime());
    // use the final executor dispatch process task
    executor.execute(new ProcessTask(ctx, cmd));
}

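Now the picture is clear: incoming messages are received by netty, a batch is split so that each command is handled individually, and the work is wrapped in a ProcessTask that runs either directly on the IO thread or on an executor. The timeout state set through setTimeoutDiscard lets the task deal with a command that has already timed out instead of processing it as if nothing had happened.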
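
The executor selection in process follows a fixed fallback order, which is easier to see in isolation. The sketch below mirrors only the branching visible in the code above; ExecutorResolution, NamedExecutor, and the Supplier used in place of the executor selector are hypothetical.

```java
import java.util.concurrent.Executor;
import java.util.function.Supplier;

class ExecutorResolution {
    /** Tiny inline executor with a name, so the chosen one can be identified. */
    static final class NamedExecutor implements Executor {
        final String name;
        NamedExecutor(String name) { this.name = name; }
        @Override public void execute(Runnable task) { task.run(); }
    }

    /**
     * Order taken from the branching in process():
     * 1. no selector: the user processor's own executor; otherwise the selector's choice
     * 2. if that is null: the request processor's executor
     * 3. if that is null too: the default executor
     */
    static Executor resolve(Supplier<Executor> selector, Executor userExecutor,
                            Executor processorExecutor, Executor defaultExecutor) {
        Executor chosen = (selector == null) ? userExecutor : selector.get();
        if (chosen == null) {
            chosen = (processorExecutor == null) ? defaultExecutor : processorExecutor;
        }
        return chosen;
    }

    public static void main(String[] args) {
        NamedExecutor user = new NamedExecutor("user");
        NamedExecutor fallback = new NamedExecutor("default");
        NamedExecutor picked = (NamedExecutor) resolve(null, user, null, fallback);
        System.out.println("with user executor: " + picked.name);
        picked = (NamedExecutor) resolve(null, null, null, fallback);
        System.out.println("with nothing configured: " + picked.name);
    }
}
```

One consequence of this order: when a selector is present but returns null, the user processor's own executor is skipped and the choice falls straight through to the processor-level or default executor.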

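Performance concerns

For developers, a framework is supposed to be simple to use, and few people look at its internals. When something goes wrong inside the framework, troubleshooting becomes painful. In my view, a framework should be designed with monitoring for its thread pools.

As the source above shows, com.alipay.remoting.rpc.protocol.RpcRequestProcessor#process takes what netty received and hands it to an executor asynchronously, which naturally raises a performance question.

The code has a branch for this: if the user processor asks to be processed in the IO thread, the ProcessTask runs right there on the IO thread; otherwise, which is the usual case, it is submitted to a thread pool. That is where the thorny problem appears. Under heavy concurrency the pool can overflow, and if it uses the JDK's default rejection policy the task is simply rejected. That is unfriendly to an rpc framework and to a registry alike. Even if the remote side responds normally over netty, once the local thread pool is saturated many requests never get a result, which is not acceptable.

So my suggestion is to monitor the thread pools, or at least to document the limitation clearly: for example, state that a pool can take at most 5k requests, and that beyond that point the pool should be resized dynamically or more nodes should be added.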
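
As a rough idea of what such monitoring could look like, here is a sketch built only on the JDK's ThreadPoolExecutor. It is not something sofabolt provides: MonitoredPool, trySubmit, snapshot, and the pool sizes are all hypothetical, and the rejection handler only counts the rejection before throwing, as the default policy would.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class MonitoredPool {
    final AtomicLong rejected = new AtomicLong();
    final ThreadPoolExecutor pool;

    MonitoredPool(int threads, int queueCapacity) {
        pool = new ThreadPoolExecutor(threads, threads, 60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(queueCapacity),
            (task, executor) -> {
                // Count the rejection so saturation becomes visible, then fail as usual.
                rejected.incrementAndGet();
                throw new RejectedExecutionException("pool saturated");
            });
    }

    /** Returns false instead of throwing when the pool cannot take the task. */
    boolean trySubmit(Runnable task) {
        try {
            pool.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    /** Numbers worth exporting to a metrics system or a periodic log line. */
    String snapshot() {
        return "active=" + pool.getActiveCount() + ", queued=" + pool.getQueue().size()
            + ", rejected=" + rejected.get();
    }

    /** Helper for demos and tests: a task that waits until the latch is released. */
    static Runnable blockOn(CountDownLatch latch) {
        return () -> {
            try {
                latch.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        MonitoredPool monitored = new MonitoredPool(1, 1);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 3; i++) {
            System.out.println("accepted: " + monitored.trySubmit(blockOn(release)));
        }
        System.out.println(monitored.snapshot());
        release.countDown();
        monitored.pool.shutdown();
        monitored.pool.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

With one worker and a queue of one, the third submission has nowhere to go; the counter turns that silent loss into a number that can be alerted on or used to decide when to resize the pool.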

How sofabolt handles performance management
