RocketMQ Source Code Analysis (Broker Master-Slave Replication)


Overview

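RocketMQ brokers get high availability from master-slave replication (the Raft-based DLedger mode is a separate option), so in this part we go through the source code behind master-slave synchronization.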

Overall flow

[Figure: overall master-slave replication flow]

HAService

private final AtomicInteger connectionCount = new AtomicInteger(0);

private final List<HAConnection> connectionList = new LinkedList<>();

private final AcceptSocketService acceptSocketService; // on the master: listens for connection requests from slaves

private final DefaultMessageStore defaultMessageStore; // the message store

private final WaitNotifyObject waitNotifyObject = new WaitNotifyObject();
private final AtomicLong push2SlaveMaxOffset = new AtomicLong(0); // largest offset replicated to slaves (as acknowledged by them)

private final GroupTransferService groupTransferService; // plays a role similar to the CommitLog's synchronous flush service: blocks the client request until a slave acks

private final HAClient haClient;  // handles the slave side
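The GroupTransferService comment describes a wait-for-ack pattern: a producer request blocks until push2SlaveMaxOffset reaches the offset it needs. Below is a minimal sketch of that idea in plain Java, assuming one shared counter and a monitor lock. The class and method names (SyncReplicationWaiter, onSlaveAck, waitForSlaveAck) are hypothetical and not RocketMQ's API; the real service is organized around RocketMQ's own ServiceThread machinery.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch (not RocketMQ's API): block a producer until a slave has
// acknowledged a given offset, or until a timeout expires.
final class SyncReplicationWaiter {
    // Largest offset any slave has acknowledged; only ever moves forward.
    private final AtomicLong push2SlaveMaxOffset = new AtomicLong(0);
    private final Object lock = new Object();

    // Called when a slave reports that it has replicated up to `offset`.
    void onSlaveAck(long offset) {
        long current = push2SlaveMaxOffset.get();
        while (offset > current) {
            if (push2SlaveMaxOffset.compareAndSet(current, offset)) {
                synchronized (lock) {
                    lock.notifyAll(); // wake producers waiting for an ack
                }
                return;
            }
            current = push2SlaveMaxOffset.get();
        }
    }

    long maxAckedOffset() {
        return push2SlaveMaxOffset.get();
    }

    // Returns true once a slave has acked `nextOffset`; false on timeout or interrupt.
    boolean waitForSlaveAck(long nextOffset, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        synchronized (lock) {
            while (push2SlaveMaxOffset.get() < nextOffset) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                try {
                    lock.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }
}
```

Checking the counter and calling wait() under the same lock avoids lost wake-ups: an ack that arrives between the check and the wait has to acquire the lock first, so its notifyAll() cannot slip past the waiting thread.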

Next, let's walk through the source code following the flow above.

Master step 1: start up and listen on haPort (default 10912)

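Starting the master, listening on this port and accepting slave connections is implemented in AcceptSocketService:

class AcceptSocketService extends ServiceThread {
    private final SocketAddress socketAddressListen; // address to listen on
    private ServerSocketChannel serverSocketChannel; // server socket channel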
    private Selector selector;

    public AcceptSocketService(final int port) {
        this.socketAddressListen = new InetSocketAddress(port);
    }

    /**
     * Starts listening to slave connections.
     *
     * @throws Exception If fails.
     */
    public void beginAccept() throws Exception {
        // open the ServerSocketChannel, bind it and register it with the selector for OP_ACCEPT
        this.serverSocketChannel = ServerSocketChannel.open();
        this.selector = RemotingUtil.openSelector();
        this.serverSocketChannel.socket().setReuseAddress(true);
        this.serverSocketChannel.socket().bind(this.socketAddressListen);
        this.serverSocketChannel.configureBlocking(false);
        this.serverSocketChannel.register(this.selector, SelectionKey.OP_ACCEPT);
    }


    /**
     * {@inheritDoc}
     */
    @Override
    public void run() {
        log.info(this.getServiceName() + " service started");

        while (!this.isStopped()) {
            try {
                this.selector.select(1000);  // wait up to 1 s for incoming connections
                Set<SelectionKey> selected = this.selector.selectedKeys();

                if (selected != null) {
                    for (SelectionKey k : selected) {
                        if ((k.readyOps() & SelectionKey.OP_ACCEPT) != 0) {
                            SocketChannel sc = ((ServerSocketChannel) k.channel()).accept();

                            if (sc != null) {
                                HAService.log.info("HAService receive new connection, "
                                    + sc.socket().getRemoteSocketAddress());

                                try {   // wrap the slave connection in an HAConnection, start it and add it to the list
                                    HAConnection conn = new HAConnection(HAService.this, sc);
                                    conn.start();
                                    HAService.this.addConnection(conn);
                                } catch (Exception e) {
                                    log.error("new HAConnection exception", e);
                                    sc.close();
                                }
                            }
                        } else {
                            log.warn("Unexpected ops in select " + k.readyOps());
                        }
                    }

                    selected.clear();
                }
            } catch (Exception e) {
                log.error(this.getServiceName() + " service has exception.", e);
            }
        }

        log.info(this.getServiceName() + " service end");
    }


}

Slave step 1: start up and connect to the master

The slave side is implemented in HAClient. Let's first look at its fields:

private static final int READ_MAX_BUFFER_SIZE = 1024 * 1024 * 4;  // maximum read buffer size (4 MB)
private final AtomicReference<String> masterAddress = new AtomicReference<>();  // master address
private final ByteBuffer reportOffset = ByteBuffer.allocate(8); // holds the offset the slave reports to the master, i.e. where the next pull starts
private SocketChannel socketChannel; // network channel to the master
private Selector selector; // selector
private long lastWriteTimestamp = System.currentTimeMillis(); // time of the last write (offset report) to the master

private long currentReportedOffset = 0;  // replication progress last reported to the master, i.e. the slave's current maximum CommitLog offset
private int dispatchPosition = 0; // position in byteBufferRead up to which data has already been dispatched
private ByteBuffer byteBufferRead = ByteBuffer.allocate(READ_MAX_BUFFER_SIZE); // read buffer, 4 MB
private ByteBuffer byteBufferBackup = ByteBuffer.allocate(READ_MAX_BUFFER_SIZE); // backup read buffer, swapped with byteBufferRead

Connecting to the master

private boolean connectMaster() throws ClosedChannelException {
    if (null == socketChannel) {
        String addr = this.masterAddress.get();
        if (addr != null) {

            SocketAddress socketAddress = RemotingUtil.string2SocketAddress(addr);
            if (socketAddress != null) {
                this.socketChannel = RemotingUtil.connect(socketAddress); // connect to the master
                if (this.socketChannel != null) {
                    this.socketChannel.register(this.selector, SelectionKey.OP_READ);
                }
            }
        }

        this.currentReportedOffset = HAService.this.defaultMessageStore.getMaxPhyOffset();

        this.lastWriteTimestamp = System.currentTimeMillis();
    }

    return this.socketChannel != null;
}

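Slave step 2: report its own maximum CommitLog offset

Note:

This can be viewed from two sides. For the slave, the value sent is its current maximum CommitLog offset, which is also where the next pull should start. For the master, the same value acts both as the ACK for the slave's previous request and as the starting offset of the next one.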

private boolean reportSlaveMaxOffset(final long maxOffset) { // maxOffset: the slave's maximum CommitLog offset
    this.reportOffset.position(0);    
    this.reportOffset.limit(8); 
    this.reportOffset.putLong(maxOffset); 
    this.reportOffset.position(0);
    this.reportOffset.limit(8);

    for (int i = 0; i < 3 && this.reportOffset.hasRemaining(); i++) {
        try {
            this.socketChannel.write(this.reportOffset);  // write to the channel
        } catch (IOException e) {
            log.error(this.getServiceName()
                + "reportSlaveMaxOffset this.socketChannel.write exception", e);
            return false;
        }
    }

    lastWriteTimestamp = HAService.this.defaultMessageStore.getSystemClock().now();
    return !this.reportOffset.hasRemaining();
}
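The report frame is nothing more than one 8-byte long. A minimal sketch of both ends follows, assuming ByteBuffer's default big-endian order (as in the code above). The OffsetReport class is hypothetical. On the receiving side it keeps only the latest complete 8-byte value, on the assumption that a newer report supersedes older ones.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: the slave's report frame is a single 8-byte big-endian long.
final class OffsetReport {
    static final int REPORT_SIZE = 8;

    // Slave side: encode maxOffset into a buffer ready to be written (position 0, limit 8).
    static ByteBuffer encode(long maxOffset) {
        ByteBuffer buf = ByteBuffer.allocate(REPORT_SIZE);
        buf.putLong(maxOffset);
        buf.flip();
        return buf;
    }

    // Master side: `received` is in write mode, so bytes [0, position) have arrived.
    // Returns the last complete 8-byte offset, or -1 if fewer than 8 bytes are present.
    static long lastCompleteOffset(ByteBuffer received) {
        int pos = received.position();
        if (pos < REPORT_SIZE) {
            return -1;
        }
        int lastFrameStart = pos - (pos % REPORT_SIZE) - REPORT_SIZE; // skip a trailing partial frame
        return received.getLong(lastFrameStart);
    }
}
```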

Master step 2: serve the slave's pull requests by pushing CommitLog data (WriteSocketService.run() in HAConnection)

@Override
public void run() {
    HAConnection.log.info(this.getServiceName() + " service started");

    while (!this.isStopped()) {
        try {
            this.selector.select(1000);

            if (-1 == HAConnection.this.slaveRequestOffset) { // the slave has not reported an offset yet
                Thread.sleep(10);
                continue;
            }

            if (-1 == this.nextTransferFromWhere) {  // first transfer on this connection: decide where to start
                if (0 == HAConnection.this.slaveRequestOffset) {  // empty slave: start at the beginning of the master's last CommitLog file
                    long masterOffset = HAConnection.this.haService.getDefaultMessageStore().getCommitLog().getMaxOffset();
                    masterOffset =
                        masterOffset
                            - (masterOffset % HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig()
                            .getMappedFileSizeCommitLog());

                    if (masterOffset < 0) {
                        masterOffset = 0;
                    }

                    this.nextTransferFromWhere = masterOffset;
                } else {
                    this.nextTransferFromWhere = HAConnection.this.slaveRequestOffset;
                }

                log.info("master transfer data from " + this.nextTransferFromWhere + " to slave[" + HAConnection.this.clientAddr
                    + "], and slave request " + HAConnection.this.slaveRequestOffset);
            }

            if (this.lastWriteOver) {  // the previous transfer was fully written

                long interval =
                    HAConnection.this.haService.getDefaultMessageStore().getSystemClock().now() - this.lastWriteTimestamp;

                if (interval > HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig()
                    .getHaSendHeartbeatInterval()) {
                    // Build a heartbeat header: current offset + body size 0
                    this.byteBufferHeader.position(0);
                    this.byteBufferHeader.limit(headerSize);
                    this.byteBufferHeader.putLong(this.nextTransferFromWhere);
                    this.byteBufferHeader.putInt(0);
                    this.byteBufferHeader.flip();

                    this.lastWriteOver = this.transferData();
                    if (!this.lastWriteOver)
                        continue;
                }
            } else {
                this.lastWriteOver = this.transferData(); // previous transfer unfinished: finish it first
                if (!this.lastWriteOver)
                    continue;
            }
            // read CommitLog data starting at nextTransferFromWhere
            SelectMappedBufferResult selectResult =
                HAConnection.this.haService.getDefaultMessageStore().getCommitLogData(this.nextTransferFromWhere);
            if (selectResult != null) {
                int size = selectResult.getSize();
                // cap the batch at haTransferBatchSize (configurable)
                if (size > HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig().getHaTransferBatchSize()) {
                    size = HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig().getHaTransferBatchSize();
                }

                long thisOffset = this.nextTransferFromWhere;
                this.nextTransferFromWhere += size;  // advance the offset for the next transfer

                selectResult.getByteBuffer().limit(size);
                this.selectMappedBufferResult = selectResult;

                // Build Header: start offset (8 bytes) + body size (4 bytes)
                this.byteBufferHeader.position(0);
                this.byteBufferHeader.limit(headerSize);
                this.byteBufferHeader.putLong(thisOffset);
                this.byteBufferHeader.putInt(size);
                this.byteBufferHeader.flip();

                this.lastWriteOver = this.transferData();
            } else {

                HAConnection.this.haService.getWaitNotifyObject().allWaitForRunning(100);
            }
        } catch (Exception e) {

            HAConnection.log.error(this.getServiceName() + " service has exception.", e);
            break;
        }
    }

    HAConnection.this.haService.getWaitNotifyObject().removeFromWaitingThreadTable();

    if (this.selectMappedBufferResult != null) {
        this.selectMappedBufferResult.release();
    }

    this.makeStop();

    readSocketService.makeStop();

    haService.removeConnection(HAConnection.this);

    SelectionKey sk = this.socketChannel.keyFor(this.selector);
    if (sk != null) {
        sk.cancel();
    }

    try {
        this.selector.close();
        this.socketChannel.close();
    } catch (IOException e) {
        HAConnection.log.error("", e);
    }

    HAConnection.log.info(this.getServiceName() + " service end");
}

transferData()

private boolean transferData() throws Exception {
    int writeSizeZeroTimes = 0;
    // Write Header
    while (this.byteBufferHeader.hasRemaining()) {
        int writeSize = this.socketChannel.write(this.byteBufferHeader);
        if (writeSize > 0) {
            writeSizeZeroTimes = 0;
            this.lastWriteTimestamp = HAConnection.this.haService.getDefaultMessageStore().getSystemClock().now();
        } else if (writeSize == 0) {
            if (++writeSizeZeroTimes >= 3) {
                break;
            }
        } else {
            throw new Exception("ha master write header error < 0");
        }
    }

    if (null == this.selectMappedBufferResult) {
        return !this.byteBufferHeader.hasRemaining();
    }

    writeSizeZeroTimes = 0;

    // Write Body: the CommitLog data
    if (!this.byteBufferHeader.hasRemaining()) {
        while (this.selectMappedBufferResult.getByteBuffer().hasRemaining()) {
            int writeSize = this.socketChannel.write(this.selectMappedBufferResult.getByteBuffer());
            if (writeSize > 0) {
                writeSizeZeroTimes = 0;
                this.lastWriteTimestamp = HAConnection.this.haService.getDefaultMessageStore().getSystemClock().now();
            } else if (writeSize == 0) {
                if (++writeSizeZeroTimes >= 3) {
                    break;
                }
            } else {
                throw new Exception("ha master write body error < 0");
            }
        }
    }

    boolean result = !this.byteBufferHeader.hasRemaining() && !this.selectMappedBufferResult.getByteBuffer().hasRemaining();

    if (!this.selectMappedBufferResult.getByteBuffer().hasRemaining()) {
        this.selectMappedBufferResult.release();
        this.selectMappedBufferResult = null;
    }

    return result;
}

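Flow:

  1. The master's write loop waits up to 1 s on its selector per iteration; until the slave has reported an offset (slaveRequestOffset is still -1), it sleeps 10 ms and retries
  2. If nextTransferFromWhere is -1 (first transfer on this connection), it is initialized: if the slave reported offset 0, transfer starts at the beginning of the master's last CommitLog file (the max offset rounded down to a multiple of the CommitLog file size); otherwise it starts at the slave's reported offset
  3. If the previous transfer was not fully written, finish it first; if it was, and nothing has been written for longer than haSendHeartbeatInterval, send a heartbeat header with body size 0
  4. Fetch CommitLog data starting at nextTransferFromWhere, cap it at haTransferBatchSize and advance nextTransferFromWhere by that size; if there is no new data, wait up to 100 ms
  5. Build the header (start offset & size) and write the header followed by the CommitLog data to the channel
  6. Record in lastWriteOver whether the transfer was fully written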
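The starting-offset rule (step 2) and the 12-byte header (steps 4 and 5) can be sketched in isolation. The MasterTransferSketch class and its methods are illustrative only, and they assume the whole batch fits in one ByteBuffer; the real WriteSocketService writes header and body separately from a mapped file and copes with partial writes.

```java
import java.nio.ByteBuffer;

// Hypothetical helpers illustrating the master-side rules described in the flow above.
final class MasterTransferSketch {
    static final int HEADER_SIZE = 8 + 4; // phyOffset (long) + body size (int)

    // Step 2: a slave reporting 0 starts at the beginning of the master's last
    // CommitLog file; otherwise transfer resumes at the slave's own offset.
    static long firstTransferOffset(long slaveRequestOffset, long masterMaxOffset, long mappedFileSize) {
        if (slaveRequestOffset == 0) {
            long start = masterMaxOffset - (masterMaxOffset % mappedFileSize);
            return Math.max(start, 0);
        }
        return slaveRequestOffset;
    }

    // Steps 4 and 5: header + body capped at batchSize; an empty body gives a heartbeat frame.
    static ByteBuffer frame(long thisOffset, byte[] body, int batchSize) {
        int size = Math.min(body.length, batchSize);
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + size);
        buf.putLong(thisOffset);
        buf.putInt(size);
        buf.put(body, 0, size);
        buf.flip(); // ready to be written to the channel
        return buf;
    }
}
```

Rounding down to a file boundary means a brand-new slave does not replay the master's full history; it only receives data from the current CommitLog file onwards.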

Slave step 3: process data from the master and append it to the CommitLog

Handling data from the master

private boolean processReadEvent() {
    int readSizeZeroTimes = 0;
    while (this.byteBufferRead.hasRemaining()) {
        try {
            int readSize = this.socketChannel.read(this.byteBufferRead);
            if (readSize > 0) {
                readSizeZeroTimes = 0;
                boolean result = this.dispatchReadRequest();
                if (!result) {
                    log.error("HAClient, dispatchReadRequest error");
                    return false;
                }
            } else if (readSize == 0) {
                if (++readSizeZeroTimes >= 3) {
                    break;
                }
            } else {
                log.info("HAClient, processReadEvent read socket < 0");
                return false;
            }
        } catch (IOException e) {
            log.info("HAClient, processReadEvent read socket exception", e);
            return false;
        }
    }

    return true;
}

The key logic is in dispatchReadRequest:

private boolean dispatchReadRequest() {
    final int msgHeaderSize = 8 + 4; // phyoffset + size
    int readSocketPos = this.byteBufferRead.position();

    while (true) {
        int diff = this.byteBufferRead.position() - this.dispatchPosition;
        if (diff >= msgHeaderSize) { // a complete header has arrived
            long masterPhyOffset = this.byteBufferRead.getLong(this.dispatchPosition);
            int bodySize = this.byteBufferRead.getInt(this.dispatchPosition + 8);

            long slavePhyOffset = HAService.this.defaultMessageStore.getMaxPhyOffset();

            if (slavePhyOffset != 0) {
                if (slavePhyOffset != masterPhyOffset) {
                    log.error("master pushed offset not equal the max phy offset in slave, SLAVE: "
                        + slavePhyOffset + " MASTER: " + masterPhyOffset);
                    return false;
                }
            }

            if (diff >= (msgHeaderSize + bodySize)) {
                byte[] bodyData = new byte[bodySize];
                this.byteBufferRead.position(this.dispatchPosition + msgHeaderSize);
                this.byteBufferRead.get(bodyData);

                HAService.this.defaultMessageStore.appendToCommitLog(masterPhyOffset, bodyData);

                this.byteBufferRead.position(readSocketPos);
                this.dispatchPosition += msgHeaderSize + bodySize;

                if (!reportSlaveMaxOffsetPlus()) {
                    return false;
                }

                continue;
            }
        }

        if (!this.byteBufferRead.hasRemaining()) {
            this.reallocateByteBuffer();
        }

        break;
    }

    return true;
}

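Steps:

1. Every 5 s by default (haSendHeartbeatInterval), the slave sends a heartbeat that reports its maximum CommitLog offset to the master.

2. It waits up to 1 s on its selector for data from the master and processes whatever arrives.

3. Once at least 12 bytes (8-byte offset + 4-byte size) have accumulated past dispatchPosition, a header is available and processing starts.

4. If the slave's CommitLog is not empty, the offset in the header must equal the slave's local maximum offset; otherwise it returns false immediately.

5. Once the whole body has arrived, copy it out, append it to the CommitLog and report the new maximum offset to the master (reportSlaveMaxOffsetPlus).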
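The dispatch steps above boil down to a loop that consumes complete [offset][size][body] frames and leaves partial ones in the buffer. The sketch below assumes a hypothetical SlaveDispatchSketch in which an in-memory byte stream replaces DefaultMessageStore (the first appended frame fixes the base offset); it reads with absolute ByteBuffer gets instead of moving the position, and it leaves out the offset report and the buffer swap done by reallocateByteBuffer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical, simplified slave-side dispatcher; an in-memory byte stream
// stands in for the CommitLog.
final class SlaveDispatchSketch {
    static final int HEADER_SIZE = 8 + 4; // phyOffset + bodySize

    final ByteArrayOutputStream commitLog = new ByteArrayOutputStream();
    long baseOffset = 0;      // physical offset of the first appended byte
    int dispatchPosition = 0; // bytes of the receive buffer already consumed

    // Mirrors getMaxPhyOffset(): 0 while nothing has been appended.
    long maxPhyOffset() {
        return commitLog.size() == 0 ? 0 : baseOffset + commitLog.size();
    }

    // `read` is in write mode: bytes [0, position) have been received so far.
    // Returns false if a header's offset does not match the local max offset.
    boolean dispatch(ByteBuffer read) {
        while (true) {
            int diff = read.position() - dispatchPosition;
            if (diff < HEADER_SIZE) {
                return true; // incomplete header: wait for more bytes
            }
            long masterPhyOffset = read.getLong(dispatchPosition);
            int bodySize = read.getInt(dispatchPosition + 8);
            long slavePhyOffset = maxPhyOffset();
            if (slavePhyOffset != 0 && slavePhyOffset != masterPhyOffset) {
                return false; // master and slave are out of sync
            }
            if (diff < HEADER_SIZE + bodySize) {
                return true; // incomplete body: wait for more bytes
            }
            if (commitLog.size() == 0) {
                baseOffset = masterPhyOffset; // first frame fixes the starting offset
            }
            for (int i = 0; i < bodySize; i++) {
                commitLog.write(read.get(dispatchPosition + HEADER_SIZE + i));
            }
            dispatchPosition += HEADER_SIZE + bodySize;
        }
    }
}
```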

Read/write separation

Overview

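The client sends a pullMessage request; the broker processes it and returns a result that includes the suggested brokerId for that MessageQueue. The client stores this brokerId in a local map and uses it the next time it pulls messages from this MessageQueue.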

Overall architecture

[Figure: overall architecture]

The master broker decides whether to point the client at a slave

[Figure: master-side code that decides whether to suggest pulling from a slave]

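As the code above shows, if the gap between the physical offset being pulled and the master's maxOffset exceeds the configured ratio times the machine's physical memory, the requested messages have most likely been swapped out of memory. In that case the master tells the client, in the returned result, which broker it suggests for the next request.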
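The decision described above is a single comparison. The sketch below is a hypothetical simplification; its parameters are modeled on what I understand to be RocketMQ's accessMessageInMemoryMaxRatio (a percentage, default 40) and whichBrokerWhenConsumeSlowly settings, and the real broker additionally checks configuration such as slaveReadEnable, so verify the names against your version.

```java
// Hypothetical simplification of the master's "suggest a slave" decision.
final class SuggestSlaveSketch {
    static final long MASTER_ID = 0L;

    // True if the data being pulled lies further behind the head of the
    // CommitLog than the share of physical memory that can stay cached.
    static boolean suggestPullingFromSlave(long maxOffsetPy, long maxPhyOffsetPulling,
                                           long totalPhysicalMemory, int accessMessageInMemoryMaxRatio) {
        long diff = maxOffsetPy - maxPhyOffsetPulling;
        long memory = (long) (totalPhysicalMemory * (accessMessageInMemoryMaxRatio / 100.0));
        return diff > memory;
    }

    // brokerId returned to the client for its next pull on this queue.
    static long suggestWhichBrokerId(boolean suggestSlave, long whichBrokerWhenConsumeSlowly) {
        return suggestSlave ? whichBrokerWhenConsumeSlowly : MASTER_ID;
    }
}
```

For example, on a 16 GB machine with a 40% ratio, a consumer more than about 6.4 GB behind the head of the CommitLog would be redirected.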


The client takes the suggested brokerId from the result and updates its local in-memory table:


public void updatePullFromWhichNode(final MessageQueue mq, final long brokerId) {
    // pullFromWhichNodeTable:
    // ConcurrentMap<MessageQueue, AtomicLong/* brokerId */>
    AtomicLong suggest = this.pullFromWhichNodeTable.get(mq);
    if (null == suggest) {
        this.pullFromWhichNodeTable.put(mq, new AtomicLong(brokerId));
    } else {
        suggest.set(brokerId);
    }
}
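On the next pull the client reads the table back; in the client this appears to happen in a method named recalculatePullFromWhichNode, which falls back to the master when no suggestion exists. The sketch below is a hypothetical stand-in (PullNodeTableSketch, with a generic key in place of MessageQueue) that also folds the get/put pair into one atomic computeIfAbsent call.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical client-side table: queue -> suggested brokerId (0 = master).
final class PullNodeTableSketch<Q> {
    static final long MASTER_ID = 0L;
    private final ConcurrentMap<Q, AtomicLong> pullFromWhichNodeTable = new ConcurrentHashMap<>();

    // Store the broker's suggestion; computeIfAbsent creates the entry atomically.
    void updatePullFromWhichNode(Q mq, long brokerId) {
        pullFromWhichNodeTable.computeIfAbsent(mq, k -> new AtomicLong(brokerId)).set(brokerId);
    }

    // brokerId to use for the next pull; defaults to the master.
    long recalculatePullFromWhichNode(Q mq) {
        AtomicLong suggest = pullFromWhichNodeTable.get(mq);
        return suggest != null ? suggest.get() : MASTER_ID;
    }
}
```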

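Summary: this passive read/write separation only kicks in when consumers fall behind, that is, when messages pile up. The data being pulled is no longer in memory and has to be read back from disk, which puts heavy load on the master, so the master suggests which broker the client should pull from next time. Most of the time the slave simply acts as a standby copy. In the next part we will look at master-slave failover.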