1. An Overview of the RocketMQ Message Storage Architecture
From the macro diagram of the RocketMQ storage architecture above, we can see at a glance that the CommitLog is written to disk sequentially. Sequential writes are great for write throughput but unfriendly to reads, which is why clients are not made to consume the CommitLog directly; instead they consume the ConsumeQueue. Each ConsumeQueue entry has a fixed 20-byte layout: msgOffset (8 bytes) + msgLength (4 bytes) + tagHash (8 bytes).
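To make that layout concrete, here is a minimal sketch of decoding one such 20-byte entry. The class and field names are illustrative, not RocketMQ's own:

```java
import java.nio.ByteBuffer;

// Illustrative decoder for one 20-byte ConsumeQueue entry:
// commitLog offset (8 bytes) + message size (4 bytes) + tag hashcode (8 bytes).
public class CqEntry {
    public final long commitLogOffset;
    public final int size;
    public final long tagsCode;

    public CqEntry(long commitLogOffset, int size, long tagsCode) {
        this.commitLogOffset = commitLogOffset;
        this.size = size;
        this.tagsCode = tagsCode;
    }

    // Reads one entry starting at the buffer's current position.
    public static CqEntry read(ByteBuffer buf) {
        return new CqEntry(buf.getLong(), buf.getInt(), buf.getLong());
    }

    public static byte[] encode(long offset, int size, long tagsCode) {
        ByteBuffer buf = ByteBuffer.allocate(20);
        buf.putLong(offset).putInt(size).putLong(tagsCode);
        return buf.array();
    }

    public static CqEntry fromBytes(byte[] bytes) {
        return read(ByteBuffer.wrap(bytes));
    }
}
```

Because every entry is exactly 20 bytes, the n-th entry of a queue always sits at byte offset n * 20, so the consumer can seek to it directly.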
At this point the structure may still seem puzzling: knowing only the message's offset, how do we find which file it lives in, and where inside that file? This is where the CommitLog file naming convention comes into play. As mentioned above, each CommitLog file is named after the offset of its first message, so the first file is 00000000000000000000. The reason for this naming is that when a ConsumeQueue entry is consumed, the broker must use the offset to locate the message in the CommitLog and then read length bytes forward to get the full message. With this naming scheme, a binary search over the file names is enough to find the file containing the message and pull it out, in O(log n) time.
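The lookup described above can be sketched as a binary search over the numeric file names: the file containing a given offset is the one with the greatest base offset that is still <= the target. This is a simplified illustration; the 1 GB file size and the names are hypothetical:

```java
// Illustrative lookup: commitLog files are named after the offset of their
// first message, so the file holding a target offset is the one with the
// greatest base offset <= target (binary search over sorted names, O(log n)).
public class CommitLogLocator {
    // baseOffsets are the numeric file names, sorted ascending.
    public static int locateFileIndex(long[] baseOffsets, long targetOffset) {
        int lo = 0, hi = baseOffsets.length - 1, ans = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (baseOffsets[mid] <= targetOffset) { ans = mid; lo = mid + 1; }
            else { hi = mid - 1; }
        }
        return ans; // -1 means the offset predates all remaining files
    }

    public static void main(String[] args) {
        // Hypothetical 1 GB files: 00000000000000000000, 00000000001073741824, ...
        long[] names = {0L, 1073741824L, 2147483648L};
        int idx = locateFileIndex(names, 1073741900L);
        long posInFile = 1073741900L - names[idx]; // position inside that file
        System.out.println(idx + " " + posInFile);
    }
}
```

Once the file is found, `targetOffset - baseOffset` gives the position inside it, from which the message can be read forward.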
Next, let's walk through the RocketMQ storage architecture in the source code. The main file-storage logic lives in DefaultMessageStore:
A few key fields:
// Storage implementation for the CommitLog files
private final CommitLog commitLog;
// Flush thread for the ConsumeQueue files
private final FlushConsumeQueueService flushConsumeQueueService;
// Service that cleans up expired CommitLog files
private final CleanCommitLogService cleanCommitLogService;
// Service that cleans up expired ConsumeQueue files
private final CleanConsumeQueueService cleanConsumeQueueService;
// Implementation for the Index files
private final IndexService indexService;
// Service that pre-allocates MappedFile instances
private final AllocateMappedFileService allocateMappedFileService;
// CommitLog dispatcher: builds the ConsumeQueue and Index files from the CommitLog
private final ReputMessageService reputMessageService;
// High-availability (master/slave replication) service
private final HAService haService;
// Scheduled (delayed) message service
private final ScheduleMessageService scheduleMessageService;
// Off-heap buffer pool for messages
private final TransientStorePool transientStorePool;
// Flush checkpoint
private StoreCheckpoint storeCheckpoint;
// CommitLog dispatch handlers
private final LinkedList<CommitLogDispatcher> dispatcherList;
The discussion below analyzes three aspects: storing messages, fetching messages, and dispatching to the ConsumeQueue & Index files:
1. The process of storing a message:
① DefaultMessageStore#putMessage(MessageExtBrokerInner msg)
public PutMessageResult putMessage(MessageExtBrokerInner msg) {
//.... a series of pre-checks:
// is the store shut down?
// is this broker a slave? (slaves reject writes)
// is the store writeable?
// is the topic too long?
// do the properties exceed the maximum size?
// is the OS page cache busy?
// then delegate to commitLog.putMessage
PutMessageResult result = this.commitLog.putMessage(msg);
//......
}
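The chain of pre-checks above can be pictured as a series of guard clauses, roughly like the following sketch. The method name, parameters, and limits here are simplified assumptions for illustration, not RocketMQ's actual implementation:

```java
// Illustrative sketch of the guard-clause checks putMessage performs before
// delegating to commitLog.putMessage (names and limits are simplified).
public class PutMessageChecks {
    static final int MAX_TOPIC_LEN = 127;        // assumed limit for this sketch
    static final int MAX_PROPERTIES_LEN = 32767; // assumed limit for this sketch

    public static String check(boolean shutdown, boolean isSlave, boolean writeable,
                               String topic, int propertiesLen, boolean osPageCacheBusy) {
        if (shutdown) return "SERVICE_NOT_AVAILABLE";       // store has been stopped
        if (isSlave) return "SERVICE_NOT_AVAILABLE";        // slaves reject writes
        if (!writeable) return "SERVICE_NOT_AVAILABLE";     // store marked read-only
        if (topic.length() > MAX_TOPIC_LEN) return "MESSAGE_ILLEGAL";
        if (propertiesLen > MAX_PROPERTIES_LEN) return "MESSAGE_ILLEGAL";
        if (osPageCacheBusy) return "OS_PAGECACHE_BUSY";    // writes are stalling
        return "OK"; // all checks passed: safe to append to the commit log
    }
}
```

Only when every guard passes does the call reach `commitLog.putMessage(msg)`.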
② commitLog.putMessage(msg)
Before diving into this method, let's look at a few core fields of the CommitLog class:
// Queue of memory-mapped storage files
protected final MappedFileQueue mappedFileQueue;
protected final DefaultMessageStore defaultMessageStore;
// CommitLog flush (persistence) service
private final FlushCommitLogService flushCommitLogService;
//If TransientStorePool enabled, we must flush message to FileChannel at fixed periods
private final FlushCommitLogService commitLogService;
private final AppendMessageCallback appendMessageCallback;
private final ThreadLocal<MessageExtBatchEncoder> batchEncoderThreadLocal;
protected HashMap<String/* topic-queueid */, Long/* offset */> topicQueueTable = new HashMap<String, Long>(1024);
protected volatile long confirmOffset = -1L;
private volatile long beginTimeInLock = 0;
protected final PutMessageLock putMessageLock;
mappedFileQueue:
This class interacts with the operating system for storing and retrieving files and their corresponding offsets (the actual per-file I/O is done by MappedFile; MappedFileQueue manages the collection of those MappedFile instances).
Now let's see what this putMessage actually does:
public PutMessageResult putMessage(final MessageExtBrokerInner msg) {
//.....
String topic = msg.getTopic(); // get the topic
int queueId = msg.getQueueId(); // get the target queue
final int tranType = MessageSysFlag.getTransactionValue(msg.getSysFlag()); // is this a transactional message?
if (tranType == MessageSysFlag.TRANSACTION_NOT_TYPE
|| tranType == MessageSysFlag.TRANSACTION_COMMIT_TYPE) {
// Delay Delivery
if (msg.getDelayTimeLevel() > 0) {
if (msg.getDelayTimeLevel() > this.defaultMessageStore.getScheduleMessageService().getMaxDelayLevel()) {
// cap the message's delay level at the configured maximum
msg.setDelayTimeLevel(this.defaultMessageStore.getScheduleMessageService().getMaxDelayLevel());
}
// for a delayed message, redirect it to the schedule topic (which has one queue per delay level)
topic = ScheduleMessageService.SCHEDULE_TOPIC;
queueId = ScheduleMessageService.delayLevel2QueueId(msg.getDelayTimeLevel());
// Backup real topic, queueId
MessageAccessor.putProperty(msg, MessageConst.PROPERTY_REAL_TOPIC, msg.getTopic());
MessageAccessor.putProperty(msg, MessageConst.PROPERTY_REAL_QUEUE_ID, String.valueOf(msg.getQueueId()));
msg.setPropertiesString(MessageDecoder.messageProperties2String(msg.getProperties()));
msg.setTopic(topic);
msg.setQueueId(queueId);
}
}
InetSocketAddress bornSocketAddress = (InetSocketAddress) msg.getBornHost();
if (bornSocketAddress.getAddress() instanceof Inet6Address) {
msg.setBornHostV6Flag();
}
InetSocketAddress storeSocketAddress = (InetSocketAddress) msg.getStoreHost();
if (storeSocketAddress.getAddress() instanceof Inet6Address) {
msg.setStoreHostAddressV6Flag();
}
long eclipsedTimeInLock = 0;
MappedFile unlockMappedFile = null;
MappedFile mappedFile = this.mappedFileQueue.getLastMappedFile(); // get the last mapped file
putMessageLock.lock(); // acquire the lock before writing to the commitLog
try {
long beginLockTimestamp = this.defaultMessageStore.getSystemClock().now();
this.beginTimeInLock = beginLockTimestamp;
// Here settings are stored timestamp, in order to ensure an orderly
// global
msg.setStoreTimestamp(beginLockTimestamp);
if (null == mappedFile || mappedFile.isFull()) {
mappedFile = this.mappedFileQueue.getLastMappedFile(0); // Mark: NewFile may be cause noise; creates a new file first
}
if (null == mappedFile) {
log.error("create mapped file1 error, topic: " + msg.getTopic() + " clientAddr: " + msg.getBornHostString());
beginTimeInLock = 0;
return new PutMessageResult(PutMessageStatus.CREATE_MAPEDFILE_FAILED, null);
}
//Key point! This is where the message is actually written to the file
result = mappedFile.appendMessage(msg, this.appendMessageCallback);
switch (result.getStatus()) {
case PUT_OK:
break;
case END_OF_FILE: // the current file has no room left for this message; a new file must be created
unlockMappedFile = mappedFile;
// Create a new file, re-write the message
mappedFile = this.mappedFileQueue.getLastMappedFile(0);
if (null == mappedFile) {
// XXX: warn and notify me
log.error("create mapped file2 error, topic: " + msg.getTopic() + " clientAddr: " + msg.getBornHostString());
beginTimeInLock = 0;
return new PutMessageResult(PutMessageStatus.CREATE_MAPEDFILE_FAILED, result);
}
result = mappedFile.appendMessage(msg, this.appendMessageCallback);
break;
case MESSAGE_SIZE_EXCEEDED:
case PROPERTIES_SIZE_EXCEEDED:
beginTimeInLock = 0;
return new PutMessageResult(PutMessageStatus.MESSAGE_ILLEGAL, result);
case UNKNOWN_ERROR:
beginTimeInLock = 0;
return new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, result);
default:
beginTimeInLock = 0;
return new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, result);
}
eclipsedTimeInLock = this.defaultMessageStore.getSystemClock().now() - beginLockTimestamp;
beginTimeInLock = 0;
} finally {
putMessageLock.unlock(); // release the lock
}
if (eclipsedTimeInLock > 500) { // log a warning if holding the lock took too long
log.warn("[NOTIFYME]putMessage in lock cost time(ms)={}, bodyLength={} AppendMessageResult={}", eclipsedTimeInLock, msg.getBody().length, result);
}
if (null != unlockMappedFile && this.defaultMessageStore.getMessageStoreConfig().isWarmMapedFileEnable()) {
this.defaultMessageStore.unlockMappedFile(unlockMappedFile);
}
PutMessageResult putMessageResult = new PutMessageResult(PutMessageStatus.PUT_OK, result);
handleDiskFlush(result, putMessageResult, msg); // flush to disk
handleHA(result, putMessageResult, msg); // replicate to slave nodes
return putMessageResult;
}
appendMessage (which delegates to MappedFile#appendMessagesInner):
public AppendMessageResult appendMessagesInner(final MessageExt messageExt, final AppendMessageCallback cb) {
assert messageExt != null;
assert cb != null;
int currentPos = this.wrotePosition.get();
if (currentPos < this.fileSize) {
//check whether writeBuffer is non-null; if it is, transientStorePoolEnable=true is set, meaning
//data is first staged in off-heap memory, later committed to the mappedByteBuffer/fileChannel, and then flushed to disk
ByteBuffer byteBuffer = writeBuffer != null ? writeBuffer.slice() : this.mappedByteBuffer.slice();
byteBuffer.position(currentPos);
AppendMessageResult result;
if (messageExt instanceof MessageExtBrokerInner) {
result = cb.doAppend(this.getFileFromOffset(), byteBuffer, this.fileSize - currentPos, (MessageExtBrokerInner) messageExt);
} else if (messageExt instanceof MessageExtBatch) {
result = cb.doAppend(this.getFileFromOffset(), byteBuffer, this.fileSize - currentPos, (MessageExtBatch) messageExt);
} else {
return new AppendMessageResult(AppendMessageStatus.UNKNOWN_ERROR);
}
this.wrotePosition.addAndGet(result.getWroteBytes());
this.storeTimestamp = result.getStoreTimestamp();
return result;
}
log.error("MappedFile.appendMessage return null, wrotePosition: {} fileSize: {}", currentPos, this.fileSize);
return new AppendMessageResult(AppendMessageStatus.UNKNOWN_ERROR);
}
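Note the slice() call above: slicing yields a view that shares the backing memory with mappedByteBuffer (or writeBuffer) but has its own position and limit, so positioning and appending through the view never disturbs the shared buffer's cursor. A small standalone demonstration of that pattern (the class name is illustrative):

```java
import java.nio.ByteBuffer;

// Demonstrates the slice()-then-position() pattern used in appendMessagesInner:
// the slice shares the backing memory but has an independent position/limit,
// so moving it does not disturb the shared buffer's own cursor.
public class SliceDemo {
    public static byte readBackThroughSlice(int writePos, byte value) {
        ByteBuffer shared = ByteBuffer.allocate(64); // stand-in for mappedByteBuffer
        ByteBuffer view = shared.slice();            // independent cursor, same memory
        view.position(writePos);
        view.put(value);                             // write via the view
        // the shared buffer's position is untouched, yet the data is visible
        return shared.get(writePos);                 // absolute read, same memory
    }

    public static void main(String[] args) {
        System.out.println(readBackThroughSlice(10, (byte) 7));
    }
}
```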
doAppend(msg) is where the message is finally written into the corresponding file buffer; take a look if you're interested. It mainly serializes the message and advances the write pointer.
So when does a message sitting in the file buffer actually get persisted? As we saw above, after obtaining the AppendMessageResult, putMessage calls handleDiskFlush(result, putMessageResult, msg), and that is where persistence happens.
handleDiskFlush
This is the persistence-related code. What it does:
1. Check whether the configured flush mode is synchronous or asynchronous:
1.1 For synchronous flush, wake up the flush thread for a group commit and block waiting for the result.
2. For asynchronous flush:
check whether off-heap staging (TransientStorePool) is enabled; if so, wake the commit thread, which first writes the messages from writeBuffer into the fileChannel before the flush thread persists them.
public void handleDiskFlush(AppendMessageResult result, PutMessageResult putMessageResult, MessageExt messageExt) {
// Synchronization flush
if (FlushDiskType.SYNC_FLUSH == this.defaultMessageStore.getMessageStoreConfig().getFlushDiskType()) {
final GroupCommitService service = (GroupCommitService) this.flushCommitLogService;
if (messageExt.isWaitStoreMsgOK()) {
GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
service.putRequest(request); // wake the worker thread for a group commit
boolean flushOK = request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout()); // wait for the flush result
if (!flushOK) {
log.error("do groupcommit, wait for flush failed, topic: " + messageExt.getTopic() + " tags: " + messageExt.getTags()
+ " client address: " + messageExt.getBornHostString());
putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_DISK_TIMEOUT);
}
} else {
service.wakeup();
}
}
// Asynchronous flush
else {
if (!this.defaultMessageStore.getMessageStoreConfig().isTransientStorePoolEnable()) {
flushCommitLogService.wakeup();
} else {
commitLogService.wakeup();
}
}
}
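The synchronous-flush handshake above (putRequest / waitForFlush / wakeup) boils down to a latch-based rendezvous between the producer thread and the flush thread. Here is a simplified sketch of the same principle; it is not RocketMQ's actual GroupCommitRequest, which additionally batches queued requests per flush:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Simplified sketch of the synchronous-flush handshake: the producer thread
// submits a request and blocks in waitForFlush; the flush thread performs
// the flush and wakes it up via the latch.
public class GroupCommitSketch {
    static class FlushRequest {
        final long nextOffset;                         // flush must reach this offset
        private final CountDownLatch done = new CountDownLatch(1);
        private volatile boolean flushOK = false;

        FlushRequest(long nextOffset) { this.nextOffset = nextOffset; }

        void wakeupCustomer(boolean ok) {              // called by the flush thread
            this.flushOK = ok;
            done.countDown();
        }

        boolean waitForFlush(long timeoutMs) {         // called by the producer thread
            try {
                return done.await(timeoutMs, TimeUnit.MILLISECONDS) && flushOK;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;                          // treated as a flush timeout
            }
        }
    }

    public static boolean demo() {
        FlushRequest req = new FlushRequest(1024L);
        new Thread(() -> req.wakeupCustomer(true)).start(); // stands in for the flush thread
        return req.waitForFlush(5000);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If waitForFlush times out, the producer marks the result FLUSH_DISK_TIMEOUT, exactly as the handleDiskFlush code above does.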
That covers CommitLog writing & persistence. There are more persistence details worth exploring, such as how RocketMQ controls the asynchronous flush frequency and how it manages the off-heap memory.
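As a taste of how asynchronous flush frequency can be controlled, here is a simplified decision function in the spirit of the flush loop: only flush when enough dirty pages have accumulated, with a time-based safety valve so data never sits unflushed for too long. The parameter names are illustrative, not RocketMQ's configuration keys:

```java
// Simplified sketch of how an async flush loop bounds flush frequency:
// wake up periodically and flush only when at least `minPages` worth of
// new data has been written, or when too much time has passed since the
// last flush (a "thorough flush" safety valve).
public class AsyncFlushSketch {
    static final int PAGE_SIZE = 4096;

    // Decide whether a flush should happen on this tick.
    public static boolean shouldFlush(long wrotePos, long flushedPos, int minPages,
                                      long now, long lastFlushTime, long maxIntervalMs) {
        long dirtyPages = (wrotePos - flushedPos) / PAGE_SIZE;
        if (dirtyPages >= minPages) return true;   // enough dirty data accumulated
        // safety valve: flush anyway if too much time has passed and data is pending
        return now - lastFlushTime >= maxIntervalMs && wrotePos > flushedPos;
    }
}
```

Batching pages per flush amortizes the cost of each force-to-disk call, which is why a lightly loaded broker flushes on the time valve while a busy one flushes on the page threshold.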