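RocketMQ Message Storage: Recovery and Deletion

This article covers RocketMQ's file recovery mechanism. After a message is written to the CommitLog it is dispatched asynchronously to the ConsumeQueue, so the two can become inconsistent, and a recovery mechanism is needed to handle that case. In addition, the messages RocketMQ stores in files eventually expire, so a mechanism for deleting them is needed as well.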

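1. File Recovery

RocketMQ first stores every message in full in the CommitLog file, then asynchronously generates dispatch tasks that write to the ConsumeQueue and IndexFile files. If a message is stored in the CommitLog successfully but the Broker crashes for some reason before the dispatch has run, the CommitLog, ConsumeQueue and IndexFile data end up inconsistent.

Without repair, some messages would exist in the CommitLog but never be dispatched to the ConsumeQueue, and consumers would never consume them. The store-file loading process therefore includes a recovery step that brings the CommitLog and ConsumeQueue back to eventual consistency.

The first step is to determine whether the previous exit was normal. The mechanism works as follows: on startup the Broker creates the file ${ROCKET_HOME}/store/abort, and on exit it deletes the abort file through a JVM ShutdownHook. If the abort file exists at the next startup, the Broker exited abnormally and the CommitLog and ConsumeQueue data may be inconsistent.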

//DefaultMessageStore#load
boolean lastExitOK = !this.isTempFileExist();

//DefaultMessageStore#isTempFileExist
private boolean isTempFileExist() {
    String fileName = StorePathConfigHelper.getAbortFile(this.messageStoreConfig.getStorePathRootDir());
    File file = new File(fileName);
    return file.exists();
}
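
The marker-file idea can be tried out in isolation. The following is a minimal sketch, assuming a throwaway temporary directory as the store root; the class AbortMarkerSketch and its method names are hypothetical and are not part of RocketMQ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper illustrating the abort-marker idea; not RocketMQ's own class. */
final class AbortMarkerSketch {
    private final Path abortFile;

    AbortMarkerSketch(Path storeRoot) {
        this.abortFile = storeRoot.resolve("abort");
    }

    /** True when no marker is left over, i.e. the previous run shut down cleanly. */
    boolean lastExitOk() {
        return !Files.exists(abortFile);
    }

    /** Called on startup, after lastExitOk() has been consulted. */
    void markStarted() throws IOException {
        Files.createDirectories(abortFile.getParent());
        if (!Files.exists(abortFile)) {
            Files.createFile(abortFile);
        }
    }

    /** Called on a clean shutdown, for example from a JVM shutdown hook. */
    void markCleanShutdown() throws IOException {
        Files.deleteIfExists(abortFile);
    }

    /** Runs one clean cycle and one simulated crash; returns the three observations. */
    static boolean[] demo() {
        try {
            Path root = Files.createTempDirectory("store-sketch");
            AbortMarkerSketch first = new AbortMarkerSketch(root);
            boolean freshStart = first.lastExitOk();      // no marker exists yet
            first.markStarted();
            first.markCleanShutdown();                    // a clean exit removes the marker
            AbortMarkerSketch second = new AbortMarkerSketch(root);
            boolean afterCleanExit = second.lastExitOk();
            second.markStarted();                         // simulated crash: marker stays behind
            boolean afterCrash = new AbortMarkerSketch(root).lastExitOk();
            Files.deleteIfExists(root.resolve("abort"));  // tidy up the temporary directory
            Files.deleteIfExists(root);
            return new boolean[] {freshStart, afterCleanExit, afterCrash};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

The third observation is false because the marker survives when the shutdown step never runs, which is the signal used to choose abnormal recovery.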

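1.1 Normal Recovery

When the Broker stopped normally, file recovery is implemented by the recoverNormally method in CommitLog.

The checkCRCOnRecover parameter controls whether, during recovery, each message's content is read to compute a CRC value that is compared against the CRC stored with the message; CRC verification is enabled by default. By default, recovery starts from the third-to-last file; if there are fewer than three files, it starts from the first file.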

//CommitLog#recoverNormally
boolean checkCRCOnRecover = this.defaultMessageStore.getMessageStoreConfig().isCheckCRCOnRecover();
final List<MappedFile> mappedFiles = this.mappedFileQueue.getMappedFiles();
if (!mappedFiles.isEmpty()) {
    // Began to recover from the last third file
    int index = mappedFiles.size() - 3;
    if (index < 0)
        index = 0;
}

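processOffset is the confirmed physical offset of the CommitLog. It equals the MappedFile's starting offset (taken from the file name) plus mappedFileOffset. After the traversal finishes, this variable drives the follow-up logic; in other words, it is the largest physical offset that has passed verification.

mappedFileOffset is the offset within the current file that has passed verification; it restarts from zero whenever the traversal moves on to the next file.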

//CommitLog#recoverNormally
MappedFile mappedFile = mappedFiles.get(index);
ByteBuffer byteBuffer = mappedFile.sliceByteBuffer();
long processOffset = mappedFile.getFileFromOffset();
long mappedFileOffset = 0;

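The CommitLog is then traversed one message at a time. Reading a message has three possible outcomes:

If the lookup succeeds and the message length is greater than 0, the message is valid and mappedFileOffset is increased by the message size.
If the lookup succeeds and the message length equals 0, the end of the file has been reached. If there is a next file, processOffset and mappedFileOffset are reset and the traversal continues with that file.
If the lookup fails, the file has not been filled with messages; the progress is restored to the current position and the traversal ends.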

//CommitLog#recoverNormally
DispatchRequest dispatchRequest = this.checkMessageAndReturnSize(byteBuffer, checkCRCOnRecover);
int size = dispatchRequest.getMsgSize();
// Normal data
if (dispatchRequest.isSuccess() && size > 0) {
    mappedFileOffset += size;
}
// Come the end of the file, switch to the next file Since the
// return 0 representatives met last hole,
// this can not be included in truncate offset
else if (dispatchRequest.isSuccess() && size == 0) {
    index++;
    if (index >= mappedFiles.size()) {
        break;
    } else {
        mappedFile = mappedFiles.get(index);
        byteBuffer = mappedFile.sliceByteBuffer();
        processOffset = mappedFile.getFileFromOffset();
        mappedFileOffset = 0;
        log.info("recover next physics file, " + mappedFile.getFileName());
    }
}
// Intermediate file read error
else if (!dispatchRequest.isSuccess()) {
    log.info("recover physics file end, " + mappedFile.getFileName());
    break;
}

Update the flushed and committed offsets of the MappedFileQueue.

//CommitLog#recoverNormally
processOffset += mappedFileOffset;
this.mappedFileQueue.setFlushedWhere(processOffset);
this.mappedFileQueue.setCommittedWhere(processOffset);
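
The scan described in this section can be reproduced on a toy format. The following is a minimal sketch, assuming an invented record layout (a 4-byte total size, a 4-byte magic value, then the payload) and in-memory buffers instead of mapped files; RecoverScanSketch, its constants and its helpers are hypothetical, and only the control flow mirrors the snippets above.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the recovery scan; the record layout is invented for this sketch. */
final class RecoverScanSketch {
    static final int MESSAGE_MAGIC = 1; // hypothetical marker for a normal record
    static final int BLANK_MAGIC = 2;   // hypothetical marker for "rest of file is padding"
    static final int HEADER = 8;        // int totalSize + int magic

    /** Returns the record size (> 0), 0 at the end-of-file marker, or -1 for invalid data. */
    static int check(ByteBuffer buf) {
        if (buf.remaining() < HEADER) return -1;
        int totalSize = buf.getInt();
        int magic = buf.getInt();
        if (magic == BLANK_MAGIC) return 0;
        if (magic != MESSAGE_MAGIC) return -1;
        if (totalSize < HEADER || totalSize - HEADER > buf.remaining()) return -1;
        buf.position(buf.position() + totalSize - HEADER); // skip the payload
        return totalSize;
    }

    /** Scans from the third-to-last file and returns the verified physical offset. */
    static long recover(List<ByteBuffer> files, int fileSize) {
        if (files.isEmpty()) return 0L;
        int index = Math.max(0, files.size() - 3);
        ByteBuffer buf = readerOf(files.get(index));
        long processOffset = (long) index * fileSize; // start offset of the current file
        long mappedFileOffset = 0;                    // verified bytes inside the current file
        while (true) {
            int size = check(buf);
            if (size > 0) {
                mappedFileOffset += size;
            } else if (size == 0) {
                index++;
                if (index >= files.size()) break;
                buf = readerOf(files.get(index));
                processOffset = (long) index * fileSize;
                mappedFileOffset = 0;
            } else {
                break; // unwritten or corrupt data: stop here
            }
        }
        return processOffset + mappedFileOffset;
    }

    private static ByteBuffer readerOf(ByteBuffer file) {
        ByteBuffer view = file.duplicate();
        view.clear(); // read from position 0 up to capacity
        return view;
    }

    static void append(ByteBuffer file, int payloadLength) {
        file.putInt(HEADER + payloadLength);
        file.putInt(MESSAGE_MAGIC);
        file.position(file.position() + payloadLength);
    }

    static void seal(ByteBuffer file) {
        int rest = file.remaining();
        file.putInt(rest);
        file.putInt(BLANK_MAGIC);
    }

    /** Four 64-byte files; the last one holds a single 20-byte record. */
    static long demo() {
        int fileSize = 64;
        List<ByteBuffer> files = new ArrayList<>();
        for (int i = 0; i < 4; i++) files.add(ByteBuffer.allocate(fileSize));
        append(files.get(0), 12); seal(files.get(0));
        append(files.get(1), 12); append(files.get(1), 12); seal(files.get(1));
        append(files.get(2), 20); append(files.get(2), 20); seal(files.get(2));
        append(files.get(3), 12);
        return recover(files, fileSize);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3 * 64 + 20 = 212
    }
}
```

In the demo the scan skips the first file, walks through the two sealed files, and stops in the last file at the first unwritten bytes, so the verified offset is the last file's start offset plus the one valid record.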

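Data after the verified offset has to be deleted. All files are traversed: if a file's tail offset is not greater than the verified offset, the whole file has been verified and needs no handling. If the file's tail offset is greater than the verified offset, there are two cases:

The verified offset is greater than or equal to the file's starting offset: the file contains the verified offset, so only its write, commit and flush positions need to be reset.
The file's starting offset is greater than the verified offset: the file was created after the verified offset and must be deleted. MappedFile's destroy logic is invoked, the file is added to the list of files to remove, and finally deleteExpiredFile removes it from memory. This logic is similar to the file deletion covered in a later section.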

//MappedFileQueue#truncateDirtyFiles
public void truncateDirtyFiles(long offset) {
    List<MappedFile> willRemoveFiles = new ArrayList<MappedFile>();

    for (MappedFile file : this.mappedFiles) {
        long fileTailOffset = file.getFileFromOffset() + this.mappedFileSize;
        if (fileTailOffset > offset) {
            if (offset >= file.getFileFromOffset()) {
                file.setWrotePosition((int) (offset % this.mappedFileSize));
                file.setCommittedPosition((int) (offset % this.mappedFileSize));
                file.setFlushedPosition((int) (offset % this.mappedFileSize));
            } else {
                file.destroy(1000);
                willRemoveFiles.add(file);
            }
        }
    }

    this.deleteExpiredFile(willRemoveFiles);
}
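
A small worked example makes the three outcomes concrete. This is a minimal sketch, assuming a stand-in FileStub type that carries only a start offset and a write position; TruncateSketch and FileStub are hypothetical names, and nothing is deleted from disk.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy version of the truncation decision; works on stubs rather than mapped files. */
final class TruncateSketch {
    /** Minimal stand-in for a mapped file. */
    static final class FileStub {
        final long fromOffset;
        int wrotePosition;

        FileStub(long fromOffset, int wrotePosition) {
            this.fromOffset = fromOffset;
            this.wrotePosition = wrotePosition;
        }
    }

    /** Resets or removes files beyond the verified offset; returns the removed ones. */
    static List<FileStub> truncate(List<FileStub> files, int fileSize, long offset) {
        List<FileStub> removed = new ArrayList<>();
        for (FileStub f : files) {
            long tail = f.fromOffset + fileSize;
            if (tail > offset) {
                if (offset >= f.fromOffset) {
                    // The file contains the verified offset: pull its position back.
                    f.wrotePosition = (int) (offset % fileSize);
                } else {
                    // The file starts after the verified offset: drop it entirely.
                    removed.add(f);
                }
            }
        }
        files.removeAll(removed);
        return removed;
    }

    /** Four 100-byte files, verified offset 150: returns {kept, position of file 1, removed}. */
    static int[] demo() {
        int fileSize = 100;
        List<FileStub> files = new ArrayList<>();
        for (int i = 0; i < 4; i++) files.add(new FileStub((long) i * fileSize, fileSize));
        List<FileStub> removed = truncate(files, fileSize, 150L);
        return new int[] {files.size(), files.get(1).wrotePosition, removed.size()};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

With a verified offset of 150, the file starting at 0 is untouched, the file starting at 100 has its position reset to 50, and the files starting at 200 and 300 are removed.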

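Finally, the largest offset in the ConsumeQueue is compared with the verified offset. If the ConsumeQueue's offset is greater than or equal to it, the excess data in the ConsumeQueue must be deleted. Roughly, every ConsumeQueue is traversed, each entry is read, and its phyOffset is checked. If an entire file lies beyond the verified offset, the whole file is deleted; if the file contains the verified offset, only the flushed and committed offsets are reset.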

//CommitLog#recoverNormally
if (maxPhyOffsetOfConsumeQueue >= processOffset) {
    this.defaultMessageStore.truncateDirtyLogicFiles(processOffset);
}

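1.2 Abnormal Recovery

Abnormal recovery is implemented by CommitLog's recoverAbnormally method. The overall flow resembles normal recovery, with one difference: abnormal recovery walks backwards from the last file to find the first file whose messages were stored correctly.

A message file is judged to be correct by the following logic:

Check the magic code of the first message; if it is not MESSAGE_MAGIC_CODE, the file does not conform to the message storage format.
Check the store timestamp of the first message; if it is 0, the file does not conform to the message storage format.
By default, if the first message's store timestamp is less than or equal to the checkpoint's minimum timestamp (the smaller of the CommitLog and ConsumeQueue flush times), the message file is considered correct.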

//CommitLog#isMappedFileMatchedRecover
private boolean isMappedFileMatchedRecover(final MappedFile mappedFile) {
    ByteBuffer byteBuffer = mappedFile.sliceByteBuffer();

    int magicCode = byteBuffer.getInt(MessageDecoder.MESSAGE_MAGIC_CODE_POSTION);
    if (magicCode != MESSAGE_MAGIC_CODE) {
        return false;
    }

    long storeTimestamp = byteBuffer.getLong(MessageDecoder.MESSAGE_STORE_TIMESTAMP_POSTION);
    if (0 == storeTimestamp) {
        return false;
    }
    // Defaults: messageIndexEnable=true, messageIndexSafe=false
    if (this.defaultMessageStore.getMessageStoreConfig().isMessageIndexEnable()
        && this.defaultMessageStore.getMessageStoreConfig().isMessageIndexSafe()) {
        if (storeTimestamp <= this.defaultMessageStore.getStoreCheckpoint().getMinTimestampIndex()) {
            log.info("find check timestamp, {} {}",
                storeTimestamp,
                UtilAll.timeMillisToHumanString(storeTimestamp));
            return true;
        }
    } else {
        if (storeTimestamp <= this.defaultMessageStore.getStoreCheckpoint().getMinTimestamp()) {
            log.info("find check timestamp, {} {}",
                storeTimestamp,
                UtilAll.timeMillisToHumanString(storeTimestamp));
            return true;
        }
    }

    return false;
}
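
The backward search itself is short enough to sketch. The following assumes each file is represented only by the store timestamp of its first message and leaves out the magic-code check; AbnormalStartSketch and its methods are hypothetical names, and what the caller does when no file matches is left open.

```java
/** Toy version of the backward search for the file to start abnormal recovery from. */
final class AbnormalStartSketch {
    /** A file qualifies when its first message has a timestamp and predates the checkpoint. */
    static boolean matches(long firstStoreTimestamp, long checkpointMinTimestamp) {
        return firstStoreTimestamp != 0 && firstStoreTimestamp <= checkpointMinTimestamp;
    }

    /** Walks from the last file to the first; returns the matching index or -1 if none. */
    static int findStartIndex(long[] firstStoreTimestamps, long checkpointMinTimestamp) {
        for (int i = firstStoreTimestamps.length - 1; i >= 0; i--) {
            if (matches(firstStoreTimestamps[i], checkpointMinTimestamp)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] files = {1000L, 2000L, 3000L, 4000L};
        System.out.println(findStartIndex(files, 2500L)); // file 1 is the newest one at or before 2500
        System.out.println(findStartIndex(files, 500L));  // nothing predates the checkpoint
    }
}
```

Starting from the newest file keeps the amount of re-verified data small: everything before the chosen file is already covered by the checkpoint.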

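Once the MappedFile is found, the messages in it are traversed and validated, just as in normal recovery. The messages are also re-dispatched to the ConsumeQueue and index files. Although the ConsumeQueue checks offsets internally, the same message may still be consumed both before and after recovery.

This reflects RocketMQ's overall design philosophy: RocketMQ guarantees that messages are not lost, but it does not guarantee that they are not consumed more than once, so consumers need to make message consumption idempotent.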
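
One common way to get idempotent consumption is to remember a business key per processed message and skip repeats. The following is a minimal in-memory sketch under that assumption; IdempotentConsumerSketch and the key format are hypothetical, and it does not use the RocketMQ client API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy consumer that applies each business key at most once. */
final class IdempotentConsumerSketch {
    private final Set<String> processedKeys = ConcurrentHashMap.newKeySet();
    private final AtomicInteger sideEffects = new AtomicInteger();

    /** Returns true if the message was applied, false if it was a duplicate. */
    boolean consume(String businessKey) {
        if (!processedKeys.add(businessKey)) {
            return false; // already handled, e.g. redelivered after a Broker recovery
        }
        sideEffects.incrementAndGet(); // stands in for the real business operation
        return true;
    }

    int sideEffectCount() {
        return sideEffects.get();
    }

    /** Delivers one key twice and another once; returns the number of applied effects. */
    static int demo() {
        IdempotentConsumerSketch consumer = new IdempotentConsumerSketch();
        consumer.consume("order-1");
        consumer.consume("order-2");
        consumer.consume("order-1"); // duplicate delivery
        return consumer.sideEffectCount();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

An in-memory set is lost on restart, so a real consumer would keep the processed keys in durable storage, for example behind a unique constraint in the same database as the business data.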

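If no correct MappedFile is found, the flushed and committed positions are both set to 0 and all ConsumeQueue files are destroyed. Internally this calls MappedFileQueue's destroy method, which deletes every file under the ConsumeQueue directory.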

//CommitLog#recoverAbnormally
this.mappedFileQueue.setFlushedWhere(0);
this.mappedFileQueue.setCommittedWhere(0);
this.defaultMessageStore.destroyLogics();

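2. File Deletion

RocketMQ operates on CommitLog and ConsumeQueue files through memory mapping, and at startup it loads every file under the commitlog and consumequeue directories. To avoid wasting memory and disk, messages cannot be kept on the server forever, hence the mechanism for deleting expired files.

RocketMQ writes CommitLog and ConsumeQueue files sequentially, so all writes land on the last file and an earlier file is never updated again once the next file has been created. RocketMQ's rule for deleting expired files follows from this: a file that is not the current write file and has not been updated for a certain period is considered expired and may be deleted, regardless of whether all messages in it have been consumed. The default retention time of each file is 72 hours and can be changed through the Broker's fileReservedTime setting.

When the Broker starts, it registers a scheduled task that periodically checks whether files have expired; by default the check runs every 10 seconds.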

//DefaultMessageStore#addScheduleTask
private void addScheduleTask() {

    this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
        @Override
        public void run() {
            DefaultMessageStore.this.cleanFilesPeriodically();
        }
    }, 1000 * 60, this.messageStoreConfig.getCleanResourceInterval(), TimeUnit.MILLISECONDS);
}

cleanFilesPeriodically checks the CommitLog and ConsumeQueue files in turn.

//DefaultMessageStore#cleanFilesPeriodically
private void cleanFilesPeriodically() {
    this.cleanCommitLogService.run();
    this.cleanConsumeQueueService.run();
}

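The run method first deletes expired files, then retries deleting a file whose earlier deletion was held up (redeleteHangedFile).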

//DefaultMessageStore$CleanCommitLogService#run
public void run() {
    try {
        this.deleteExpiredFiles();

        this.redeleteHangedFile();
    } catch (Throwable e) {
        DefaultMessageStore.log.warn(this.getServiceName() + " service has exception. ", e);
    }
}

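fileReservedTime: the file retention time. If the time elapsed since the file's last update exceeds it, the file is considered expired.

deletePhysicFilesInterval: the interval between physical file deletions. A single cleanup pass may need to delete several files, and this value sets the pause between two deletions.

destroyMapedFileIntervalForcibly: when cleaning expired files, a file that is still in use by other threads (for example being read, so its reference count is greater than 0) blocks the deletion, and the timestamp of the first deletion attempt is recorded. destroyMapedFileIntervalForcibly is the maximum time the file may survive after that first refusal. Within this window deletion can still be refused; once the window has passed, the reference count is reduced by 1000 and the file is forcibly deleted.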

//DefaultMessageStore$CleanCommitLogService#deleteExpiredFiles
// Default: 72 hours
long fileReservedTime = DefaultMessageStore.this.getMessageStoreConfig().getFileReservedTime();
// Default: 100 ms
int deletePhysicFilesInterval = DefaultMessageStore.this.getMessageStoreConfig().getDeleteCommitLogFilesInterval();
// Default: 120 s
int destroyMapedFileIntervalForcibly = DefaultMessageStore.this.getMessageStoreConfig().getDestroyMapedFileIntervalForcibly();
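
The interplay between reference counting and the forced-deletion window can be modelled in a few lines. The following is a minimal sketch, assuming the clock is passed in as a parameter so the behaviour is deterministic; RefCountedFileSketch is a hypothetical class written for illustration, and the value 1000 simply mirrors the figure mentioned above.

```java
/** Toy reference-counted resource with a forced-release deadline. */
final class RefCountedFileSketch {
    private long refCount = 1;          // the owner's own reference
    private boolean available = true;
    private boolean cleanedUp = false;
    private long firstShutdownTimestamp;

    /** A reader takes a reference; refused once shutdown has begun. */
    synchronized boolean hold() {
        if (available && refCount > 0) {
            refCount++;
            return true;
        }
        return false;
    }

    /** Drops one reference; the resource is cleaned up when none remain. */
    synchronized void release() {
        refCount--;
        if (refCount <= 0) {
            cleanedUp = true;
        }
    }

    /** Tries to shut down; returns true once the resource has actually been cleaned up. */
    synchronized boolean shutdown(long nowMillis, long intervalForcibly) {
        if (available) {
            available = false;
            firstShutdownTimestamp = nowMillis;  // remember the first attempt
            release();                           // give up the owner's reference
        } else if (refCount > 0 && nowMillis - firstShutdownTimestamp >= intervalForcibly) {
            refCount = -1000 - refCount;         // force the count below zero
            release();
        }
        return cleanedUp;
    }

    /** One reader never releases; returns what each step observed. */
    static boolean[] demo() {
        RefCountedFileSketch file = new RefCountedFileSketch();
        file.hold();                                        // a reader is still using the file
        boolean first = file.shutdown(0L, 120_000L);        // refused: reader holds a reference
        boolean second = file.shutdown(60_000L, 120_000L);  // still inside the window
        boolean lateHold = file.hold();                     // new readers are turned away
        boolean third = file.shutdown(120_000L, 120_000L);  // window elapsed: forced
        return new boolean[] {first, second, lateHold, third};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]);
    }
}
```

Marking the resource unavailable on the first attempt is what bounds the wait: no new reader can extend the file's life, so only readers that already hold it can delay deletion, and only until the window runs out.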

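If any one of the following three conditions holds, the file deletion proceeds:

The configured deletion time has arrived; it is set through deleteWhen and defaults to 4 a.m.
Disk space is insufficient, in which case expired files are deleted as well. When disk usage exceeds 85%, expired files are deleted immediately; when it exceeds 90%, the disk is also marked unwritable and new messages are rejected.
Manual trigger: a reserved operation that currently has no entry point.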

boolean timeup = this.isTimeToDelete();
boolean spacefull = this.isSpaceToDelete();
boolean manualDelete = this.manualDeleteFileSeveralTimes > 0;
if (timeup || spacefull || manualDelete) {
    // perform the deletion
}
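
The three conditions can be written as a pure function, which makes the thresholds easy to test. This is a minimal sketch that models only the conditions as described here, assuming the hour of day and the disk usage ratio are supplied by the caller; DeleteTriggerSketch and its constants are hypothetical names, and the real Broker derives these inputs from its own configuration and the store path.

```java
/** Toy decision function for "should expired files be deleted now?". */
final class DeleteTriggerSketch {
    static final double CLEAN_FORCIBLY_RATIO = 0.85; // above this, delete immediately
    static final double WARNING_RATIO = 0.90;        // above this, stop accepting writes

    static boolean isTimeToDelete(int currentHour, int deleteWhenHour) {
        return currentHour == deleteWhenHour;
    }

    static boolean isSpaceToDelete(double usedRatio) {
        return usedRatio > CLEAN_FORCIBLY_RATIO;
    }

    static boolean isDiskWritable(double usedRatio) {
        return usedRatio <= WARNING_RATIO;
    }

    /** True when any of the three triggers fires. */
    static boolean shouldDelete(int currentHour, int deleteWhenHour,
                                double usedRatio, int manualDeleteTimes) {
        boolean timeup = isTimeToDelete(currentHour, deleteWhenHour);
        boolean spacefull = isSpaceToDelete(usedRatio);
        boolean manualDelete = manualDeleteTimes > 0;
        return timeup || spacefull || manualDelete;
    }

    public static void main(String[] args) {
        System.out.println(shouldDelete(4, 4, 0.50, 0));  // scheduled hour reached
        System.out.println(shouldDelete(10, 4, 0.50, 0)); // no trigger
        System.out.println(shouldDelete(10, 4, 0.87, 0)); // disk above 85%
        System.out.println(isDiskWritable(0.95));         // disk above 90%
    }
}
```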

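File destruction traverses the files up to the second-to-last one (the last file, which is still being written, is skipped) and computes each file's maximum live time (last update time plus the file retention time, 72 hours by default). If the current time is past that maximum, or the file must be deleted forcibly (disk usage above 85% by default), MappedFile's destroy method is executed to release the resources held by the MappedFile; internally it calls File#delete to remove the file from disk. If that succeeds, the file is added to a list, and finally the corresponding mapped data in memory is removed.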

//MappedFileQueue#deleteExpiredFileByTime
for (int i = 0; i < mfsLength; i++) {
    MappedFile mappedFile = (MappedFile) mfs[i];
    long liveMaxTimestamp = mappedFile.getLastModifiedTimestamp() + expiredTime;
    if (System.currentTimeMillis() >= liveMaxTimestamp || cleanImmediately) {
        if (mappedFile.destroy(intervalForcibly)) {
            files.add(mappedFile);
            deleteCount++;

            if (files.size() >= DELETE_FILES_BATCH_MAX) {
                break;
            }

            if (deleteFilesInterval > 0 && (i + 1) < mfsLength) {
                try {
                    Thread.sleep(deleteFilesInterval);
                } catch (InterruptedException e) {
                }
            }
        } 
    }

}
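
The expiry test can be checked with concrete numbers. The following is a minimal sketch, assuming files are represented only by their last-modified timestamps and that the batch limit is passed in as a parameter; ExpirySelectionSketch is a hypothetical name, and the code only selects indexes without destroying anything.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy version of the expiry check: which file indexes would be deleted in one pass? */
final class ExpirySelectionSketch {
    static final long HOUR = 3_600_000L;

    static List<Integer> selectExpired(long[] lastModified, long now, long expiredTime,
                                       boolean cleanImmediately, int batchMax) {
        List<Integer> selected = new ArrayList<>();
        int candidates = lastModified.length - 1; // the last file is still being written
        for (int i = 0; i < candidates; i++) {
            long liveMaxTimestamp = lastModified[i] + expiredTime;
            if (now >= liveMaxTimestamp || cleanImmediately) {
                selected.add(i);
                if (selected.size() >= batchMax) {
                    break; // cap the number of files removed per pass
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long[] lastModified = {10 * HOUR, 20 * HOUR, 30 * HOUR, 99 * HOUR};
        long now = 100 * HOUR;
        System.out.println(selectExpired(lastModified, now, 72 * HOUR, false, 10));
        System.out.println(selectExpired(lastModified, now, 72 * HOUR, true, 10));
        System.out.println(selectExpired(lastModified, now, 72 * HOUR, false, 1));
    }
}
```

With a 72-hour retention and the clock at hour 100, the files last modified at hours 10 and 20 have expired while the one from hour 30 has not; cleanImmediately overrides the age check for every file except the one being written.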

3. References

RocketMQ source code, 4.4.0 branch
《RocketMQ技术内幕》