Log scheduled tasks
When Kafka starts the LogManager, it also schedules five recurring tasks:
- cleanupLogs: deletes expired log segments
- flushDirtyLogs: flushes dirty logs to disk
- checkpointLogRecoveryOffsets: writes each log's recovery point to a checkpoint file
- checkpointLogStartOffsets: writes each log's log start offset to a checkpoint file
- deleteLogs: asynchronously removes logs that have been marked for deletion
The following sections walk through what each of these tasks does.
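For orientation, this is roughly how LogManager.startup registers these tasks on the scheduler. It is an abridged sketch based on the Kafka 1.x/2.0 source; the task names are real, but the exact parameters and field names vary across versions.
// Abridged sketch of LogManager.startup (not the exact source).
def startup() {
  if (scheduler != null) {
    // delete segments that violate retention.ms / retention.bytes
    scheduler.schedule("kafka-log-retention", cleanupLogs _,
      delay = InitialTaskDelayMs, period = retentionCheckMs, unit = TimeUnit.MILLISECONDS)
    // flush logs whose flush.ms interval has elapsed
    scheduler.schedule("kafka-log-flusher", flushDirtyLogs _,
      delay = InitialTaskDelayMs, period = flushCheckMs, unit = TimeUnit.MILLISECONDS)
    // persist recovery points so a restart only needs to recover unflushed data
    scheduler.schedule("kafka-recovery-point-checkpoint", checkpointLogRecoveryOffsets _,
      delay = InitialTaskDelayMs, period = flushRecoveryOffsetCheckpointMs, unit = TimeUnit.MILLISECONDS)
    // persist log start offsets
    scheduler.schedule("kafka-log-start-offset-checkpoint", checkpointLogStartOffsets _,
      delay = InitialTaskDelayMs, period = flushStartOffsetCheckpointMs, unit = TimeUnit.MILLISECONDS)
    // remove logs that were marked for deletion (e.g. after a topic is deleted)
    scheduler.schedule("kafka-delete-logs", deleteLogs _,
      delay = InitialTaskDelayMs, unit = TimeUnit.MILLISECONDS)
  }
}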
cleanupLogs
Deleting expired logs involves two main steps:
- finding the logs that are eligible for deletion
- deleting segments from those logs
Finding the logs to delete
The task first collects the non-compacted logs, i.e. logs whose cleanup policy does not include compact. If the log cleaner is enabled, logs that are currently being cleaned are skipped; cleaning is paused for the remaining logs (they are marked LogCleaningPaused(1)), and those logs become the deletion candidates.
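A condensed view of this selection step, loosely based on LogManager.cleanupLogs (error handling, metrics and logging omitted; collection names vary by version):
// Abridged sketch of LogManager.cleanupLogs: pick the logs using the delete cleanup policy,
// skipping (and pausing) any partitions the cleaner is currently working on.
def cleanupLogs() {
  val deletableLogs = {
    if (cleaner != null)
      cleaner.pauseCleaningForNonCompactedPartitions() // marks them LogCleaningPaused(1)
    else
      logs.filter { case (_, log) => !log.config.compact }
  }
  try {
    deletableLogs.foreach { case (topicPartition, log) =>
      log.deleteOldSegments() // apply the retention conditions described below
    }
  } finally {
    if (cleaner != null)
      cleaner.resumeCleaning(deletableLogs.map(_._1))
  }
}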
Deleting the qualifying segments from a log
If the log's cleanup policy includes delete, a segment is deleted when it meets any of the following three conditions (a condensed sketch of the corresponding predicates follows this list):
- startMs - segment.largestTimestamp > config.retentionMs: the largest timestamp in the segment is older than the configured retentionMs, so the segment has expired
- the log size minus segment.size is still at least the configured retention size, so the segment can be dropped without going below the size limit
- the segment lies entirely below logStartOffset, so none of its records are readable any more
If the log's cleanup policy does not include delete, only condition 3 applies.
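The three conditions correspond to three predicates that Log passes to deleteOldSegments. The following is a condensed sketch based on the Kafka 1.x/2.0 source (method names are real, bodies are abridged, logging and return values simplified):
def deleteOldSegments(): Int = {
  // with cleanup.policy=delete all three conditions apply; otherwise only the logStartOffset check runs
  if (!config.delete) return deleteLogStartOffsetBreachedSegments()
  deleteRetentionMsBreachedSegments() + deleteRetentionSizeBreachedSegments() + deleteLogStartOffsetBreachedSegments()
}

private def deleteRetentionMsBreachedSegments(): Int = {
  if (config.retentionMs < 0) return 0
  val startMs = time.milliseconds
  // condition 1: the newest record in the segment is older than retention.ms
  deleteOldSegments((segment, _) => startMs - segment.largestTimestamp > config.retentionMs)
}

private def deleteRetentionSizeBreachedSegments(): Int = {
  if (config.retentionSize < 0 || size < config.retentionSize) return 0
  var diff = size - config.retentionSize
  // condition 2: the segment can be removed while still keeping at least retention.bytes of data
  def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
    if (diff - segment.size >= 0) { diff -= segment.size; true } else false
  }
  deleteOldSegments(shouldDelete)
}

private def deleteLogStartOffsetBreachedSegments(): Int = {
  // condition 3: the whole segment sits below logStartOffset
  def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) =
    nextSegmentOpt.exists(_.baseOffset <= logStartOffset)
  deleteOldSegments(shouldDelete)
}
Each of these calls deleteOldSegments(predicate), which uses deletableSegments (next section) to collect the matching segments and then deletes them.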
Finding the deletable segments
With the deletion conditions expressed as predicates, the next step is to collect every segment that can actually be deleted:
/**
 * Find segments starting from the oldest until the user-supplied predicate is false or the segment
 * containing the current high watermark is reached. We do not delete segments with offsets at or beyond
 * the high watermark to ensure that the log start offset can never exceed it. If the high watermark
 * has not yet been initialized, no segments are eligible for deletion.
 *
 * A final segment that is empty will never be returned (since we would just end up re-creating it).
 *
 * @param predicate A function that takes in a candidate log segment and the next higher segment
 *                  (if there is one) and returns true iff it is deletable
 * @return the segments ready to be deleted
 */
private def deletableSegments(predicate: (LogSegment, Option[LogSegment]) => Boolean): Iterable[LogSegment] = {
  if (segments.isEmpty || replicaHighWatermark.isEmpty) {
    Seq.empty
  } else {
    val highWatermark = replicaHighWatermark.get
    val deletable = ArrayBuffer.empty[LogSegment]
    var segmentEntry = segments.firstEntry
    while (segmentEntry != null) {
      val segment = segmentEntry.getValue
      val nextSegmentEntry = segments.higherEntry(segmentEntry.getKey)
      val (nextSegment, upperBoundOffset, isLastSegmentAndEmpty) = if (nextSegmentEntry != null)
        (nextSegmentEntry.getValue, nextSegmentEntry.getValue.baseOffset, false)
      else
        (null, logEndOffset, segment.size == 0)
      // the segment can be deleted only when all three of these conditions hold: it is entirely
      // below the high watermark, the predicate accepts it, and it is not the (empty) last segment
      if (highWatermark >= upperBoundOffset && predicate(segment, Option(nextSegment)) && !isLastSegmentAndEmpty) {
        deletable += segment
        segmentEntry = nextSegmentEntry
      } else {
        segmentEntry = null
      }
    }
    deletable
  }
}
Deleting a segment
Deleting a segment is partly asynchronous, as sketched below:
- first, the segment's entry is removed from the log's in-memory segment map
- second, the files belonging to the segment are deleted from the filesystem asynchronously
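A condensed sketch of these two steps, based on Log.deleteSegment and Log.asyncDeleteSegment (logging and error handling trimmed):
// Abridged sketch: drop the segment from the in-memory map, rename its files with a
// .deleted suffix, and schedule the actual file deletion after file.delete.delay.ms.
private def deleteSegment(segment: LogSegment) {
  lock synchronized {
    segments.remove(segment.baseOffset) // step 1: remove the entry from the segment map
    asyncDeleteSegment(segment)         // step 2: delete the files later, off the caller's path
  }
}

private def asyncDeleteSegment(segment: LogSegment) {
  segment.changeFileSuffixes("", Log.DeletedFileSuffix)
  def deleteSeg() = segment.deleteIfExists()
  scheduler.schedule("delete-file", deleteSeg _, delay = config.fileDeleteDelayMs)
}
The file deletion itself is handled by LogSegment.deleteIfExists: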
/**
 * Delete this log segment from the filesystem.
 */
def deleteIfExists() {
  def delete(delete: () => Boolean, fileType: String, file: File, logIfMissing: Boolean): Unit = {
    try {
      if (delete())
        info(s"Deleted $fileType ${file.getAbsolutePath}.")
      else if (logIfMissing)
        info(s"Failed to delete $fileType ${file.getAbsolutePath} because it does not exist.")
    }
    catch {
      case e: IOException => throw new IOException(s"Delete of $fileType ${file.getAbsolutePath} failed.", e)
    }
  }

  CoreUtils.tryAll(Seq(
    () => delete(log.deleteIfExists _, "log", log.file, logIfMissing = true),
    () => delete(offsetIndex.deleteIfExists _, "offset index", offsetIndex.file, logIfMissing = true),
    () => delete(timeIndex.deleteIfExists _, "time index", timeIndex.file, logIfMissing = true),
    () => delete(txnIndex.deleteIfExists _, "transaction index", txnIndex.file, logIfMissing = false)
  ))
}
flushDirtyLogs
The flushDirtyLogs task periodically iterates over all logs; whenever the time since a log's last flush exceeds log.config.flushMs, that log is flushed to disk. (By default flush.ms is effectively infinite, so Kafka normally leaves flushing to the OS page cache.)
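A condensed sketch of that loop, based on LogManager.flushDirtyLogs (debug logging removed; the name of the log map varies by version):
// Abridged sketch of LogManager.flushDirtyLogs: flush every log whose flush interval has elapsed.
private def flushDirtyLogs(): Unit = {
  for ((topicPartition, log) <- logs.toList) {
    try {
      val timeSinceLastFlush = time.milliseconds - log.lastFlushTime
      if (timeSinceLastFlush >= log.config.flushMs)
        log.flush() // flushes from recoveryPoint up to the log end offset
    } catch {
      case e: Throwable => error(s"Error flushing topic ${topicPartition.topic}", e)
    }
  }
}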
flush
The flush covers the range from the last flushed offset (the recoveryPoint) up to the LEO (LogEndOffset):
/**
 * Flush log segments for all offsets up to offset-1
 *
 * @param offset The offset to flush up to (non-inclusive); the new recovery point
 */
def flush(offset: Long) : Unit = {
  maybeHandleIOException(s"Error while flushing log for $topicPartition in dir ${dir.getParent} with offset $offset") {
    // recoveryPoint is the first offset that has not been flushed yet; anything below it is already on disk
    if (offset <= this.recoveryPoint)
      return
    debug(s"Flushing log up to offset $offset, last flushed: $lastFlushTime, current time: ${time.milliseconds()}, " +
      s"unflushed: $unflushedMessages")
    for (segment <- logSegments(this.recoveryPoint, offset))
      segment.flush()
    lock synchronized {
      checkIfMemoryMappedBufferClosed()
      if (offset > this.recoveryPoint) {
        this.recoveryPoint = offset
        lastFlushedTime.set(time.milliseconds)
      }
    }
  }
}
How each file is actually flushed depends on the file type:
/**
 * Flush this log segment to disk
 */
@threadsafe
def flush() {
  LogFlushStats.logFlushTimer.time {
    log.flush()         // FileChannel.force(true)
    offsetIndex.flush() // mmap.force()
    timeIndex.flush()   // mmap.force()
    txnIndex.flush()    // FileChannel.force()
  }
}
After the segments have been flushed, the recoveryPoint is advanced:
lock synchronized {
  checkIfMemoryMappedBufferClosed()
  if (offset > this.recoveryPoint) {
    this.recoveryPoint = offset
    lastFlushedTime.set(time.milliseconds)
  }
}
checkpointLogRecoveryOffsets
As mentioned above under flushDirtyLogs, a recoveryPoint is recorded when a log is flushed so that recovery can later start from it. This task writes each log's recoveryPoint to the checkpoint file of its log directory, and at the same time deletes stale producer state snapshots.
checkpoint
/**
 * Make a checkpoint for all logs in provided directory.
 */
private def checkpointLogRecoveryOffsetsInDir(dir: File): Unit = {
  for {
    partitionToLog <- logsByDir.get(dir.getAbsolutePath)
    checkpoint <- recoveryPointCheckpoints.get(dir)
  } {
    try {
      checkpoint.write(partitionToLog.mapValues(_.recoveryPoint))
      allLogs.foreach(_.deleteSnapshotsAfterRecoveryPointCheckpoint())
    } catch {
      case e: IOException =>
        logDirFailureChannel.maybeAddOfflineLogDir(dir.getAbsolutePath, s"Disk error while writing to recovery point " +
          s"file in directory $dir", e)
    }
  }
}
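The checkpoint itself is the plain-text recovery-point-offset-checkpoint file in each log directory: a version line, an entry count, and then one "topic partition recoveryPoint" line per partition. A hypothetical example (topic name and offsets are made up):
0
2
my-topic 0 1042
my-topic 1 987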
Note that for each log it also invokes deleteSnapshotsAfterRecoveryPointCheckpoint.
deleteSnapshotsAfterRecoveryPointCheckpoint
Kafka keeps, for every producerId, a mapping to the last entry written by that producer. The entry contains the producer epoch, sequence number, last offset and related metadata, which lets the producer keep appending to the log (and lets the broker detect duplicates and fence stale producers). When the segment containing a producer's last written entry is deleted, that producer's state expires.
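As a rough illustration only, the per-producer state can be pictured as something like the following; the real implementation lives in ProducerStateManager, and the field names below are simplified rather than the actual ones:
// Illustrative only: a simplified model of the state Kafka keeps per producerId.
case class ProducerEntry(
  producerId: Long,
  producerEpoch: Short, // writes with an older epoch are rejected (fencing)
  lastSeq: Int,         // used to detect duplicate or out-of-order sequence numbers
  lastOffset: Long      // offset of the last batch this producer wrote
)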
/**
 * Cleanup old producer snapshots after the recovery point is checkpointed. It is useful to retain
 * the snapshots from the recent segments in case we need to truncate and rebuild the producer state.
 * Otherwise, we would always need to rebuild from the earliest segment.
 *
 * More specifically:
 *
 * 1. We always retain the producer snapshot from the last two segments. This solves the common case
 * of truncating to an offset within the active segment, and the rarer case of truncating to the previous segment.
 *
 * 2. We only delete snapshots for offsets less than the recovery point. The recovery point is checkpointed
 * periodically and it can be behind after a hard shutdown. Since recovery starts from the recovery point, the logic
 * of rebuilding the producer snapshots in one pass and without loading older segments is simpler if we always
 * have a producer snapshot for all segments being recovered.
 *
 * Return the minimum snapshots offset that was retained.
 */
def deleteSnapshotsAfterRecoveryPointCheckpoint(): Long = {
  val minOffsetToRetain = minSnapshotsOffsetToRetain
  producerStateManager.deleteSnapshotsBefore(minOffsetToRetain)
  minOffsetToRetain
}
In other words, when the recoveryPoint is checkpointed, old producer state snapshots are deleted at the same time; only the snapshots covering the most recent segments are retained (otherwise recovery would have to rebuild producer state starting from the earliest segment). Kafka keeps at least the producer snapshots of the last two segments, plus every snapshot at or beyond the recoveryPoint. (A snapshot belongs to a segment in the same way an offset does: via the offset in the snapshot's file name.)
Computing the minimum snapshot offset to retain: minSnapshotsOffsetToRetain
private[log] def minSnapshotsOffsetToRetain: Long = {
  lock synchronized {
    val twoSegmentsMinOffset = lowerSegment(activeSegment.baseOffset).getOrElse(activeSegment).baseOffset
    // Prefer segment base offset
    val recoveryPointOffset = lowerSegment(recoveryPoint).map(_.baseOffset).getOrElse(recoveryPoint)
    math.min(recoveryPointOffset, twoSegmentsMinOffset)
  }
}
checkpointLogStartOffsets
This task writes each log's logStartOffset to the log-start-offset checkpoint file of its log directory, following the same pattern as the recovery point checkpoint above.
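A condensed sketch, mirroring checkpointLogRecoveryOffsetsInDir above and based on LogManager.checkpointLogStartOffsetsInDir (abridged):
// Abridged sketch: persist the logStartOffset of each log in this directory, skipping
// logs whose start offset still equals the base offset of their first segment.
private def checkpointLogStartOffsetsInDir(dir: File): Unit = {
  for {
    partitionToLog <- logsByDir.get(dir.getAbsolutePath)
    checkpoint <- logStartOffsetCheckpoints.get(dir)
  } {
    try {
      val logStartOffsets = partitionToLog.filter { case (_, log) =>
        log.logStartOffset > log.logSegments.head.baseOffset
      }.mapValues(_.logStartOffset)
      checkpoint.write(logStartOffsets)
    } catch {
      case e: IOException =>
        logDirFailureChannel.maybeAddOfflineLogDir(dir.getAbsolutePath,
          s"Disk error while writing to logStartOffset file in directory $dir", e)
    }
  }
}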