kafka server - log-related scheduled tasks

Log scheduled tasks

When Kafka starts the LogManager, it also schedules five recurring background tasks:

  • cleanupLogs: deletes expired log segments
  • flushDirtyLogs: flushes logs to disk
  • checkpointLogRecoveryOffsets
  • checkpointLogStartOffsets
  • deleteLogs

The sections below describe what each of these tasks does.
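
For reference, all of these tasks are registered with the KafkaScheduler in LogManager.startup. The sketch below paraphrases the scheduling calls from the Kafka 1.x/2.0 sources; the exact task names, intervals and signatures vary slightly between versions, so treat it as approximate:

def startup(): Unit = {
  if (scheduler != null) {
    // kafka-log-retention -> cleanupLogs, every log.retention.check.interval.ms
    scheduler.schedule("kafka-log-retention", cleanupLogs _,
      delay = InitialTaskDelayMs, period = retentionCheckMs, unit = TimeUnit.MILLISECONDS)
    // kafka-log-flusher -> flushDirtyLogs, every log.flush.scheduler.interval.ms
    scheduler.schedule("kafka-log-flusher", flushDirtyLogs _,
      delay = InitialTaskDelayMs, period = flushCheckMs, unit = TimeUnit.MILLISECONDS)
    // kafka-recovery-point-checkpoint -> checkpointLogRecoveryOffsets
    scheduler.schedule("kafka-recovery-point-checkpoint", checkpointLogRecoveryOffsets _,
      delay = InitialTaskDelayMs, period = flushRecoveryOffsetCheckpointMs, unit = TimeUnit.MILLISECONDS)
    // kafka-log-start-offset-checkpoint -> checkpointLogStartOffsets
    scheduler.schedule("kafka-log-start-offset-checkpoint", checkpointLogStartOffsets _,
      delay = InitialTaskDelayMs, period = flushStartOffsetCheckpointMs, unit = TimeUnit.MILLISECONDS)
    // kafka-delete-logs -> deleteLogs, rescheduled dynamically after each run
    scheduler.schedule("kafka-delete-logs", deleteLogs _,
      delay = InitialTaskDelayMs, unit = TimeUnit.MILLISECONDS)
  }
}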

cleanupLogs

Deletes expired logs. This happens in two main steps:

  • find the logs that need cleanup
  • delete the eligible segments from those logs

Finding the logs that need cleanup

It first collects the non-compacted logs; if the cleaner is not null, logs that are currently being cleaned are additionally excluded. The remaining logs are the ones whose old segments will be deleted, and their cleaning state is marked LogCleaningPaused(1) so the cleaner leaves them alone while retention runs (see the sketch below).
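
A simplified sketch of this selection logic in LogManager.cleanupLogs, paraphrased from the Kafka 1.1/2.0 sources (logging, metrics and minor details are omitted; method names such as pauseCleaningForNonCompactedPartitions and resumeCleaning differ slightly across versions):

def cleanupLogs(): Unit = {
  val startMs = time.milliseconds
  // pick the logs whose retention should be enforced
  val deletableLogs = {
    if (cleaner != null)
      // pause cleaning for the non-compacted partitions so the cleaner and
      // the retention task never touch the same log concurrently
      cleaner.pauseCleaningForNonCompactedPartitions()
    else
      logs.filter { case (_, log) => !log.config.compact }
  }
  try {
    deletableLogs.foreach { case (_, log) =>
      log.deleteOldSegments()
    }
  } finally {
    if (cleaner != null)
      cleaner.resumeCleaning(deletableLogs.map(_._1))
  }
}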

Deleting the eligible segments from a log

If the log is configured with the delete cleanup policy, any segment that satisfies one of the following three conditions is deleted:

  • startMs - segment.largestTimestamp > config.retentionMs: the newest timestamp in the segment is already older than retentionMs, so the segment has expired and is deleted
  • the log size minus segment.size is still at least the configured retention size (the check is applied cumulatively, starting from the oldest segment), so the segment is deleted to bring the log back under retention.bytes
  • the whole segment lies below logStartOffset (the next segment's baseOffset is <= logStartOffset), so it can be deleted

If the log is not configured with the delete policy, a segment is deleted only when it satisfies condition 3 above.
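
A condensed sketch of how Log.deleteOldSegments dispatches to these three conditions, paraphrased from the Kafka 1.1/2.0 sources (logging and the human-readable "reason" strings are omitted):

def deleteOldSegments(): Int = {
  if (!config.delete)
    deleteLogStartOffsetBreachedSegments()
  else
    deleteRetentionMsBreachedSegments() +
      deleteRetentionSizeBreachedSegments() +
      deleteLogStartOffsetBreachedSegments()
}

// condition 1: the segment's newest record is older than retention.ms
private def deleteRetentionMsBreachedSegments(): Int = {
  if (config.retentionMs < 0) return 0
  val startMs = time.milliseconds
  deleteOldSegments((segment, _) => startMs - segment.largestTimestamp > config.retentionMs)
}

// condition 2: the log still exceeds retention.bytes even without this segment
private def deleteRetentionSizeBreachedSegments(): Int = {
  if (config.retentionSize < 0 || size < config.retentionSize) return 0
  var diff = size - config.retentionSize
  def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
    if (diff - segment.size >= 0) { diff -= segment.size; true } else false
  }
  deleteOldSegments(shouldDelete)
}

// condition 3: the whole segment lies below logStartOffset
private def deleteLogStartOffsetBreachedSegments(): Int = {
  def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) =
    nextSegmentOpt.exists(_.baseOffset <= logStartOffset)
  deleteOldSegments(shouldDelete)
}

The private deleteOldSegments(predicate) overload is the one that calls deletableSegments (shown next) and then physically removes the returned segments.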

Finding the segments that can be deleted

With the deletability conditions expressed as a predicate, the next step is to collect every segment that should be deleted:

  /**
   * Find segments starting from the oldest until the user-supplied predicate is false or the segment
   * containing the current high watermark is reached. We do not delete segments with offsets at or beyond
   * the high watermark to ensure that the log start offset can never exceed it. If the high watermark
   * has not yet been initialized, no segments are eligible for deletion.
   *
   * A final segment that is empty will never be returned (since we would just end up re-creating it).
   *
   * @param predicate A function that takes in a candidate log segment and the next higher segment
   *                  (if there is one) and returns true iff it is deletable
   * @return the segments ready to be deleted
   */
  private def deletableSegments(predicate: (LogSegment, Option[LogSegment]) => Boolean): Iterable[LogSegment] = {
    if (segments.isEmpty || replicaHighWatermark.isEmpty) {
      Seq.empty
    } else {
      val highWatermark = replicaHighWatermark.get
      val deletable = ArrayBuffer.empty[LogSegment]
      var segmentEntry = segments.firstEntry
      while (segmentEntry != null) {
        val segment = segmentEntry.getValue
        val nextSegmentEntry = segments.higherEntry(segmentEntry.getKey)
        val (nextSegment, upperBoundOffset, isLastSegmentAndEmpty) = if (nextSegmentEntry != null)
          (nextSegmentEntry.getValue, nextSegmentEntry.getValue.baseOffset, false)
        else
          (null, logEndOffset, segment.size == 0)
        // the segment is deletable only when all three conditions hold: it lies entirely
        // below the high watermark, the predicate returns true, and it is not an empty last segment
        if (highWatermark >= upperBoundOffset && predicate(segment, Option(nextSegment)) && !isLastSegmentAndEmpty) {
          deletable += segment
          segmentEntry = nextSegmentEntry
        } else {
          segmentEntry = null
        }
      }
      deletable
    }
  }

Deleting the segments

Deleting a segment is an asynchronous operation:

  • step 1: remove the segment's entry from the log's in-memory segment map
  • step 2: asynchronously delete all of the segment's files from the filesystem (the deleteIfExists code below, followed by a sketch of how the deletion is scheduled)

/**
 * Delete this log segment from the filesystem.
 */
def deleteIfExists() {
  def delete(delete: () => Boolean, fileType: String, file: File, logIfMissing: Boolean): Unit = {
    try {
      if (delete())
        info(s"Deleted $fileType ${file.getAbsolutePath}.")
      else if (logIfMissing)
        info(s"Failed to delete $fileType ${file.getAbsolutePath} because it does not exist.")
    }
    catch {
      case e: IOException => throw new IOException(s"Delete of $fileType ${file.getAbsolutePath} failed.", e)
    }
  }

  CoreUtils.tryAll(Seq(
    () => delete(log.deleteIfExists _, "log", log.file, logIfMissing = true),
    () => delete(offsetIndex.deleteIfExists _, "offset index", offsetIndex.file, logIfMissing = true),
    () => delete(timeIndex.deleteIfExists _, "time index", timeIndex.file, logIfMissing = true),
    () => delete(txnIndex.deleteIfExists _, "transaction index", txnIndex.file, logIfMissing = false)
  ))
}
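
Both steps are visible in Log.deleteSegment and Log.asyncDeleteSegment. The sketch below paraphrases the Kafka 1.1/2.0 sources: the segment's files are first renamed with a .deleted suffix, and the physical deletion is handed to the scheduler to run after file.delete.delay.ms:

private def deleteSegment(segment: LogSegment): Unit = {
  lock synchronized {
    // step 1: drop the segment from the in-memory segment map
    segments.remove(segment.baseOffset)
    asyncDeleteSegment(segment)
  }
}

private def asyncDeleteSegment(segment: LogSegment): Unit = {
  // rename the files to *.deleted so readers no longer pick them up
  segment.changeFileSuffixes("", Log.DeletedFileSuffix)
  def deleteSeg(): Unit = segment.deleteIfExists()
  // step 2: the physical delete runs later, after file.delete.delay.ms
  scheduler.schedule("delete-file", deleteSeg _, delay = config.fileDeleteDelayMs)
}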

flushDirtyLogs

The flushDirtyLogs task periodically iterates over all logs; whenever the time since a log's last flush exceeds log.config.flushMs, that log is flushed to disk.
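
A sketch of the loop in LogManager.flushDirtyLogs, paraphrased from the Kafka sources (debug logging trimmed; the logs collection is called currentLogs in some versions):

private def flushDirtyLogs(): Unit = {
  for ((topicPartition, log) <- logs) {
    try {
      val timeSinceLastFlush = time.milliseconds - log.lastFlushTime
      // flush only the logs whose flush interval has elapsed
      if (timeSinceLastFlush >= log.config.flushMs)
        log.flush()
    } catch {
      case e: Throwable =>
        error(s"Error flushing topic ${topicPartition.topic}", e)
    }
  }
}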

flush

The flush covers the range from the last flushed offset (the recoveryPoint) up to the LEO (LogEndOffset):

/**
 * Flush log segments for all offsets up to offset-1
 *
 * @param offset The offset to flush up to (non-inclusive); the new recovery point
 */
def flush(offset: Long) : Unit = {
  maybeHandleIOException(s"Error while flushing log for $topicPartition in dir ${dir.getParent} with offset $offset") {
    // recoveryPoint is the first offset that has not yet been flushed;
    // anything below it has already been flushed to disk
    if (offset <= this.recoveryPoint)
      return
    debug(s"Flushing log up to offset $offset, last flushed: $lastFlushTime,  current time: ${time.milliseconds()}, " +
      s"unflushed: $unflushedMessages")
    for (segment <- logSegments(this.recoveryPoint, offset))
      segment.flush()

    lock synchronized {
      checkIfMemoryMappedBufferClosed()
      if (offset > this.recoveryPoint) {
        this.recoveryPoint = offset
        lastFlushedTime.set(time.milliseconds)
      }
    }
  }
}

How the data is actually flushed depends on the type of file:

/**
 * Flush this log segment to disk
 */
@threadsafe
def flush() {
  LogFlushStats.logFlushTimer.time {
    log.flush() //FileChannel.force(true)
    offsetIndex.flush()  //mmap.force()
    timeIndex.flush()  //mmap.force()
    txnIndex.flush()  //FileChannel.force()
  }
}

After flushing, the new recoveryPoint is recorded (the tail end of flush(offset) above):

lock synchronized {
  checkIfMemoryMappedBufferClosed()
  if (offset > this.recoveryPoint) {
    this.recoveryPoint = offset
    lastFlushedTime.set(time.milliseconds)
  }
}

checkpointLogRecoveryOffsets

As mentioned under flushDirtyLogs above, a recoveryPoint is recorded whenever a log is flushed, so that recovery can start from it later. This scheduled task writes each log's recoveryPoint into the corresponding checkpoint file and, at the same time, deletes stale producer state snapshots.

checkpoint

/**
 * Make a checkpoint for all logs in provided directory.
 */
private def checkpointLogRecoveryOffsetsInDir(dir: File): Unit = {
  for {
    partitionToLog <- logsByDir.get(dir.getAbsolutePath)
    checkpoint <- recoveryPointCheckpoints.get(dir)
  } {
    try {
      checkpoint.write(partitionToLog.mapValues(_.recoveryPoint))
      allLogs.foreach(_.deleteSnapshotsAfterRecoveryPointCheckpoint())
    } catch {
      case e: IOException =>
        logDirFailureChannel.maybeAddOfflineLogDir(dir.getAbsolutePath, s"Disk error while writing to recovery point " +
          s"file in directory $dir", e)
    }
  }
}

Note that deleteSnapshotsAfterRecoveryPointCheckpoint is executed on every log.

deleteSnapshotsAfterRecoveryPointCheckpoint

Kafka keeps a mapping from each producerId to the last entry that producer wrote. The entry contains the epoch, the sequence number, the last offset and related metadata, which is what allows the producer to keep appending to the log. When the segment containing a producer's last written entry is deleted, that producer is considered expired.
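
To make the shape of that per-producer entry concrete, here is a purely illustrative sketch; the real structure lives in kafka.log.ProducerStateManager and its field names differ:

// Illustrative only: roughly what Kafka tracks for each producerId.
case class ProducerLastEntry(
  producerEpoch: Short, // epoch of the producer, bumped when it is re-initialized
  lastSeq: Int,         // last sequence number appended by this producer
  lastOffset: Long,     // log offset of that last append
  timestamp: Long)      // timestamp of the last append, used for expiration

// the mapping maintained by the producer state manager
val producers = scala.collection.mutable.Map.empty[Long /* producerId */, ProducerLastEntry]

Snapshots of this state are written to .snapshot files alongside the segments, and it is these snapshot files that the method below cleans up: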

/**
 * Cleanup old producer snapshots after the recovery point is checkpointed. It is useful to retain
 * the snapshots from the recent segments in case we need to truncate and rebuild the producer state.
 * Otherwise, we would always need to rebuild from the earliest segment.
 *
 * More specifically:
 *
 * 1. We always retain the producer snapshot from the last two segments. This solves the common case
 * of truncating to an offset within the active segment, and the rarer case of truncating to the previous segment.
 *
 * 2. We only delete snapshots for offsets less than the recovery point. The recovery point is checkpointed
 * periodically and it can be behind after a hard shutdown. Since recovery starts from the recovery point, the logic
 * of rebuilding the producer snapshots in one pass and without loading older segments is simpler if we always
 * have a producer snapshot for all segments being recovered.
 *
 * Return the minimum snapshots offset that was retained.
 */
def deleteSnapshotsAfterRecoveryPointCheckpoint(): Long = {
  val minOffsetToRetain = minSnapshotsOffsetToRetain
  producerStateManager.deleteSnapshotsBefore(minOffsetToRetain)
  minOffsetToRetain
}

When the recoveryPoint is checkpointed, old producer state snapshots are deleted at the same time; only the snapshots for the most recent few segments are kept (otherwise recovery would always have to rebuild producer state starting from the earliest segment). Kafka retains at least the producer snapshots for the last two segments, plus any snapshots at or beyond the recoveryPoint. (The relationship between a snapshot and a segment is simply the relationship between the snapshot's offset and the segment's offset range.)

Computing the smallest snapshot offset that must be retained, minSnapshotsOffsetToRetain:

private[log] def minSnapshotsOffsetToRetain: Long = {
  lock synchronized {
    val twoSegmentsMinOffset = lowerSegment(activeSegment.baseOffset).getOrElse(activeSegment).baseOffset
    // Prefer segment base offset
    val recoveryPointOffset = lowerSegment(recoveryPoint).map(_.baseOffset).getOrElse(recoveryPoint)
    math.min(recoveryPointOffset, twoSegmentsMinOffset)
  }
}
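
As a concrete example of the code above: with segments at base offsets 0, 100 and 200 (the active segment) and recoveryPoint = 250, twoSegmentsMinOffset is 100 and recoveryPointOffset is 200, so minSnapshotsOffsetToRetain is 100 and deleteSnapshotsBefore removes the snapshot files with offsets below 100.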

checkpointLogStartOffsets

Writes each log's logStartOffset into the corresponding checkpoint file so that it survives a broker restart.
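
The per-directory write mirrors checkpointLogRecoveryOffsetsInDir shown earlier. A sketch paraphrased from the Kafka 1.1/2.0 sources (only offsets that have moved past the first segment's base offset are checkpointed):

private def checkpointLogStartOffsetsInDir(dir: File): Unit = {
  for {
    partitionToLog <- logsByDir.get(dir.getAbsolutePath)
    checkpoint <- logStartOffsetCheckpoints.get(dir)
  } {
    try {
      // only checkpoint log start offsets that have advanced past the first segment
      val logStartOffsets = partitionToLog.filter { case (_, log) =>
        log.logStartOffset > log.logSegments.head.baseOffset
      }.mapValues(_.logStartOffset)
      checkpoint.write(logStartOffsets)
    } catch {
      case e: IOException =>
        logDirFailureChannel.maybeAddOfflineLogDir(dir.getAbsolutePath,
          s"Disk error while writing to logStartOffset file in directory $dir", e)
    }
  }
}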

deleteLogs