LogManager
Kafka's LogManager is the broker's log management subsystem. It is responsible for creating, retrieving, and cleaning up logs, and every log read and write request goes through it. Its attributes are listed below.
Attributes
- logDirs: Seq[File]
- initialOfflineDirs: Seq[File]
- topicConfigs: Map[String,LogConfig]; per-topic log configuration
- initialDefaultConfig: LogConfig; the default log configuration
- cleanerConfig: CleanerConfig; configuration for log cleaning
- recoveryThreadsPerDataDir: Int; number of threads used to recover and load each log directory
- flushCheckMs: Long
- flushRecoveryOffsetCheckpointMs: Long
- flushStartOffsetCheckpointMs: Long
- retentionCheckMs: Long
- maxPidExpirationMs: Int
- scheduler: Scheduler; Kafka's own utility class for scheduling background tasks
- brokerState: BrokerState; the state machine of this broker
- brokerTopicStats: BrokerTopicStats
- logDirFailureChannel: LogDirFailureChannel
- time: Time; Kafka's own time utility class
Methods
Constructor
LogManager's constructor is fairly simple; its main steps are:
- Validate that the directories provided in logDirs contain no duplicates, and create any directory that does not exist
- Lock each directory in logDirs //todo(how the lock is implemented)
- Create a recovery-point-offset-checkpoint file under each directory in logDirs, used to persist a topic/partition => offset mapping (the file format is sketched after this list). //todo(purpose)
- Create a log-start-offset-checkpoint file under each directory in logDirs, also used to persist a topic/partition => offset mapping. //todo(purpose)
- loadLogs(): load the logs found in the directories, described in detail below
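Both checkpoint files use a plain-text layout: a version line, an entry-count line, and then one "topic partition offset" line per entry. As an illustration only (this is not Kafka's OffsetCheckpointFile class, and the TopicPartition case class below is a stand-in), reading such a file could look like this:
import scala.io.Source

// Hypothetical stand-in for Kafka's TopicPartition class
case class TopicPartition(topic: String, partition: Int)

object CheckpointSketch {
  // Assumed version-0 layout: first line is the version, second line the entry
  // count, then one "topic partition offset" line per entry.
  def read(path: String): Map[TopicPartition, Long] = {
    val source = Source.fromFile(path)
    try {
      source.getLines().toList match {
        case version :: count :: entries =>
          require(version.trim == "0", s"unexpected checkpoint version: $version")
          entries.take(count.trim.toInt).map { line =>
            val Array(topic, partition, offset) = line.trim.split("\\s+")
            TopicPartition(topic, partition.toInt) -> offset.toLong
          }.toMap
        case _ => Map.empty
      }
    } finally source.close()
  }
}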
The loadLogs() method
private def loadLogs(): Unit = {
info("Loading logs.")
val startMs = time.milliseconds
val threadPools = ArrayBuffer.empty[ExecutorService]
val offlineDirs = mutable.Set.empty[(String, IOException)]
val jobs = mutable.Map.empty[File, Seq[Future[_]]]
for (dir <- liveLogDirs) {
try {
// each log directory gets its own thread pool to perform the loading work
val pool = Executors.newFixedThreadPool(numRecoveryThreadsPerDataDir)
threadPools.append(pool)
val cleanShutdownFile = new File(dir, Log.CleanShutdownFile)
if (cleanShutdownFile.exists) {
// clean shutdown, no recovery is needed for the logs in this directory
debug(s"Found clean shutdown file. Skipping recovery for all logs in data directory: ${dir.getAbsolutePath}")
} else {
// log recovery itself is being performed by `Log` class during initialization
brokerState.newState(RecoveringFromUncleanShutdown)
}
var recoveryPoints = Map[TopicPartition, Long]()
try {
recoveryPoints = this.recoveryPointCheckpoints(dir).read
} catch {
case e: Exception =>
warn("Error occurred while reading recovery-point-offset-checkpoint file of directory " + dir, e)
warn("Resetting the recovery checkpoint to 0")
}
var logStartOffsets = Map[TopicPartition, Long]()
try {
// read the start offset of each log
logStartOffsets = this.logStartOffsetCheckpoints(dir).read
} catch {
case e: Exception =>
warn("Error occurred while reading log-start-offset-checkpoint file of directory " + dir, e)
}
val jobsForDir = for {
dirContent <- Option(dir.listFiles).toList
logDir <- dirContent if logDir.isDirectory
} yield {
CoreUtils.runnable {
try {
// runnable task that loads a single log
loadLog(logDir, recoveryPoints, logStartOffsets)
} catch {
case e: IOException =>
offlineDirs.add((dir.getAbsolutePath, e))
error("Error while loading log dir " + dir.getAbsolutePath, e)
}
}
}
jobs(cleanShutdownFile) = jobsForDir.map(pool.submit)
} catch {
case e: IOException =>
offlineDirs.add((dir.getAbsolutePath, e))
error("Error while loading log dir " + dir.getAbsolutePath, e)
}
}
try {
for ((cleanShutdownFile, dirJobs) <- jobs) {
dirJobs.foreach(_.get)
try {
cleanShutdownFile.delete()
} catch {
case e: IOException =>
offlineDirs.add((cleanShutdownFile.getParent, e))
error(s"Error while deleting the clean shutdown file $cleanShutdownFile", e)
}
}
offlineDirs.foreach { case (dir, e) =>
logDirFailureChannel.maybeAddOfflineLogDir(dir, s"Error while deleting the clean shutdown file in dir $dir", e)
}
} catch {
case e: ExecutionException =>
error("There was an error in one of the threads during logs loading: " + e.getCause)
throw e.getCause
} finally {
threadPools.foreach(_.shutdown())
}
info(s"Logs loading complete in ${time.milliseconds - startMs} ms.")
}
- For each live log directory:
  - Create a thread pool: val pool = Executors.newFixedThreadPool(numRecoveryThreadsPerDataDir) (the fan-out/fan-in pattern is sketched after this list)
  - If a .kafka_cleanshutdown file exists, skip recovery for the logs in this directory; otherwise set the broker state to RecoveringFromUncleanShutdown
  - Read the directory's recovery-point-offset-checkpoint file; if reading fails, reset the recovery checkpoint to 0
  - Read the directory's log-start-offset-checkpoint file
  - List the subdirectories of dir
  - Submit one job per subdirectory that runs loadLog(logDir, recoveryPoints, logStartOffsets)
- After all jobs finish, delete the clean-shutdown files and report any directory that hit an IOException to logDirFailureChannel
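Stripped of the Kafka specifics, loadLogs is a plain fan-out/fan-in pattern: one fixed thread pool per data directory, one Runnable per log subdirectory, then block on the futures. The sketch below reproduces only that skeleton; doLoad stands in for loadLog and error handling is omitted:
import java.io.File
import java.util.concurrent.{ExecutorService, Executors, Future => JFuture}
import scala.collection.mutable

object ParallelLoadSketch {
  // Illustrative per-log work; in LogManager this is loadLog(...)
  def doLoad(logDir: File): Unit = println(s"loading ${logDir.getName}")

  def loadAll(dataDirs: Seq[File], threadsPerDir: Int): Unit = {
    val pools = mutable.ArrayBuffer.empty[ExecutorService]
    val jobs  = mutable.ArrayBuffer.empty[JFuture[_]]
    try {
      for (dir <- dataDirs) {
        // one fixed-size pool per data directory, as in loadLogs()
        val pool = Executors.newFixedThreadPool(threadsPerDir)
        pools += pool
        val subDirs = Option(dir.listFiles).getOrElse(Array.empty[File]).filter(_.isDirectory)
        // fan out: one job per log directory
        subDirs.foreach(d => jobs += pool.submit(new Runnable { def run(): Unit = doLoad(d) }))
      }
      // fan in: wait for every job; failures surface here as ExecutionException
      jobs.foreach(_.get())
    } finally pools.foreach(_.shutdown())
  }
}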
The loadLog method
private def loadLog(logDir: File, recoveryPoints: Map[TopicPartition, Long], logStartOffsets: Map[TopicPartition, Long]): Unit = {
debug("Loading log '" + logDir.getName + "'")
val topicPartition = Log.parseTopicPartitionName(logDir)
val config = topicConfigs.getOrElse(topicPartition.topic, currentDefaultConfig)
val logRecoveryPoint = recoveryPoints.getOrElse(topicPartition, 0L)
val logStartOffset = logStartOffsets.getOrElse(topicPartition, 0L)
val log = Log(
dir = logDir,
config = config,
logStartOffset = logStartOffset,
recoveryPoint = logRecoveryPoint,
maxProducerIdExpirationMs = maxPidExpirationMs,
producerIdExpirationCheckIntervalMs = LogManager.ProducerIdExpirationCheckIntervalMs,
scheduler = scheduler,
time = time,
brokerTopicStats = brokerTopicStats,
logDirFailureChannel = logDirFailureChannel)
if (logDir.getName.endsWith(Log.DeleteDirSuffix)) {
addLogToBeDeleted(log)
} else {
val previous = {
if (log.isFuture)
this.futureLogs.put(topicPartition, log)
else
this.currentLogs.put(topicPartition, log)
}
if (previous != null) {
if (log.isFuture)
throw new IllegalStateException("Duplicate log directories found: %s, %s!".format(log.dir.getAbsolutePath, previous.dir.getAbsolutePath))
else
throw new IllegalStateException(s"Duplicate log directories for $topicPartition are found in both ${log.dir.getAbsolutePath} " +
s"and ${previous.dir.getAbsolutePath}. It is likely because log directory failure happened while broker was " +
s"replacing current replica with future replica. Recover broker from this failure by manually deleting one of the two directories " +
s"for this partition. It is recommended to delete the partition in the log directory that is known to have failed recently.")
}
}
}
The point of loadLog is to create a Log object that represents the set of files belonging to one TopicPartition. It proceeds as follows:
- Parse the topic and partition from the subdirectory name (a simplified version of this parsing is sketched after this list)
- Look up the information for this topicPartition: its config, its recovery checkpoint, and its log start offset
- Construct the Log object
- If the subdirectory name ends with -delete, add the Log to the list of logs to be deleted
- If the subdirectory is a future replica directory (-future suffix), put its Log into the futureLogs map
- Otherwise put it into the currentLogs map. The futureLogs map exists because, when a replica is being moved to another log directory on the same broker, a directory with the -future suffix is created first; once the future log catches up with the current log, it replaces the original one.
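The directory name of a log is built from the topic and the partition, so Log.parseTopicPartitionName can recover both by splitting at the last '-'. A simplified, hand-rolled version of that parsing (not the actual Kafka implementation, which also recognizes the -delete and -future variants):
import java.io.File

object ParseTopicPartitionSketch {
  // Simplified parsing of a log directory named "<topic>-<partition>".
  def parse(dir: File): (String, Int) = {
    val name = dir.getName
    val idx  = name.lastIndexOf('-') // topic names may themselves contain '-'
    require(idx > 0 && idx < name.length - 1, s"not a log directory name: $name")
    (name.substring(0, idx), name.substring(idx + 1).toInt)
  }

  def main(args: Array[String]): Unit =
    println(parse(new File("/data/kafka-logs/my-topic-3"))) // (my-topic,3)
}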
Log
A Log is a sequence of LogSegments, each of which has a base offset. Depending on the configuration, a new segment is rolled once the current segment reaches its size or time limit. A Log has the following attributes:
Attributes
- dir: File; the directory in which new LogSegment files are created
- config: LogConfig
- logStartOffset: Long; the earliest offset exposed to Kafka clients. It is updated when the user deletes records, when the broker applies log retention (?), and when the broker rolls the log. logStartOffset is used to:
  - drive log deletion: a segment can be deleted once its nextOffset is no greater than logStartOffset (see the sketch after this attribute list)
  - be returned to clients in responses
- recoveryPoint: Long; the offset at which recovery has to start (everything before this offset has already been flushed to disk, everything after it has not)
- scheduler: Scheduler
- brokerTopicStats: BrokerTopicStats
- time: Time
- maxProducerIdExpirationMs: Int
- producerIdExpirationCheckIntervalMs: Int
- topicPartition: TopicPartition
- producerStateManager: ProducerStateManager
- logDirFailureChannel: LogDirFailureChannel
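To make the logStartOffset deletion rule above concrete, here is a toy illustration with made-up segment boundaries; the Segment case class is a stand-in, not Kafka's LogSegment:
object LogStartOffsetSketch {
  // Minimal stand-in for a segment: it covers the offset range [baseOffset, nextOffset).
  final case class Segment(baseOffset: Long, nextOffset: Long)

  // A segment becomes deletable once its nextOffset is no greater than the log
  // start offset, i.e. none of its offsets is still visible to clients.
  def deletable(segments: Seq[Segment], logStartOffset: Long): Seq[Segment] =
    segments.filter(_.nextOffset <= logStartOffset)

  def main(args: Array[String]): Unit = {
    val segs = Seq(Segment(0, 100), Segment(100, 200), Segment(200, 300))
    println(deletable(segs, 150)) // only Segment(0,100) lies entirely below the start offset
  }
}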
When a Log is created it loads the segments that belong to it; this is done by loadSegments.
Methods
loadSegments()
private def loadSegments(): Long = {
// first do a pass through the files in the log directory and remove any temporary files
// and find any interrupted swap operations
val swapFiles = removeTempFilesAndCollectSwapFiles()
// Now do a second pass and load all the log and index files.
// We might encounter legacy log segments with offset overflow (KAFKA-6264). We need to split such segments. When
// this happens, restart loading segment files from scratch.
retryOnOffsetOverflow {
// In case we encounter a segment with offset overflow, the retry logic will split it after which we need to retry
// loading of segments. In that case, we also need to close all segments that could have been left open in previous
// call to loadSegmentFiles().
logSegments.foreach(_.close())
segments.clear()
loadSegmentFiles()
}
// Finally, complete any interrupted swap operations. To be crash-safe,
// log files that are replaced by the swap segment should be renamed to .deleted
// before the swap file is restored as the new segment file.
// complete the swap operations using the swap files collected above
completeSwapOperations(swapFiles)
if (logSegments.isEmpty) {
// no existing segments, create a new mutable segment beginning at offset 0
addSegment(LogSegment.open(dir = dir,
baseOffset = 0,
config,
time = time,
fileAlreadyExists = false,
initFileSize = this.initFileSize,
preallocate = config.preallocate))
0
} else if (!dir.getAbsolutePath.endsWith(Log.DeleteDirSuffix)) {
val nextOffset = retryOnOffsetOverflow {
recoverLog()
}
// reset the index size of the currently active log segment to allow more entries
activeSegment.resizeIndexes(config.maxIndexSize)
nextOffset
} else 0
}
loadSegments loads all of this Log's segments from the directory and returns the next offset. The steps are:
- removeTempFilesAndCollectSwapFiles(): do a first pass over all files, delete temporary files and collect any interrupted swap operations (?); analyzed in detail below
- Do a second pass and load all the log and index files. If a segment with offset overflow is encountered, split that segment and restart the segment loading (after closing the segments that were already loaded). Splitting an overflowed segment is done by splitOverflowedSegment
- Complete the swap operations collected in the first pass: the swap file replaces the original segment, and the replaced log files are renamed with the .deleted suffix
- If no segments exist, create a new mutable LogSegment with baseOffset 0
- Otherwise, if the directory does not end with the -delete suffix, recover the log by running recoverLog() and resize the active segment's indexes
The other methods encountered in loadSegments are analyzed below; first, a sketch of how segment files are named by their base offset.
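Segment files are named after their base offset, zero-padded to 20 digits (for example 00000000000000368769.log), which is what allows loadSegmentFiles to pair .log and .index files and to process segments in offset order. A small sketch of that naming convention (the helper names below are illustrative, not Kafka's own):
import java.io.File

object SegmentNamingSketch {
  // Segment files are named after their base offset, zero-padded to 20 digits.
  def logFileName(baseOffset: Long): String = f"$baseOffset%020d.log"

  // Recover the base offset from a ".log" or ".index" file name.
  def baseOffsetOf(file: File): Long = file.getName.takeWhile(_ != '.').toLong

  def main(args: Array[String]): Unit = {
    println(logFileName(368769L)) // 00000000000000368769.log
    val files = Seq(new File("00000000000000368769.log"), new File("00000000000000000000.log"))
    println(files.sortBy(baseOffsetOf).map(_.getName)) // processed in base-offset order
  }
}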
removeTempFilesAndCollectSwapFiles
- Iterate over all files in the directory
- If a file ends with .deleted, delete it
- If a file ends with .cleaned, delete it as well; .cleaned files are intermediate files produced by log compaction
- If a file ends with .swap, strip the .swap suffix first. If the underlying file is an index file, delete it since it is not needed (indexes can be rebuilt); otherwise keep it as a pending swap (the decision logic is sketched below)
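Reduced to its decision logic (the real method also renames files, checks readability and records cleaner offsets), the first pass over the temporary files could be sketched as follows; sweep is an illustrative name, not the actual Kafka method:
import java.io.File

object TempFileSweepSketch {
  val DeletedSuffix = ".deleted" // segments already scheduled for deletion
  val CleanedSuffix = ".cleaned" // intermediate files left behind by log compaction
  val SwapSuffix    = ".swap"    // a half-finished replacement of a segment

  // Returns the swap files that still need to be completed and deletes the other
  // temporary files. Swapped index files are simply dropped because indexes can
  // be rebuilt from the log file.
  def sweep(dir: File): Seq[File] = {
    val files = Option(dir.listFiles).getOrElse(Array.empty[File]).toSeq.filter(_.isFile)
    files.flatMap { f =>
      val name = f.getName
      if (name.endsWith(DeletedSuffix) || name.endsWith(CleanedSuffix)) {
        f.delete(); None
      } else if (name.endsWith(SwapSuffix)) {
        val original = name.dropRight(SwapSuffix.length)
        if (original.endsWith(".index")) { f.delete(); None } // rebuildable, discard
        else Some(f)                                          // keep: swap to be completed
      } else None
    }
  }
}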
completeSwapOperations
LogSegment
A LogSegment corresponds to one physical log file and one index file on disk. The log file contains the message data; the index file maps offsets to physical positions within the log file (a toy version of that lookup is sketched after the attribute list). Every LogSegment has a base offset, which is greater than the offsets in the previous segment and no greater than the offset of any message it contains. The two files are named [base_offset].log and [base_offset].index. A LogSegment has the following attributes:
Attributes
- log: FileRecords; contains all the log records
- offsetIndex: OffsetIndex
- timeIndex: TimeIndex
- txnIndex: TransactionIndex
- baseOffset: Long
- indexIntervalBytes: Int
- rollJitterMs: Long
- time: Time
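The offsetIndex attribute above is what keeps reads cheap: it stores sparse (offset, position) entries, and a lookup returns the largest indexed offset not greater than the target, from which the log file is scanned forward. A toy in-memory version of that lookup (the real OffsetIndex is a memory-mapped file storing offsets relative to the base offset):
object OffsetIndexSketch {
  // Sparse index entries: message offset -> byte position in the .log file.
  // Entries are kept sorted by offset, as they are in the on-disk index.
  final case class IndexEntry(offset: Long, position: Int)

  // Find the last entry whose offset is <= targetOffset; reading the log file
  // then starts from that position and scans forward. This mirrors the contract
  // of OffsetIndex.lookup, not its memory-mapped binary search.
  def lookup(entries: IndexedSeq[IndexEntry], targetOffset: Long): Option[IndexEntry] =
    entries.takeWhile(_.offset <= targetOffset).lastOption

  def main(args: Array[String]): Unit = {
    val idx = IndexedSeq(IndexEntry(0, 0), IndexEntry(50, 4096), IndexEntry(120, 9215))
    println(lookup(idx, 100)) // Some(IndexEntry(50,4096)): start scanning at byte 4096
  }
}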