In some scenarios we set a key on every Kafka record, and for a given key only the message at the latest offset matters; older messages with the same key are redundant. If they are never removed they waste space, and replaying them later (for example when rebuilding state from the topic) wastes time. For such topics Kafka offers a clean (compact) policy whose job is to remove these per-key redundant messages.
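As a minimal sketch of the compaction semantics only (the Record type and the values here are made up, not Kafka classes), keeping just the newest record per key could look like this:

case class Record(offset: Long, key: String, value: Option[String])

object CompactionSketch extends App {
  // A keyed stream where later offsets supersede earlier ones with the same key.
  val log = Seq(
    Record(0, "user-1", Some("alice")),
    Record(1, "user-2", Some("bob")),
    Record(2, "user-1", Some("alice-updated")), // makes offset 0 redundant
    Record(3, "user-2", None)                   // tombstone: marks user-2 as deleted
  )

  // The latest offset per key plays the same role as the cleaner's offset map.
  val latestByKey: Map[String, Long] =
    log.groupBy(_.key).map { case (k, rs) => k -> rs.map(_.offset).max }

  // Compaction keeps only the record at the latest offset for each key.
  val compacted = log.filter(r => r.offset == latestByKey(r.key))
  compacted.foreach(println) // offsets 2 and 3 survive; the tombstone only goes away later, after delete.retention.ms
}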
When the Kafka server starts its LogManager, it launches a number of background tasks, one of which is cleaning. This clean job is carried out by LogCleaner, whose class comment describes its task as removing obsolete records from the logs: a record (key, offset) is obsolete if there is another record with the same key at offset' > offset. Based on this view of the messages, the segments of a log fall into the following parts:
- clean section: the part that has already been cleaned (compacted)
- dirty section: the part that has not been cleaned yet
- cleanable section: the prefix of the dirty section that may be cleaned in the current round, i.e. the offsets between firstDirtyOffset and firstUncleanableDirtyOffset computed below
- uncleanable section: contains the active segment and must not be cleaned (a small sketch classifying segments by these boundaries follows)
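A rough, hypothetical sketch of this partitioning (Segment and the two boundary offsets are simplified stand-ins for what cleanableOffsets computes later):

case class Segment(baseOffset: Long)

object SectionSketch extends App {
  // Classify segments by their base offset against the two boundaries.
  def classify(segments: Seq[Segment], firstDirtyOffset: Long, firstUncleanableDirtyOffset: Long): Map[String, Seq[Segment]] =
    segments.groupBy { s =>
      if (s.baseOffset < firstDirtyOffset) "clean"
      else if (s.baseOffset < firstUncleanableDirtyOffset) "cleanable"
      else "uncleanable" // includes the active segment
    }

  // Segments starting at 0, 100, 200 and 300 (active); cleaned up to offset 100.
  classify(Seq(Segment(0), Segment(100), Segment(200), Segment(300)), 100L, 300L).foreach(println)
}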
The following sections analyze how LogCleaner works from several angles.
When are logs cleaned?
When LogCleaner starts, it creates a number of cleaner threads according to its configuration, and these threads do the actual cleaning. CleanerThread extends kafka.utils.ShutdownableThread, so it keeps running its doWork() method until the thread is shut down. CleanerThread's doWork() is:
override def doWork() {
val cleaned = cleanFilthiestLog()
if (!cleaned)
pause(config.backOffMs, TimeUnit.MILLISECONDS)
}
cleanFilthiestLog() holds the main cleaning logic. Before answering the questions that follow, let's first walk through the main flow of cleanFilthiestLog():
/**
* Cleans a log if there is a dirty log available
* @return whether a log was cleaned
*/
private def cleanFilthiestLog(): Boolean = {
var currentLog: Option[Log] = None
try {
//pick the filthiest log that needs compaction
val cleaned = cleanerManager.grabFilthiestCompactedLog(time) match {
case None =>
false
case Some(cleanable) =>
// there's a log, clean it
currentLog = Some(cleanable.log)
//perform the cleaning
cleanLog(cleanable)
true
}
//collect logs whose old segments should be deleted
val deletable: Iterable[(TopicPartition, Log)] = cleanerManager.deletableLogs()
try {
deletable.foreach {
case (topicPartition, log) =>
try {
currentLog = Some(log)
//delete old segments
log.deleteOldSegments()
}
}
} finally {
cleanerManager.doneDeleting(deletable.map(_._1))
}
cleaned
} catch {
case e @ (_: ThreadShutdownException | _: ControlThrowable) => throw e
case e: Exception =>
if (currentLog.isEmpty) {
throw new IllegalStateException("currentLog cannot be empty on an unexpected exception", e)
}
val erroneousLog = currentLog.get
warn(s"Unexpected exception thrown when cleaning log $erroneousLog. Marking its partition (${erroneousLog.topicPartition}) as uncleanable", e)
cleanerManager.markPartitionUncleanable(erroneousLog.dir.getParent, erroneousLog.topicPartition)
false
}
}
As the code shows, cleanFilthiestLog() does two things: it compacts one log (the filthiest one) and it deletes old segments of the logs returned by deletableLogs().
Which logs need to be cleaned?
The two methods to look at are cleanerManager.grabFilthiestCompactedLog(time) and cleanerManager.deletableLogs().
grabFilthiestCompactedLog(time)
/**
* Choose the log to clean next and add it to the in-progress set. We recompute this
* each time from the full set of logs to allow logs to be dynamically added to the pool of logs
* the log manager maintains.
*/
def grabFilthiestCompactedLog(time: Time): Option[LogToClean] = {
inLock(lock) {
val now = time.milliseconds
this.timeOfLastRun = now
val lastClean = allCleanerCheckpoints
val dirtyLogs = logs.filter {
case (_, log) => log.config.compact // match logs that are marked as compacted
}.filterNot {
case (topicPartition, log) =>
// skip any logs already in-progress and uncleanable partitions
inProgress.contains(topicPartition) || isUncleanablePartition(log, topicPartition)
}.map {
case (topicPartition, log) => // create a LogToClean instance for each
val (firstDirtyOffset, firstUncleanableDirtyOffset) = LogCleanerManager.cleanableOffsets(log, topicPartition,
lastClean, now)
LogToClean(topicPartition, log, firstDirtyOffset, firstUncleanableDirtyOffset)
}.filter(ltc => ltc.totalBytes > 0) // skip any empty logs
this.dirtiestLogCleanableRatio = if (dirtyLogs.nonEmpty) dirtyLogs.max.cleanableRatio else 0
// and must meet the minimum threshold for dirty byte ratio
val cleanableLogs = dirtyLogs.filter(ltc => ltc.cleanableRatio > ltc.log.config.minCleanableRatio)
if(cleanableLogs.isEmpty) {
None
} else {
val filthiest = cleanableLogs.max
inProgress.put(filthiest.topicPartition, LogCleaningInProgress)
Some(filthiest)
}
}
}
Finding the log to clean amounts to building the cleanableLogs collection:
- First, filter the logs list down to those whose cleanup policy includes compact.
- Skip any log (i.e. TopicPartition) that is already being cleaned or that has been marked uncleanable.
For each log that survives this filtering, the range of dirty offsets is computed by cleanableOffsets:
/**
* Returns the range of dirty offsets that can be cleaned.
*
* @param log the log
* @param lastClean the map of checkpointed offsets
* @param now the current time in milliseconds of the cleaning operation
* @return the lower (inclusive) and upper (exclusive) offsets
*/
def cleanableOffsets(log: Log, topicPartition: TopicPartition, lastClean: immutable.Map[TopicPartition, Long], now: Long): (Long, Long) = {
// the checkpointed offset, ie., the first offset of the next dirty segment
val lastCleanOffset: Option[Long] = lastClean.get(topicPartition)
// If the log segments are abnormally truncated and hence the checkpointed offset is no longer valid;
// reset to the log starting offset and log the error
val logStartOffset = log.logSegments.head.baseOffset
val firstDirtyOffset = {
val offset = lastCleanOffset.getOrElse(logStartOffset)
if (offset < logStartOffset) {
// don't bother with the warning if compact and delete are enabled.
if (!isCompactAndDelete(log))
warn(s"Resetting first dirty offset of ${log.name} to log start offset $logStartOffset since the checkpointed offset $offset is invalid.")
logStartOffset
} else {
offset
}
}
val compactionLagMs = math.max(log.config.compactionLagMs, 0L)
// find first segment that cannot be cleaned
// neither the active segment, nor segments with any messages closer to the head of the log than the minimum compaction lag time
// may be cleaned
val firstUncleanableDirtyOffset: Long = Seq(
// we do not clean beyond the first unstable offset
log.firstUnstableOffset.map(_.messageOffset),
// the active segment is always uncleanable
Option(log.activeSegment.baseOffset),
// the first segment whose largest message timestamp is within a minimum time lag from now
if (compactionLagMs > 0) {
// dirty log segments
val dirtyNonActiveSegments = log.logSegments(firstDirtyOffset, log.activeSegment.baseOffset)
dirtyNonActiveSegments.find { s =>
val isUncleanable = s.largestTimestamp > now - compactionLagMs
debug(s"Checking if log segment may be cleaned: log='${log.name}' segment.baseOffset=${s.baseOffset} segment.largestTimestamp=${s.largestTimestamp}; now - compactionLag=${now - compactionLagMs}; is uncleanable=$isUncleanable")
isUncleanable
}.map(_.baseOffset)
} else None
).flatten.min
debug(s"Finding range of cleanable offsets for log=${log.name} topicPartition=$topicPartition. Last clean offset=$lastCleanOffset now=$now => firstDirtyOffset=$firstDirtyOffset firstUncleanableOffset=$firstUncleanableDirtyOffset activeSegment.baseOffset=${log.activeSegment.baseOffset}")
(firstDirtyOffset, firstUncleanableDirtyOffset)
}
firstDirtyOffset is the checkpointed offset, i.e. the first offset that was not cleaned in the previous round; if it is smaller than the base offset of the first segment (for example after an abnormal truncation), it is reset to the log start offset. firstUncleanableDirtyOffset is the minimum of the following three candidates (see the sketch below):
- the log's first unstable offset (messages of open transactions must not be compacted yet),
- the base offset of the active segment, and
- the base offset of the first dirty segment whose largestTimestamp is greater than now - compactionLagMs, i.e. a segment still within min.compaction.lag.ms.
A LogToClean object is then constructed from the log and this dirty-offset range.
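A simplified sketch of how the three candidates combine (the offsets are made-up numbers; firstUnstableOffset may be absent, which is why the code works with Options and flattens them):

object FirstUncleanableSketch extends App {
  val firstUnstableOffset: Option[Long]       = Some(950L)  // open transaction; may be None
  val activeSegmentBaseOffset: Option[Long]   = Some(1200L) // always present
  val firstTooRecentSegmentBase: Option[Long] = None        // only considered when min.compaction.lag.ms > 0

  // Mirror of the Seq(...).flatten.min in cleanableOffsets.
  val firstUncleanableDirtyOffset: Long =
    Seq(firstUnstableOffset, activeSegmentBaseOffset, firstTooRecentSegmentBase).flatten.min

  println(firstUncleanableDirtyOffset) // 950
}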
private case class LogToClean(topicPartition: TopicPartition, log: Log, firstDirtyOffset: Long, uncleanableOffset: Long) extends Ordered[LogToClean]
When a LogToClean object is constructed, it computes its cleanableRatio:
val cleanBytes = log.logSegments(-1, firstDirtyOffset).map(_.size.toLong).sum
val (firstUncleanableOffset, cleanableBytes) = LogCleaner.calculateCleanableBytes(log, firstDirtyOffset, uncleanableOffset)
val totalBytes = cleanBytes + cleanableBytes
val cleanableRatio = cleanableBytes / totalBytes.toDouble
cleanableRatio is the fraction that the cleanable (dirty) bytes make up of the clean bytes plus cleanable bytes, i.e. how dirty the log is ahead of its uncleanable section.
Finally, only the logs whose cleanableRatio exceeds min.cleanable.dirty.ratio are kept, and among them the one with the largest ratio is chosen and returned.
At this point the log that needs cleaning has been found, together with its firstDirtyOffset and uncleanableOffset (a small worked example of the selection follows).
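A small worked sketch of this selection with made-up byte counts and a made-up min.cleanable.dirty.ratio of 0.5 (the real values come from each log's configuration):

object FilthiestSketch extends App {
  case class Candidate(name: String, cleanBytes: Long, cleanableBytes: Long, minCleanableRatio: Double) {
    val cleanableRatio: Double = cleanableBytes / (cleanBytes + cleanableBytes).toDouble
  }

  val candidates = Seq(
    Candidate("t-0", cleanBytes = 400, cleanableBytes = 600, minCleanableRatio = 0.5), // ratio 0.6
    Candidate("t-1", cleanBytes = 900, cleanableBytes = 100, minCleanableRatio = 0.5)  // ratio 0.1, below threshold
  )

  // Keep only candidates above their threshold, then take the dirtiest one.
  val cleanable = candidates.filter(c => c.cleanableRatio > c.minCleanableRatio)
  val filthiest = if (cleanable.isEmpty) None else Some(cleanable.maxBy(_.cleanableRatio))
  println(filthiest.map(_.name)) // Some(t-0)
}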
deletableLogs()
Roughly speaking (this method is not walked through here), deletableLogs() picks the logs that are not already being cleaned and whose cleanup policy enables both compaction and deletion, marks them as in progress, and returns them so the cleaner thread can apply the normal retention rules via log.deleteOldSegments().
How is a log cleaned?
The main call chain for cleaning a log is cleanLog(cleanable: LogToClean) -> clean(cleanable: LogToClean) -> doClean(cleanable: LogToClean, deleteHorizonMs: Long). The methods are analyzed one by one below.
cleanLog(cleanable: LogToClean)
private def cleanLog(cleanable: LogToClean): Unit = {
//before cleaning, initialize endOffset to the first dirty offset
var endOffset = cleanable.firstDirtyOffset
try {
//clean the log; returns the first uncleaned offset and the statistics of this round
val (nextDirtyOffset, cleanerStats) = cleaner.clean(cleanable)
recordStats(cleaner.id, cleanable.log.name, cleanable.firstDirtyOffset, endOffset, cleanerStats)
endOffset = nextDirtyOffset
} catch {
case _: LogCleaningAbortedException => // task can be aborted, let it go.
case _: KafkaStorageException => // partition is already offline. let it go.
case e: IOException =>
var logDirectory = cleanable.log.dir.getParent
val msg = s"Failed to clean up log for ${cleanable.topicPartition} in dir ${logDirectory} due to IOException"
logDirFailureChannel.maybeAddOfflineLogDir(logDirectory, msg, e)
} finally {
cleanerManager.doneCleaning(cleanable.topicPartition, cleanable.log.dir.getParentFile, endOffset)
}
}
clean(cleanable: LogToClean)
/**
* Clean the given log
*
* @param cleanable The log to be cleaned
*
* @return The first offset not cleaned and the statistics for this round of cleaning
*/
private[log] def clean(cleanable: LogToClean): (Long, CleanerStats) = {
// figure out the timestamp below which it is safe to remove delete tombstones
// this position is defined to be a configurable time beneath the last modified time of the last clean segment
val deleteHorizonMs =
cleanable.log.logSegments(0, cleanable.firstDirtyOffset).lastOption match {
case None => 0L
case Some(seg) => seg.lastModified - cleanable.log.config.deleteRetentionMs
}
doClean(cleanable, deleteHorizonMs)
}
This method mainly computes a timestamp called deleteHorizonMs: the lastModified time of the last already-clean segment minus delete.retention.ms. Kafka has the notion of tombstones (records that have a key but a null value, marking that key as deleted), and it uses a time-based rule for removing them safely: the tombstones in a segment are retained as long as that segment's lastModified is greater than deleteHorizonMs, and may only be discarded once it no longer is. In other words, with seg being the last clean segment:
retain deletes: lastModified > deleteHorizonMs
=> lastModified > seg.lastModified - cleanable.log.config.deleteRetentionMs
=> seg.lastModified - lastModified < cleanable.log.config.deleteRetentionMs
so a tombstone becomes removable only once its segment was last modified at least deleteRetentionMs before the last clean segment. Tombstones are discussed further below; a tiny worked example of this decision follows.
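A tiny worked sketch of the decision (the timestamps are invented, and the 24-hour delete.retention.ms is only the usual default, so treat it as an assumption):

object DeleteHorizonSketch extends App {
  val deleteRetentionMs = 24L * 60 * 60 * 1000         // delete.retention.ms, assumed default of 24h
  val lastCleanSegmentLastModified = 1700000000000L    // made-up mtime of the last clean segment
  val deleteHorizonMs = lastCleanSegmentLastModified - deleteRetentionMs

  // Same predicate as `retainDeletes` in cleanSegments.
  def retainDeletes(segmentLastModified: Long): Boolean = segmentLastModified > deleteHorizonMs

  println(retainDeletes(lastCleanSegmentLastModified - 1000L))                  // true: too recent, tombstones kept
  println(retainDeletes(lastCleanSegmentLastModified - 2 * deleteRetentionMs))  // false: old enough, tombstones may be dropped
}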
After deleteHorizonMs has been computed, doClean is called.
doClean(cleanable: LogToClean, deleteHorizonMs: Long)
private[log] def doClean(cleanable: LogToClean, deleteHorizonMs: Long): (Long, CleanerStats) = {
info("Beginning cleaning of log %s.".format(cleanable.log.name))
val log = cleanable.log
val stats = new CleanerStats()
// build the offset map
info("Building offset map for %s...".format(cleanable.log.name))
val upperBoundOffset = cleanable.firstUncleanableOffset
buildOffsetMap(log, cleanable.firstDirtyOffset, upperBoundOffset, offsetMap, stats)
val endOffset = offsetMap.latestOffset + 1
stats.indexDone()
// determine the timestamp up to which the log will be cleaned
// this is the lower of the last active segment and the compaction lag
val cleanableHorizonMs = log.logSegments(0, cleanable.firstUncleanableOffset).lastOption.map(_.lastModified).getOrElse(0L)
// group the segments and clean the groups
info("Cleaning log %s (cleaning prior to %s, discarding tombstones prior to %s)...".format(log.name, new Date(cleanableHorizonMs), new Date(deleteHorizonMs)))
for (group <- groupSegmentsBySize(log.logSegments(0, endOffset), log.config.segmentSize, log.config.maxIndexSize, cleanable.firstUncleanableOffset))
cleanSegments(log, group, offsetMap, deleteHorizonMs, stats)
// record buffer utilization
stats.bufferUtilization = offsetMap.utilization
stats.allDone()
(endOffset, stats)
}
doClean does two main things. The first is to build an offsetMap, which records, for every key in the cleanable dirty section, the latest offset at which that key appears; it is later used to decide which records in the segments are stale and can be dropped (a rough sketch of its semantics follows).
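The real structure is a fixed-size hash map (SkimpyOffsetMap). As a rough stand-in for its semantics only, the hypothetical sketch below shows what buildOffsetMap accumulates: the last offset seen for each key, plus the latest offset covered overall:

import scala.collection.mutable

object OffsetMapSketch extends App {
  val map = mutable.Map.empty[String, Long]
  var latestOffset = -1L

  def put(key: String, offset: Long): Unit = {
    map.put(key, offset) // scanning the dirty section in order, a later put always carries a higher offset
    latestOffset = math.max(latestOffset, offset)
  }

  Seq(("user-1", 100L), ("user-2", 101L), ("user-1", 102L)).foreach { case (k, o) => put(k, o) }
  println(map)          // Map(user-1 -> 102, user-2 -> 101)
  println(latestOffset) // 102
}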
buildOffsetMap
/**
* Build a map of key_hash => offset for the keys in the cleanable dirty portion of the log to use in cleaning.
* @param log The log to use
* @param start The offset at which dirty messages begin
* @param end The ending offset for the map that is being built
* @param map The map in which to store the mappings
* @param stats Collector for cleaning statistics
*/
private[log] def buildOffsetMap(log: Log,
start: Long,
end: Long,
map: OffsetMap,
stats: CleanerStats) {
map.clear()
//all dirty segments in [start, end)
val dirty = log.logSegments(start, end).toBuffer
info("Building offset map for log %s for %d segments in offset range [%d, %d).".format(log.name, dirty.size, start, end))
val abortedTransactions = log.collectAbortedTransactions(start, end)
val transactionMetadata = CleanedTransactionMetadata(abortedTransactions)
// Add all the cleanable dirty segments. We must take at least map.slots * load_factor,
// but we may be able to fit more (if there is lots of duplication in the dirty section of the log)
var full = false
//iterate over the dirty segments until the offset map is full
for (segment <- dirty if !full) {
checkDone(log.topicPartition)
full = buildOffsetMapForSegment(log.topicPartition, segment, map, start, log.config.maxMessageSize,
transactionMetadata, stats)
if (full)
debug("Offset map is full, %d segments fully mapped, segment with base offset %d is partially mapped".format(dirty.indexOf(segment), segment.baseOffset))
}
info("Offset map for log %s complete.".format(log.name))
}
The key method here is buildOffsetMapForSegment:
buildOffsetMapForSegment
/**
* Add the messages in the given segment to the offset map
*
* @param segment The segment to index
* @param map The map in which to store the key=>offset mapping
* @param stats Collector for cleaning statistics
*
* @return If the map was filled whilst loading from this segment
*/
private def buildOffsetMapForSegment(topicPartition: TopicPartition,
segment: LogSegment,
map: OffsetMap,
startOffset: Long,
maxLogMessageSize: Int,
transactionMetadata: CleanedTransactionMetadata,
stats: CleanerStats): Boolean = {
//locate the physical position of startOffset within this segment
var position = segment.offsetIndex.lookup(startOffset).position
//cap the number of entries we are willing to put into the map
val maxDesiredMapSize = (map.slots * this.dupBufferLoadFactor).toInt
while (position < segment.log.sizeInBytes) {
checkDone(topicPartition)
readBuffer.clear()
try {
//fill readBuffer starting from position
segment.log.readInto(readBuffer, position)
} catch {
case e: Exception =>
throw new KafkaException(s"Failed to read from segment $segment of partition $topicPartition " +
"while loading offset map", e)
}
//wrap the buffer as readable record batches
val records = MemoryRecords.readableRecords(readBuffer)
//throttle to avoid saturating disk I/O
throttler.maybeThrottle(records.sizeInBytes)
val startPosition = position
for (batch <- records.batches.asScala) {
if (batch.isControlBatch) {
transactionMetadata.onControlBatchRead(batch)
stats.indexMessagesRead(1)
} else {
val isAborted = transactionMetadata.onBatchRead(batch)
if (isAborted) {
// If the batch is aborted, do not bother populating the offset map.
// Note that abort markers are supported in v2 and above, which means count is defined.
stats.indexMessagesRead(batch.countOrNull)
} else {
for (record <- batch.asScala) {
if (record.hasKey && record.offset >= startOffset) {
if (map.size < maxDesiredMapSize)
//for a given key, the offset seen last (i.e. the largest) wins
map.put(record.key, record.offset)
else
return true
}
stats.indexMessagesRead(1)
}
}
}
if (batch.lastOffset >= startOffset)
//track the latest offset covered by the map
map.updateLatestOffset(batch.lastOffset)
}
val bytesRead = records.validBytes
position += bytesRead
stats.indexBytesRead(bytesRead)
// if we didn't read even one complete message, our read buffer may be too small
if(position == startPosition)
growBuffersOrFail(segment.log, position, maxLogMessageSize, records)
}
restoreBuffers()
false
}
After this pass, every key encountered in the dirty section is in the offsetMap together with the last offset at which it appeared. The second thing doClean does is actually strip the stale keys out of the segments. To prevent segments from shrinking into lots of tiny files after cleaning, Kafka first groups the segments and then cleans each group into a single new segment.
groupSegmentsBySize
This method is straightforward: it walks over the segments, adding each to the current group, and starts a new group whenever adding the next segment would exceed one of the limits (total log size, offset-index size, time-index size, or an offset span larger than Int.MaxValue). A simplified sketch follows the quoted method.
/**
* Group the segments in a log into groups totaling less than a given size. the size is enforced separately for the log data and the index data.
* We collect a group of such segments together into a single
* destination segment. This prevents segment sizes from shrinking too much.
*
* @param segments The log segments to group
* @param maxSize the maximum size in bytes for the total of all log data in a group
* @param maxIndexSize the maximum size in bytes for the total of all index data in a group
*
* @return A list of grouped segments
*/
private[log] def groupSegmentsBySize(segments: Iterable[LogSegment], maxSize: Int, maxIndexSize: Int, firstUncleanableOffset: Long): List[Seq[LogSegment]] = {
var grouped = List[List[LogSegment]]()
var segs = segments.toList
while(segs.nonEmpty) {
var group = List(segs.head)
var logSize = segs.head.size.toLong
var indexSize = segs.head.offsetIndex.sizeInBytes.toLong
var timeIndexSize = segs.head.timeIndex.sizeInBytes.toLong
segs = segs.tail
while(segs.nonEmpty &&
logSize + segs.head.size <= maxSize &&
indexSize + segs.head.offsetIndex.sizeInBytes <= maxIndexSize &&
timeIndexSize + segs.head.timeIndex.sizeInBytes <= maxIndexSize &&
lastOffsetForFirstSegment(segs, firstUncleanableOffset) - group.last.baseOffset <= Int.MaxValue) {
group = segs.head :: group
logSize += segs.head.size
indexSize += segs.head.offsetIndex.sizeInBytes
timeIndexSize += segs.head.timeIndex.sizeInBytes
segs = segs.tail
}
grouped ::= group.reverse
}
grouped.reverse
}
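For intuition, here is a simplified, hypothetical sketch of the same greedy grouping that only bounds the total log size (the real method above additionally bounds both index sizes and the offset span):

object GroupingSketch extends App {
  def group(sizes: List[Int], maxSize: Int): List[List[Int]] = {
    var remaining = sizes
    val grouped = List.newBuilder[List[Int]]
    while (remaining.nonEmpty) {
      var current = List(remaining.head) // a group always takes at least one segment
      var total = remaining.head
      remaining = remaining.tail
      while (remaining.nonEmpty && total + remaining.head <= maxSize) {
        current = remaining.head :: current
        total += remaining.head
        remaining = remaining.tail
      }
      grouped += current.reverse
    }
    grouped.result()
  }

  println(group(List(400, 300, 500, 200, 900), maxSize = 1000)) // List(List(400, 300), List(500, 200), List(900))
}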
The central method is cleanSegments, which compacts all the segments of a group into a single replacement segment.
cleanSegments
/**
* Clean a group of segments into a single replacement segment
*
* @param log The log being cleaned
* @param segments The group of segments being cleaned
* @param map The offset map to use for cleaning segments
* @param deleteHorizonMs The time to retain delete tombstones
* @param stats Collector for cleaning statistics
*/
private[log] def cleanSegments(log: Log,
segments: Seq[LogSegment],
map: OffsetMap,
deleteHorizonMs: Long,
stats: CleanerStats) {
// create a new segment with a suffix appended to the name of the log and indexes
val cleaned = LogCleaner.createNewCleanedSegment(log, segments.head.baseOffset)
try {
// clean segments into the new destination segment
val iter = segments.iterator
var currentSegmentOpt: Option[LogSegment] = Some(iter.next())
while (currentSegmentOpt.isDefined) {
val currentSegment = currentSegmentOpt.get
val nextSegmentOpt = if (iter.hasNext) Some(iter.next()) else None
val startOffset = currentSegment.baseOffset
val upperBoundOffset = nextSegmentOpt.map(_.baseOffset).getOrElse(map.latestOffset + 1)
val abortedTransactions = log.collectAbortedTransactions(startOffset, upperBoundOffset)
val transactionMetadata = CleanedTransactionMetadata(abortedTransactions, Some(cleaned.txnIndex))
val retainDeletes = currentSegment.lastModified > deleteHorizonMs
info(s"Cleaning segment $startOffset in log ${log.name} (largest timestamp ${new Date(currentSegment.largestTimestamp)}) " +
s"into ${cleaned.baseOffset}, ${if(retainDeletes) "retaining" else "discarding"} deletes.")
try {
cleanInto(log.topicPartition, currentSegment.log, cleaned, map, retainDeletes, log.config.maxMessageSize,
transactionMetadata, log.activeProducersWithLastSequence, stats)
} catch {
case e: LogSegmentOffsetOverflowException =>
// Split the current segment. It's also safest to abort the current cleaning process, so that we retry from
// scratch once the split is complete.
info(s"Caught segment overflow error during cleaning: ${e.getMessage}")
log.splitOverflowedSegment(currentSegment)
throw new LogCleaningAbortedException()
}
currentSegmentOpt = nextSegmentOpt
}
cleaned.onBecomeInactiveSegment()
// flush new segment to disk before swap
cleaned.flush()
// update the modification date to retain the last modified date of the original files
val modified = segments.last.lastModified
cleaned.lastModified = modified
// swap in new segment
info(s"Swapping in cleaned segment $cleaned for segment(s) $segments in log $log")
log.replaceSegments(List(cleaned), segments)
} catch {
case e: LogCleaningAbortedException =>
try cleaned.deleteIfExists()
catch {
case deleteException: Exception =>
e.addSuppressed(deleteException)
} finally throw e
}
}
cleaned is the newly created destination segment; the group's source segments are iterated one by one and their surviving messages are written into cleaned by cleanInto:
cleanInto
/**
* Clean the given source log segment into the destination segment using the key=>offset mapping
* provided
*
* @param topicPartition The topic and partition of the log segment to clean
* @param sourceRecords The dirty log segment
* @param dest The cleaned log segment
* @param map The key=>offset mapping
* @param retainDeletes Should delete tombstones be retained while cleaning this segment
* @param maxLogMessageSize The maximum message size of the corresponding topic
* @param stats Collector for cleaning statistics
*/
private[log] def cleanInto(topicPartition: TopicPartition,
sourceRecords: FileRecords,
dest: LogSegment,
map: OffsetMap,
retainDeletes: Boolean,
maxLogMessageSize: Int,
transactionMetadata: CleanedTransactionMetadata,
activeProducers: Map[Long, Int],
stats: CleanerStats) {
val logCleanerFilter = new RecordFilter {
var discardBatchRecords: Boolean = _
override def checkBatchRetention(batch: RecordBatch): BatchRetention = {
// we piggy-back on the tombstone retention logic to delay deletion of transaction markers.
// note that we will never delete a marker until all the records from that transaction are removed.
discardBatchRecords = shouldDiscardBatch(batch, transactionMetadata, retainTxnMarkers = retainDeletes)
// check if the batch contains the last sequence number for the producer. if so, we cannot
// remove the batch just yet or the producer may see an out of sequence error.
if (batch.hasProducerId && activeProducers.get(batch.producerId).contains(batch.lastSequence))
BatchRetention.RETAIN_EMPTY
else if (discardBatchRecords)
BatchRetention.DELETE
else
BatchRetention.DELETE_EMPTY
}
override def shouldRetainRecord(batch: RecordBatch, record: Record): Boolean = {
if (discardBatchRecords)
// The batch is only retained to preserve producer sequence information; the records can be removed
false
else
Cleaner.this.shouldRetainRecord(map, retainDeletes, batch, record, stats)
}
}
var position = 0
while (position < sourceRecords.sizeInBytes) {
checkDone(topicPartition)
// read a chunk of messages and copy any that are to be retained to the write buffer to be written out
readBuffer.clear()
writeBuffer.clear()
sourceRecords.readInto(readBuffer, position)
val records = MemoryRecords.readableRecords(readBuffer)
throttler.maybeThrottle(records.sizeInBytes)
val result = records.filterTo(topicPartition, logCleanerFilter, writeBuffer, maxLogMessageSize, decompressionBufferSupplier)
stats.readMessages(result.messagesRead, result.bytesRead)
stats.recopyMessages(result.messagesRetained, result.bytesRetained)
position += result.bytesRead
// if any messages are to be retained, write them out
val outputBuffer = result.outputBuffer
if (outputBuffer.position() > 0) {
outputBuffer.flip()
val retained = MemoryRecords.readableRecords(outputBuffer)
// it's OK not to hold the Log's lock in this case, because this segment is only accessed by other threads
// after `Log.replaceSegments` (which acquires the lock) is called
dest.append(largestOffset = result.maxOffset,
largestTimestamp = result.maxTimestamp,
shallowOffsetOfMaxTimestamp = result.shallowOffsetOfMaxTimestamp,
records = retained)
throttler.maybeThrottle(outputBuffer.limit())
}
// if we read bytes but didn't get even one complete batch, our I/O buffer is too small, grow it and try again
// `result.bytesRead` contains bytes from `messagesRead` and any discarded batches.
if (readBuffer.limit() > 0 && result.bytesRead == 0)
growBuffersOrFail(sourceRecords, position, maxLogMessageSize, records)
}
restoreBuffers()
}
The main steps of cleanInto are:
- Construct a RecordFilter which, given a batch and a record, returns a retention decision:
public enum BatchRetention {
DELETE, // Delete the batch without inspecting records
RETAIN_EMPTY, // Retain the batch even if it is empty
DELETE_EMPTY // Delete the batch if it is empty
}
- Starting from position = 0, read the source segment's messages into readBuffer.
- Read the records back out of readBuffer, use the RecordFilter built above to decide what to discard, and copy the messages to be kept into writeBuffer.
- Call append on the destination segment to write the retained messages from writeBuffer.
The process above is fairly clear once some of Kafka's transactional details are set aside. Start with the retention filter: the RecordFilter constructed at the top of cleanInto (quoted above) makes its batch-level decision in checkBatchRetention and its per-record decision in shouldRetainRecord.
The batch-level logic touches Kafka's transaction implementation and is not covered here for now; let's look at how a single record is filtered:
private def shouldRetainRecord(map: kafka.log.OffsetMap,
retainDeletes: Boolean,
batch: RecordBatch,
record: Record,
stats: CleanerStats): Boolean = {
//the record is newer than anything covered by the offset map, so keep it
val pastLatestOffset = record.offset > map.latestOffset
if (pastLatestOffset)
return true
if (record.hasKey) {
val key = record.key
//the last offset at which this key appeared in the dirty section
val foundOffset = map.get(key)
/* two cases in which we can get rid of a message:
* 1) if there exists a message with the same key but higher offset
* 2) if the message is a delete "tombstone" marker and enough time has passed
*/
val redundant = foundOffset >= 0 && record.offset < foundOffset
val obsoleteDelete = !retainDeletes && !record.hasValue
!redundant && !obsoleteDelete
} else {
stats.invalidMessage()
false
}
}
A record is retained only when both redundant and obsoleteDelete are false. redundant means the offset map holds a larger offset for the same key, i.e. a newer record with this key exists, so this one is stale and can be dropped. obsoleteDelete is false iff at least one of retainDeletes and record.hasValue is true: a record with a value is not a tombstone, so the tombstone rule does not apply; a tombstone (no value) is kept only while retainDeletes is true, i.e. while its segment is still newer than deleteHorizonMs. A record without a key is counted as invalid and never retained. The sketch below recaps this predicate.
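As a hedged recap, the sketch uses a made-up record type and offset map; it mirrors the two conditions above rather than the exact Kafka types:

object RetainSketch extends App {
  case class Rec(offset: Long, key: Option[String], value: Option[String])

  def shouldRetain(latestOffsetByKey: Map[String, Long],
                   latestOffsetInMap: Long,
                   retainDeletes: Boolean,
                   record: Rec): Boolean = {
    if (record.offset > latestOffsetInMap) return true // newer than anything the map covers, keep it
    record.key match {
      case None => false // records without a key are invalid for a compacted topic
      case Some(k) =>
        val redundant = latestOffsetByKey.get(k).exists(record.offset < _) // a newer record with this key exists
        val obsoleteDelete = !retainDeletes && record.value.isEmpty        // an expired tombstone
        !redundant && !obsoleteDelete
    }
  }

  val map = Map("user-1" -> 102L, "user-2" -> 101L)
  println(shouldRetain(map, 102L, retainDeletes = true,  Rec(100L, Some("user-1"), Some("v")))) // false: superseded by offset 102
  println(shouldRetain(map, 102L, retainDeletes = false, Rec(101L, Some("user-2"), None)))      // false: tombstone past retention
  println(shouldRetain(map, 102L, retainDeletes = true,  Rec(102L, Some("user-1"), Some("v")))) // true
}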