ingester
Concurrently reads data from Kafka and writes it to ClickHouse.
The at-least-once problem with concurrent Kafka consumption
- goroutine 0 consumes message 0 while goroutine 1 consumes message 1000. If goroutine 1 commits its offset as soon as it finishes, but message 0 has not been fully processed yet, and the system crashes at that point, then after restart consumption resumes from message 1001 and message 0 is lost.
```mermaid
graph TB
offsetMarker(offsetMarker) --go--> offsetCommit(every 100ms take the highest continuous offset from the list and commit it to the corresponding partition)
offsetMarker(offsetMarker) --> markoffset(markoffset inserts the offset into the list)
```
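The "highest continuous offset" rule above can be sketched as a small tracker. The names here (`offsetTracker`, `MarkOffset`, `HighestContinuous`) are illustrative, not the ingester's actual API: workers mark offsets in any order as messages finish, and only the highest offset with no gaps below it is ever committed, so a crash can cause re-delivery (at-least-once) but never message loss.

```go
package main

import (
	"fmt"
	"sync"
)

// offsetTracker keeps a committed watermark plus the set of finished
// offsets above it.
type offsetTracker struct {
	mu        sync.Mutex
	committed int64          // highest offset that is safe to commit
	pending   map[int64]bool // finished offsets above committed
}

func newOffsetTracker(start int64) *offsetTracker {
	return &offsetTracker{committed: start, pending: make(map[int64]bool)}
}

// MarkOffset records a finished offset and advances the watermark while
// the next consecutive offset has also finished.
func (t *offsetTracker) MarkOffset(off int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pending[off] = true
	for t.pending[t.committed+1] {
		t.committed++
		delete(t.pending, t.committed)
	}
}

// HighestContinuous is what the 100ms ticker would commit to the partition.
func (t *offsetTracker) HighestContinuous() int64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.committed
}

func main() {
	tr := newOffsetTracker(-1)
	tr.MarkOffset(1000)                 // message 1000 finishes first
	fmt.Println(tr.HighestContinuous()) // still -1: 0..999 not done yet
	tr.MarkOffset(0)
	fmt.Println(tr.HighestContinuous()) // 0
}
```

This is exactly the scenario from the bullet above: finishing message 1000 early does not move the commit point past the unfinished message 0.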
ingester consumer flow
```mermaid
graph TB
Consumer(Consumer) --one goroutine per partition--> handleMessages(handleMessages from msgCh) --> ParallelProcessor(ParallelProcessor spawns 1000 goroutines to process msgs)
ParallelProcessor --1--> DecoratedProcessor(DecoratedProcessor records the processing time of each msg)
ParallelProcessor --1000--> DecoratedProcessor
DecoratedProcessor --> CommittingProcessor(CommittingProcessor marks the offset via offsetMarker after the msg is processed and err == nil) --> RetryingProcessor(RetryingProcessor retries on failure) --> KafkaSpanProcessor(KafkaSpanProcessor unmarshals msg to span) --> clickhouseSpanWriter(clickhouseSpanWriter asynchronously writes spans to ClickHouse)
Consumer --one goroutine per partition--> handleErrors(handleErrors from errChan)
```
ingester producer - pushes messages into the AsyncProducer.
- Two goroutines record write successes and write failures via Prometheus.
```go
writeMetrics := spanWriterMetrics{
	SpansWrittenSuccess: factory.Counter(metrics.Options{Name: "kafka_spans_written", Tags: map[string]string{"status": "success"}}),
	SpansWrittenFailure: factory.Counter(metrics.Options{Name: "kafka_spans_written", Tags: map[string]string{"status": "failure"}}),
}
go func() {
	for range producer.Successes() {
		writeMetrics.SpansWrittenSuccess.Inc(1)
	}
}()
go func() {
	for e := range producer.Errors() {
		if e != nil && e.Err != nil {
			logger.Error(e.Err.Error())
		}
		writeMetrics.SpansWrittenFailure.Inc(1)
	}
}()
```
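One caveat worth noting: sarama only delivers on the `Successes()` channel if it is enabled in the producer config (`Return.Errors` is on by default, `Return.Successes` is not), and both enabled channels must be drained, which is what the two goroutines above do. A minimal construction sketch, with placeholder broker addresses:

```go
package main

import (
	"github.com/Shopify/sarama"
)

// newAsyncProducer builds an AsyncProducer whose Successes() and Errors()
// channels both carry results. The broker list is a placeholder.
func newAsyncProducer(brokers []string) (sarama.AsyncProducer, error) {
	config := sarama.NewConfig()
	config.Producer.Return.Successes = true // off by default in sarama
	config.Producer.Return.Errors = true    // the default, shown for clarity
	return sarama.NewAsyncProducer(brokers, config)
}
```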
```go
// WriteSpan writes the span to Kafka.
func (w *SpanWriter) WriteSpan(ctx context.Context, span *model.Span) error {
	spanBytes, err := w.marshaller.Marshal(span)
	if err != nil {
		w.metrics.SpansWrittenFailure.Inc(1)
		return err
	}
	// The AsyncProducer accepts messages on a channel and produces them
	// asynchronously in the background as efficiently as possible.
	w.producer.Input() <- &sarama.ProducerMessage{
		Topic: w.topic,
		Key:   sarama.StringEncoder(span.TraceID.String()),
		Value: sarama.ByteEncoder(spanBytes),
	}
	return nil
}
```