Log Collection Architecture
Background:
Today, some of the company's operational data is written on a schedule into scattered log files, which are then synced and merged onto a single host, where the merged data is finally processed.
Problems with the current approach:
- Duplicated I/O: every record is written locally, synced across the network, and merged again, so network and disk I/O are wasted at each step.
- High operations and development cost: syncing many small scattered files causes excessive random I/O; when statistics have a deadline, the sync may not finish in time, so both teams have to add completeness checks, and if data is missed the developers must re-pull it and recompute.
- The data is scattered and unstructured, which makes it expensive to manage.
Proposed redesign: Filebeat + Kafka + consumer script; later this can evolve into application code as input --> Kafka --> consumer.
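For that later stage, where application code publishes directly to Kafka instead of writing log files, a minimal producer sketch using the kafka-python client might look like the following (the broker address and topic test_collect1 are taken from the configs below; the event payload is illustrative):

#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
# Sketch: application code publishing an event straight to Kafka,
# replacing the write-file + Filebeat step (assumes the kafka-python package).
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=['x.x.x.31:9092'],
    # serialize each event dict as a JSON-encoded byte string
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
)

event = {'message': 'order paid', 'order_id': 123}   # illustrative payload
producer.send('test_collect1', event)
producer.flush()   # block until the broker has acknowledged the message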
Configuration demos for the relevant services follow.
Filebeat log collection
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /data/logs/**/*
    - /data/testlog/*.log
output.kafka:
  hosts: ["x.x.x.31:9092"]
  topic: test_collect1
  keep_alive: 10s
  required_acks: 1
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
setup.template.settings:
  index.number_of_shards: 1
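Each line Filebeat ships lands on the Kafka topic as a JSON document; the consumer script below reads its message field. An illustrative, trimmed event follows (metadata fields other than message and @timestamp vary with the Filebeat version and configuration; the path and host values here are made up):

{
  "@timestamp": "2020-10-22T01:45:00.000Z",
  "message": "the raw log line as read from the file",
  "log": { "file": { "path": "/data/testlog/app.log" } },
  "host": { "name": "log-host-01" }
}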
Kafka broker configuration (server.properties)
broker.id=0
listeners=PLAINTEXT://x.x.x.31:9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/data/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.flush.interval.messages=10000
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=127.0.0.1:2181
zookeeper.connection.timeout.ms=18000
group.initial.rebalance.delay.ms=0
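Before Filebeat starts shipping, the test_collect1 topic can be created up front. One way is sketched below with the kafka-python admin client (the topic can equally be created with Kafka's bundled CLI tools); the partition and replication counts mirror the single-broker defaults above:

# Sketch: pre-create the collection topic (assumes the kafka-python package).
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers='x.x.x.31:9092')
admin.create_topics([
    NewTopic(name='test_collect1', num_partitions=1, replication_factor=1)
])
admin.close()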
Consumer script
#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
# @Time : 2020/10/22 9:45
# @Author : ada
# @File : consumer.py
# @Project: python-script
import json
from kafka import KafkaConsumer

# Subscribe to the collection topic; the topic name must match the one Filebeat
# publishes to (test_collect1 in the config above).
#consumer = KafkaConsumer('test_collect', group_id='handa1', auto_offset_reset='earliest', bootstrap_servers=['x.x.0.31:9092'])
consumer = KafkaConsumer('test_collect', group_id='handa1', auto_offset_reset='latest', bootstrap_servers=['x.x.0.31:9092'])

for msg in consumer:
    print type(msg)
    print type(msg.value)
    data = json.loads(msg.value)   # each Filebeat event is a JSON document
    print type(data)
    print data.keys()
    print data['message']          # the raw log line
ES consumption setup
Elasticsearch configuration
cluster.name: my-application
node.name: node-1
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: x.x.x.31
http.port: 9200
cluster.initial_master_nodes: ["node-1"]
http:
  cors:
    enabled: true
    allow-origin: "*"
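The demo consumer above only prints events; to land them in this Elasticsearch node, a minimal indexing sketch (assuming the kafka-python and elasticsearch-py clients; the index name collect-logs and group id es-indexer are illustrative) could look like:

# Sketch: consume from Kafka and index each event into Elasticsearch.
# collect-logs / es-indexer are illustrative names, not part of the setup above.
import json
from kafka import KafkaConsumer
from elasticsearch import Elasticsearch

consumer = KafkaConsumer('test_collect1', group_id='es-indexer',
                         auto_offset_reset='latest',
                         bootstrap_servers=['x.x.x.31:9092'])
es = Elasticsearch(['http://x.x.x.31:9200'])

for msg in consumer:
    event = json.loads(msg.value)
    es.index(index='collect-logs', body=event)   # one document per log event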
Kibana configuration
server.host: "x.x.x.31"
server.name: "collect-test"
elasticsearch.hosts: ["http://x.x.x.31:9200"]
kibana.index: "kibana"