In today's distributed systems and microservice architectures, log monitoring has become a key part of keeping systems stable. An effective log monitoring system not only helps us locate problems quickly but also yields valuable business insights. This article walks through building a modern real-time log monitoring system from scratch, covering architecture design, core component implementation, and best practices.
Why Do We Need Real-Time Log Monitoring?
In the traditional approach, developers log in to a server, hunt down log files, and analyze them with grep and similar commands. That was tolerable in the monolith era, but under a microservice architecture with dozens or even hundreds of services, it quickly breaks down.
A real-time log monitoring system provides:
- Centralized collection: logs scattered across servers and containers are gathered in one place
- Real-time processing: logs are processed and analyzed with millisecond-level latency
- Intelligent analysis: rule-based alerting and anomaly detection
- Visualization: intuitive dashboards and a query interface
System Architecture Design
Our system builds on the classic ELK (Elasticsearch, Logstash, Kibana) stack, with some modern improvements:
┌─────────────────┐    ┌─────────────┐    ┌─────────────┐    ┌───────────────┐
│   Log sources   │───▶│  Filebeat   │───▶│  Logstash   │───▶│ Elasticsearch │
│ (apps/services) │    │ (collector) │    │ (processor) │    │(store/search) │
└─────────────────┘    └─────────────┘    └─────────────┘    └───────────────┘
                                                                     │
                                                                     ▼
                                                              ┌─────────────┐
                                                              │   Kibana    │
                                                              │ (visualize) │
                                                              └─────────────┘
Modern improvements:
- Optionally swap Logstash for Fluentd, which consumes fewer resources
- Introduce Kafka as a message queue to improve reliability
- Add Grafana as a complementary visualization tool
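If you adopt the Kafka improvement, Filebeat can ship logs to a Kafka topic instead of directly to Logstash. A minimal sketch of that variant; the broker address `kafka:9092` and topic name `app-logs` are placeholders, not values from this article's deployment:

```yaml
# filebeat.yml (Kafka variant) -- replaces the output.logstash section
output.kafka:
  # Placeholder broker list; point this at your Kafka cluster
  hosts: ["kafka:9092"]
  topic: "app-logs"
  # Wait for the partition leader's acknowledgement before moving on
  required_acks: 1
  compression: gzip
```

Logstash (or Fluentd) would then consume from that topic, so a downstream outage buffers logs in Kafka instead of dropping them.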
Core Component Implementation
1. Log Collector (Filebeat) Configuration
Filebeat is a lightweight log shipper designed specifically to forward data from log files to Logstash or Elasticsearch.
# filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/application/*.log
    fields:
      app: "web-application"
      environment: "production"
    fields_under_root: true
    multiline.pattern: '^\['
    multiline.negate: true
    multiline.match: after

processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

output.logstash:
  hosts: ["logstash:5044"]
  ssl.enabled: false
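The multiline settings above assume every new log record starts with `[` (a bracketed timestamp), so continuation lines such as stack traces are folded into the preceding event. A small illustrative Python logger that emits lines in that shape; the exact format string is an assumption for the sketch, and the output is captured into a buffer just to make the behavior visible:

```python
import io
import logging
import re

# Bracketed ISO-8601 timestamp first, so every new record matches the
# Filebeat multiline.pattern '^\[' while traceback lines do not.
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter(
    fmt="[%(asctime)s] %(levelname)s %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S",
))
logger = logging.getLogger("web-application")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("user login succeeded")
try:
    1 / 0
except ZeroDivisionError:
    # The traceback lines don't start with '[', so Filebeat would fold
    # them into the ERROR event that precedes them.
    logger.exception("unexpected error")

lines = buf.getvalue().splitlines()
record_starts = [ln for ln in lines if re.match(r"^\[", ln)]
print(record_starts)
```

Only two lines match `^\[` here (the INFO and ERROR records); the traceback becomes part of the ERROR event rather than a flurry of one-line events.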
2. Log Processor (Logstash) Pipeline Configuration
Logstash is responsible for filtering, parsing, and transforming the log data.
# logstash.conf
input {
  beats {
    port => 5044
  }
}

filter {
  # Parse JSON-formatted logs
  if [message] =~ /^{.*}$/ {
    json {
      source => "message"
    }
  }

  # Parse common plain-text log formats
  grok {
    match => {
      "message" => [
        "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel} %{GREEDYDATA:message}",
        "\[%{TIMESTAMP_ISO8601:timestamp}\] %{LOGLEVEL:loglevel} %{GREEDYDATA:message}"
      ]
    }
    overwrite => ["message"]
  }

  # Parse the timestamp into @timestamp
  date {
    match => ["timestamp", "ISO8601", "yyyy-MM-dd HH:mm:ss,SSS"]
    target => "@timestamp"
  }

  # Parse the user agent
  if [user_agent] {
    useragent {
      source => "user_agent"
      target => "user_agent_info"
    }
  }

  # Resolve geo information when a client IP is present
  if [client_ip] {
    geoip {
      source => "client_ip"
      target => "geoip"
    }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logs-%{+YYYY.MM.dd}"
    user => "elastic"
    password => "${ELASTIC_PASSWORD}"
  }

  # Also print to the console for debugging
  stdout {
    codec => rubydebug
  }
}
3. Elasticsearch Index Template
To optimize log storage and query performance, we define an index template. On the 8.x cluster deployed below, this uses the composable `_index_template` API (the legacy `_template` endpoint is deprecated).
PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "refresh_interval": "30s",
      "index.lifecycle.name": "logs_policy",
      "index.lifecycle.rollover_alias": "logs"
    },
    "mappings": {
      "dynamic_templates": [
        {
          "strings_as_keyword": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "message": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "loglevel": {
          "type": "keyword"
        },
        "app": {
          "type": "keyword"
        },
        "environment": {
          "type": "keyword"
        },
        "geoip": {
          "properties": {
            "location": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
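The template's `logs-*` pattern matches the daily indices that the Logstash output creates via `logs-%{+YYYY.MM.dd}` (Logstash formats `@timestamp` in UTC). A small sketch of how an event's timestamp maps to its index name; the `index_for` helper is illustrative, not an Elasticsearch or Logstash API:

```python
from datetime import datetime, timezone
import fnmatch

def index_for(ts: datetime) -> str:
    """Mirror Logstash's logs-%{+YYYY.MM.dd} index naming (UTC-based)."""
    return ts.astimezone(timezone.utc).strftime("logs-%Y.%m.%d")

name = index_for(datetime(2024, 1, 15, 23, 59, tzinfo=timezone.utc))
print(name)                             # the daily index for that event
print(fnmatch.fnmatch(name, "logs-*"))  # covered by the template pattern
```

Because every new daily index matches `logs-*`, the template's settings and mappings are applied automatically at index creation time.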
4. Deploying with Docker Compose
# docker-compose.yml
version: '3.8'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.10.0
    environment:
      - discovery.type=single-node
      - ES_JAVA_OPTS=-Xms1g -Xmx1g
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - "9200:9200"
    networks:
      - elk

  logstash:
    image: docker.elastic.co/logstash/logstash:8.10.0
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
      - ./logstash.yml:/usr/share/logstash/config/logstash.yml
    environment:
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
    ports:
      - "5044:5044"
    depends_on:
      - elasticsearch
    networks:
      - elk

  kibana:
    image: docker.elastic.co/kibana/kibana:8.10.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      # kibana_system has its own password; set it to ${ELASTIC_PASSWORD}
      # first (e.g. with elasticsearch-reset-password) for this to work
      - ELASTICSEARCH_USERNAME=kibana_system
      - ELASTICSEARCH_PASSWORD=${ELASTIC_PASSWORD}
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
    networks:
      - elk

  filebeat:
    image: docker.elastic.co/beats/filebeat:8.10.0
    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml
      - /var/log/application:/var/log/application:ro
    depends_on:
      - logstash
    networks:
      - elk

volumes:
  elasticsearch-data:
    driver: local

networks:
  elk:
    driver: bridge
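The compose file reads `ELASTIC_PASSWORD` from the environment; Docker Compose also picks it up from a `.env` file next to `docker-compose.yml`. A minimal sketch, where the password value is a placeholder you must replace:

```
# .env -- sits beside docker-compose.yml
ELASTIC_PASSWORD=change-me-please
```

With that in place, `docker compose up -d` starts the stack, and Kibana becomes reachable on http://localhost:5601 once Elasticsearch has finished bootstrapping.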
Advanced Features
1. Real-Time Alerting
Use Elasticsearch's Watcher feature (an X-Pack capability that requires an appropriate license) to implement real-time alerts:
PUT _watcher/watch/error_log_alert
{
  "trigger": {
    "schedule": {
      "interval": "1m"
    }
  },
  "input": {
    "search": {
      "request": {
        "indices": ["logs-*"],
        "body": {
          "query": {
            "bool": {
              "filter": [
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-1m"
                    }
                  }
                },
                {
                  "term": {
                    "loglevel": "ERROR"
                  }
                }
              ]
            }
          },
          "aggs": {
            "apps": {
"