A New Approach to Log Management: A Hands-On Introduction to the ELK/EFK Stack


1. Environment Preparation and Planning

1.1 System Requirements

Before starting the deployment, prepare the following environment (a quick pre-flight check script follows the list):

  • OS: Ubuntu 20.04 LTS or CentOS 8
  • Memory: at least 8 GB of RAM
  • Storage: at least 20 GB of free disk space
  • Java: OpenJDK 11 or later
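
A minimal sketch that checks these requirements on the target host; the thresholds simply mirror the list above:

#!/bin/bash
# Pre-flight check: RAM, free disk space, and Java

# At least 8 GB of RAM (free -m reports MB)
total_mem_mb=$(free -m | awk '/^Mem:/ {print $2}')
[ "$total_mem_mb" -ge 8000 ] && echo "RAM: OK (${total_mem_mb} MB)" || echo "RAM: insufficient (${total_mem_mb} MB)"

# At least 20 GB free on the root filesystem
free_disk_gb=$(df -BG / | awk 'NR==2 {gsub("G","",$4); print $4}')
[ "$free_disk_gb" -ge 20 ] && echo "Disk: OK (${free_disk_gb} GB free)" || echo "Disk: insufficient (${free_disk_gb} GB free)"

# Java 11 or later installed? (section 2 covers installation)
if command -v java >/dev/null 2>&1; then
  java -version
else
  echo "Java: not installed yet"
fi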

1.2 Architecture Design

We will deploy a complete ELK stack made up of the following components:

  • Elasticsearch: log storage and indexing
  • Logstash: log collection and processing
  • Kibana: log visualization and analysis

graph TD
    A[Application servers] -->|ship logs| B[Logstash]
    B -->|processed events| C[Elasticsearch]
    C -->|search & aggregations| D[Kibana]
    D -->|visualizations| E[User]
    F[System logs] -->|collected by Filebeat| B
    G[Network devices] -->|Syslog| B
    
    style A fill:#4CAF50,stroke:#388E3C
    style B fill:#2196F3,stroke:#1976D2
    style C fill:#FF9800,stroke:#F57C00
    style D fill:#9C27B0,stroke:#7B1FA2
    style E fill:#607D8B,stroke:#455A64
    style F fill:#795548,stroke:#5D4037
    style G fill:#009688,stroke:#00796B

2. Installing the Java Environment

2.1 Install OpenJDK

Create the installation script: install_java.sh

#!/bin/bash

# Update and upgrade system packages
sudo apt update && sudo apt upgrade -y

# Install OpenJDK 11
sudo apt install -y openjdk-11-jdk

# Verify the Java installation
java -version
javac -version

# Set the JAVA_HOME environment variable
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64' >> ~/.bashrc
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc

# Reload the environment variables; note this only affects the
# script's own shell, so run `source ~/.bashrc` again in your
# interactive session after the script finishes
source ~/.bashrc

# Verify the environment variable
echo $JAVA_HOME

Run the installation script:

chmod +x install_java.sh
./install_java.sh

3. Installing Elasticsearch

3.1 Add the Elasticsearch Repository

Create the installation script: install_elasticsearch.sh

#!/bin/bash

# Import the Elasticsearch GPG key
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

# Add the Elasticsearch 7.x repository (plain `tee` rather than
# `tee -a`, so re-running the script does not duplicate the entry)
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list

# Refresh the package lists
sudo apt update

# Install Elasticsearch
sudo apt install -y elasticsearch

# Create the Elasticsearch data directory
sudo mkdir -p /var/lib/elasticsearch
sudo chown -R elasticsearch:elasticsearch /var/lib/elasticsearch

3.2 Configure Elasticsearch

Edit the configuration file: /etc/elasticsearch/elasticsearch.yml

# Cluster name
cluster.name: my-elk-cluster

# Node name
node.name: elk-node-1

# Data path
path.data: /var/lib/elasticsearch

# Log path
path.logs: /var/log/elasticsearch

# Network bind address
network.host: 0.0.0.0

# HTTP port
http.port: 9200

# Single-node discovery; note that cluster.initial_master_nodes
# must NOT be set alongside this, or Elasticsearch refuses to start
discovery.type: single-node

# Lock the JVM heap in memory (requires the systemd override shown below)
bootstrap.memory_lock: true

# CORS settings so browser-based tools can reach the API
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: Authorization
http.cors.allow-methods: OPTIONS,HEAD,GET,POST,PUT,DELETE
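
With bootstrap.memory_lock: true, the elasticsearch systemd unit also needs permission to lock unlimited memory, or the service fails its bootstrap check. A minimal sketch using the standard systemd drop-in mechanism:

# Allow the elasticsearch service to lock memory
sudo mkdir -p /etc/systemd/system/elasticsearch.service.d
cat << 'EOF' | sudo tee /etc/systemd/system/elasticsearch.service.d/memlock.conf
[Service]
LimitMEMLOCK=infinity
EOF
sudo systemctl daemon-reload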

3.3 Configure JVM Options

Create the JVM options file: /etc/elasticsearch/jvm.options.d/heap_size.options

# JVM heap size; keep -Xms and -Xmx equal, and at or below
# 50% of the machine's RAM
-Xms2g
-Xmx2g

# GC settings
-XX:+UseG1GC
-XX:MaxGCPauseMillis=400
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30

3.4 Start the Elasticsearch Service

#!/bin/bash

# Reload the systemd configuration
sudo systemctl daemon-reload

# Enable the Elasticsearch service at boot
sudo systemctl enable elasticsearch

# Start the Elasticsearch service
sudo systemctl start elasticsearch

# Check the service status
sudo systemctl status elasticsearch

# Follow the service logs (Ctrl-C to exit)
sudo journalctl -u elasticsearch -f

3.5 Verify the Elasticsearch Installation

# Give Elasticsearch time to start
sleep 30

# Check that Elasticsearch is responding
curl -X GET "localhost:9200/"

# Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# List nodes
curl -X GET "localhost:9200/_cat/nodes?v"

# List indices
curl -X GET "localhost:9200/_cat/indices?v"

4. Installing Logstash

4.1 Install Logstash

Create the installation script: install_logstash.sh

#!/bin/bash

# Install Logstash (uses the Elastic repository added earlier)
sudo apt install -y logstash

# Create the Logstash configuration directory
sudo mkdir -p /etc/logstash/conf.d
sudo chown -R logstash:logstash /etc/logstash

# Create the Logstash data directory
sudo mkdir -p /var/lib/logstash
sudo chown -R logstash:logstash /var/lib/logstash

4.2 Configure the Logstash Pipeline

Create the input configuration file: /etc/logstash/conf.d/01-input.conf

# Input configuration - collect logs from multiple sources
input {
  # Tail local log files
  file {
    path => ["/var/log/syslog", "/var/log/auth.log"]
    type => "system"
    start_position => "beginning"
    # /dev/null disables read-position tracking, so the files are
    # re-read from the start on every restart; convenient for testing,
    # but remove this line in production
    sincedb_path => "/dev/null"
  }
  
  # Accept logs on a TCP port
  tcp {
    port => 5000
    type => "syslog"
  }
  
  # Accept logs on a UDP port
  udp {
    port => 5001
    type => "syslog"
  }
  
  # Accept logs from Beats shippers
  beats {
    port => 5044
    type => "beats"
  }
}

Create the filter configuration file: /etc/logstash/conf.d/02-filter.conf

# Filter configuration - parse and enrich log events
filter {
  # Parse system logs
  if [type] == "system" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
  
  # Parse syslog messages
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
    }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
  
  # Handle Beats events
  if [type] == "beats" {
    # add Beats-specific filters here
  }
  
  # Parse an ISO8601 "timestamp" field when one is present; the guard
  # keeps events without the field from being tagged with a parse failure
  if [timestamp] {
    date {
      match => [ "timestamp", "ISO8601" ]
    }
  }
  
  # Add GeoIP data when the event carries a client IP
  if [clientip] {
    geoip {
      source => "clientip"
    }
  }
}
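
If a grok pattern misbehaves, it is easier to debug in isolation than inside the full pipeline. A throwaway stdin-to-stdout pipeline (a debugging sketch, not part of the deployment; paste a sample log line and inspect the parsed fields):

# One-off pipeline: reads lines from stdin, applies a grok pattern,
# and prints the parsed event to stdout
sudo /usr/share/logstash/bin/logstash -e '
input { stdin { } }
filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:ts} %{SYSLOGHOST:hostname} %{DATA:prog}: %{GREEDYDATA:msg}" }
  }
}
output { stdout { codec => rubydebug } }'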

Create the output configuration file: /etc/logstash/conf.d/03-output.conf

# Output configuration - ship processed events to Elasticsearch
output {
  # Send to Elasticsearch
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
    # note: document_type is deliberately not set; mapping types
    # were removed in Elasticsearch 7.x
    template => "/etc/logstash/elasticsearch-template.json"
    template_name => "logstash"
    template_overwrite => true
  }
  
  # Also print events to stdout for debugging (disable in production)
  stdout {
    codec => rubydebug
  }
}

4.3 Create the Elasticsearch Index Template

Create the template file: /etc/logstash/elasticsearch-template.json (this uses the Elasticsearch 7.x format: index_patterns instead of the old template key, and a single typeless mapping without _default_):

{
  "index_patterns": ["logstash-*"],
  "version": 70001,
  "settings": {
    "index.refresh_interval": "5s",
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "dynamic_templates": [
      {
        "message_field": {
          "path_match": "message",
          "mapping": {
            "norms": false,
            "type": "text"
          },
          "match_mapping_type": "string"
        }
      },
      {
        "string_fields": {
          "mapping": {
            "norms": false,
            "type": "text",
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            }
          },
          "match_mapping_type": "string",
          "match": "*"
        }
      }
    ],
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "@version": {
        "type": "keyword"
      },
      "geoip": {
        "dynamic": true,
        "properties": {
          "ip": {
            "type": "ip"
          },
          "latitude": {
            "type": "half_float"
          },
          "location": {
            "type": "geo_point"
          },
          "longitude": {
            "type": "half_float"
          }
        }
      }
    }
  }
}
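
Once Logstash starts and installs the template, you can check that it landed via the legacy _template API that this file targets:

# Show the installed logstash template
curl -X GET "localhost:9200/_template/logstash?pretty"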

4.4 Start the Logstash Service

#!/bin/bash

# Validate the pipeline configuration before starting the service
sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/

# Enable the Logstash service at boot
sudo systemctl enable logstash

# Start the Logstash service
sudo systemctl start logstash

# Check the service status
sudo systemctl status logstash

# Follow the service logs (Ctrl-C to exit)
sudo journalctl -u logstash -f

5. Installing Kibana

5.1 Install Kibana

Create the installation script: install_kibana.sh

#!/bin/bash

# Install Kibana
sudo apt install -y kibana

# Create the Kibana data directory
sudo mkdir -p /var/lib/kibana
sudo chown -R kibana:kibana /var/lib/kibana

# Create the Kibana log directory (needed by logging.dest below)
sudo mkdir -p /var/log/kibana
sudo chown -R kibana:kibana /var/log/kibana

5.2 Configure Kibana

Edit the configuration file: /etc/kibana/kibana.yml

# Server port
server.port: 5601

# Bind address
server.host: "0.0.0.0"

# Elasticsearch connection
elasticsearch.hosts: ["http://localhost:9200"]

# Kibana's own index
kibana.index: ".kibana"

# UI language
i18n.locale: "zh-CN"

# Log destination (the directory is created by the install script above)
logging.dest: /var/log/kibana/kibana.log

# Server name
server.name: "elk-server"

# Data path
path.data: /var/lib/kibana

# Monitoring settings
monitoring.ui.container.elasticsearch.enabled: true

# Security settings (optional)
# elasticsearch.username: "kibana_system"
# elasticsearch.password: "your_password"

5.3 Start the Kibana Service

#!/bin/bash

# Enable the Kibana service at boot
sudo systemctl enable kibana

# Start the Kibana service
sudo systemctl start kibana

# Check the service status
sudo systemctl status kibana

# Show recent service logs; the original `journalctl -f` would block
# here and the rest of the script would never run
sudo journalctl -u kibana --no-pager --lines=20

# Give Kibana time to start
sleep 60

# Check that Kibana is responding
curl -X GET "http://localhost:5601"

6. Collecting System Logs

6.1 Install and Configure Filebeat

Create the installation script: install_filebeat.sh

#!/bin/bash

# Install Filebeat
sudo apt install -y filebeat

# Create the Filebeat data directory (Filebeat runs as root)
sudo mkdir -p /var/lib/filebeat
sudo chown -R root:root /var/lib/filebeat

Create the Filebeat configuration file: /etc/filebeat/filebeat.yml

# Filebeat configuration
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/syslog
    - /var/log/auth.log
  fields:
    type: system
  fields_under_root: true

- type: log
  enabled: true
  paths:
    - /var/log/*.log
  # auth.log is already covered by the input above; skip it here
  exclude_files: ['auth\.log$']
  fields:
    type: application
  fields_under_root: true

# Ship to Logstash
output.logstash:
  hosts: ["localhost:5044"]

# Template setup is handled by Logstash, not Filebeat
setup.template.enabled: false

# Kibana connection (used by `filebeat setup` for dashboards)
setup.kibana:
  host: "localhost:5601"

# Monitoring
monitoring.enabled: true

6.2 Start the Filebeat Service

#!/bin/bash

# Load the Filebeat index template directly into Elasticsearch
# (the Logstash output must be disabled for setup commands)
sudo filebeat setup --template -E output.logstash.enabled=false -E 'output.elasticsearch.hosts=["localhost:9200"]'

# Import the bundled Kibana dashboards
sudo filebeat setup -e -E output.logstash.enabled=false -E output.elasticsearch.hosts=['localhost:9200'] -E setup.kibana.host=localhost:5601

# Enable the Filebeat service at boot
sudo systemctl enable filebeat

# Start the Filebeat service
sudo systemctl start filebeat

# Check the service status
sudo systemctl status filebeat

# Test the Filebeat configuration and output connectivity
sudo filebeat test config
sudo filebeat test output
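
As an end-to-end check, append a marker line to a file matched by the /var/log/*.log input above and look for it in Elasticsearch a few seconds later; /var/log/elk-test.log is just a throwaway example file:

# Write a marker line into a file the Filebeat input watches
echo "$(date) elkpipelinetest" | sudo tee -a /var/log/elk-test.log

# Give Filebeat and Logstash time to ship and index it
sleep 15

# Search for the marker in the logstash indices
curl -s "localhost:9200/logstash-*/_search?q=message:elkpipelinetest&pretty"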

7. Firewall and Security Configuration

7.1 Configure Firewall Rules

Create the firewall script: configure_firewall.sh

#!/bin/bash

# Allow SSH first, so enabling the firewall cannot cut off the session
sudo ufw allow ssh

# Enable the UFW firewall
sudo ufw --force enable

# Allow the Elasticsearch port
sudo ufw allow 9200/tcp

# Allow the Kibana port
sudo ufw allow 5601/tcp

# Allow the Logstash input ports
sudo ufw allow 5000/tcp
sudo ufw allow 5001/udp
sudo ufw allow 5044/tcp

# Show the firewall status
sudo ufw status verbose
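
Opening 9200 to the world hands out full, unauthenticated cluster access until security is enabled in 7.2. If only specific hosts need direct Elasticsearch access, scope the rule to your internal network instead; 10.0.0.0/8 below is a placeholder subnet:

# Replace the blanket rule with a subnet-scoped one
sudo ufw delete allow 9200/tcp
sudo ufw allow from 10.0.0.0/8 to any port 9200 proto tcp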

7.2 Basic Security Configuration

Create the security setup script: security_setup.sh

#!/bin/bash

# Back up the configuration files
sudo cp /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.backup
sudo cp /etc/kibana/kibana.yml /etc/kibana/kibana.yml.backup

# Enable Elasticsearch security features
echo "xpack.security.enabled: true" | sudo tee -a /etc/elasticsearch/elasticsearch.yml

# Restart Elasticsearch so the security setting takes effect
sudo systemctl restart elasticsearch
sleep 30

# Generate passwords for the built-in users; this must run AFTER
# security is enabled, or the tool refuses to run
echo "Configuring Elasticsearch security..."
sudo /usr/share/elasticsearch/bin/elasticsearch-setup-passwords auto

# Point Kibana at the secured cluster
echo "elasticsearch.username: kibana_system" | sudo tee -a /etc/kibana/kibana.yml
echo "elasticsearch.password: <password generated above>" | sudo tee -a /etc/kibana/kibana.yml

# Restart Kibana
sudo systemctl restart kibana
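
Once security is on, anonymous requests should be rejected while authenticated ones succeed; elastic is the built-in superuser, and the password is the one generated by the setup tool:

# Should now return a 401 security error
curl -X GET "localhost:9200/"

# Should succeed with the superuser credentials
curl -u elastic:<generated-password> -X GET "localhost:9200/_cluster/health?pretty"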

8. Verifying and Testing the ELK Stack

8.1 Service Status Check

Create the verification script: verify_elk.sh

#!/bin/bash

echo "=== Checking ELK stack service status ==="

# Check Elasticsearch
echo "1. Checking Elasticsearch..."
curl -X GET "localhost:9200" || echo "Elasticsearch is not running"

# Check Kibana
echo "2. Checking Kibana..."
curl -X GET "http://localhost:5601" || echo "Kibana is not running"

# Check Logstash
echo "3. Checking Logstash..."
sudo systemctl is-active logstash

# Check Filebeat
echo "4. Checking Filebeat..."
sudo systemctl is-active filebeat

echo "=== Checking indices ==="
curl -X GET "localhost:9200/_cat/indices?v"

echo "=== Checking cluster health ==="
curl -X GET "localhost:9200/_cluster/health?pretty"

echo "=== Sending a test log ==="
# Send a test line to the Logstash TCP input
# (-q 1 makes nc exit once stdin is exhausted)
echo "$(date): ELK stack test log message" | nc -q 1 localhost 5000

# Wait for the event to be processed
sleep 10

echo "=== Verifying the log was indexed ==="
curl -X GET "localhost:9200/logstash-*/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "message": "test log"
    }
  }
}'

8.2 Kibana Configuration Test

Create the Kibana setup script: setup_kibana.sh

#!/bin/bash

echo "=== Creating the Kibana index pattern ==="

# Wait for Kibana to finish starting
sleep 30

# Create the index pattern through the saved objects API
curl -X POST "localhost:5601/api/saved_objects/index-pattern/logstash" \
  -H 'kbn-xsrf: true' \
  -H 'Content-Type: application/json' \
  -d '{
    "attributes": {
      "title": "logstash-*",
      "timeFieldName": "@timestamp"
    }
  }' || echo "The index pattern may already exist"

echo "=== Kibana setup complete ==="
echo "Visit http://<your-server-ip>:5601 to open the Kibana dashboards"

9. Monitoring and Alerting

9.1 ELK Stack Monitoring Configuration

Create the monitoring script: elk_monitoring.sh

#!/bin/bash

# Create the monitoring directory
sudo mkdir -p /etc/elk-monitoring

# Write the Elasticsearch monitoring settings for reference
# (the interval value must be a quoted string to be valid JSON)
cat << EOF | sudo tee /etc/elk-monitoring/elasticsearch-monitoring.json
{
  "monitoring": {
    "collection": {
      "enabled": true,
      "interval": "10s"
    },
    "cluster": {
      "alerts": {
        "enabled": true
      }
    }
  }
}
EOF

# Create the system monitoring script
cat << 'EOF' | sudo tee /etc/elk-monitoring/system-monitor.sh
#!/bin/bash

# System resource checks
echo "=== System resources ==="
echo "Memory usage:"
free -h

echo "Disk usage:"
df -h

echo "CPU usage:"
top -bn1 | grep "Cpu(s)"

echo "=== ELK service status ==="
sudo systemctl status elasticsearch | grep Active
sudo systemctl status logstash | grep Active
sudo systemctl status kibana | grep Active
sudo systemctl status filebeat | grep Active

echo "=== Elasticsearch cluster status ==="
curl -s -X GET "localhost:9200/_cluster/health?pretty" | grep -E "status|number_of_nodes|shards"

echo "=== Disk space warning ==="
DISK_USAGE=$(df / | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 80 ]; then
  echo "WARNING: disk usage is above 80%"
fi
EOF

sudo chmod +x /etc/elk-monitoring/system-monitor.sh

# Schedule the monitor to run every five minutes
echo "*/5 * * * * root /etc/elk-monitoring/system-monitor.sh >> /var/log/elk-monitoring.log" | sudo tee /etc/cron.d/elk-monitoring

10. Troubleshooting and Maintenance

10.1 Common Problems

Create the troubleshooting guide: troubleshooting_guide.sh

#!/bin/bash

echo "=== ELK stack troubleshooting guide ==="

# Helper: show the most recent log lines for a service
check_service_logs() {
    local service_name=$1
    echo "Logs for $service_name:"
    sudo journalctl -u "$service_name" --lines=10 --no-pager
}

# Check Elasticsearch
echo "1. Elasticsearch checks:"
check_service_logs elasticsearch

# Check index status
echo "Index status:"
curl -s -X GET "localhost:9200/_cat/indices?v&s=index"

# Check Logstash
echo "2. Logstash checks:"
check_service_logs logstash

# Validate the Logstash configuration
echo "Validating the Logstash configuration:"
sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/

# Check Kibana
echo "3. Kibana checks:"
check_service_logs kibana

# Check file permissions
echo "4. File permission checks:"
sudo ls -la /var/lib/elasticsearch/
sudo ls -la /var/log/elasticsearch/

# Check listening ports (ss is preinstalled; netstat needs net-tools)
echo "5. Listening port checks:"
sudo ss -tulpn | grep -E "9200|5601|5000|5044"

# Check system resources
echo "6. System resource checks:"
free -h
df -h

10.2 Maintenance Scripts

Create the maintenance script: elk_maintenance.sh

#!/bin/bash

echo "=== ELK stack maintenance script ==="

# Snapshot the Elasticsearch indices
# NOTE: the backup path must also be whitelisted in elasticsearch.yml
# via `path.repo: ["/var/backups/elasticsearch"]` (restart required),
# or registering the repository will fail
backup_indices() {
    local backup_path="/var/backups/elasticsearch"
    sudo mkdir -p "$backup_path"
    sudo chown elasticsearch:elasticsearch "$backup_path"
    echo "Creating an Elasticsearch snapshot..."
    curl -X PUT "localhost:9200/_snapshot/elk_backup" -H 'Content-Type: application/json' -d'
    {
        "type": "fs",
        "settings": {
            "location": "'$backup_path'"
        }
    }'
    curl -X PUT "localhost:9200/_snapshot/elk_backup/snapshot_$(date +%Y%m%d_%H%M%S)?wait_for_completion=true"
}

# Delete old log indices
# (this removes only the single index dated exactly 30 days ago;
# see the ILM sketch below for a more robust approach)
cleanup_old_indices() {
    echo "Deleting the index from 30 days ago..."
    curl -X DELETE "localhost:9200/logstash-$(date -d '30 days ago' +%Y.%m.%d)"
}

# Force-merge indices to reduce segment count
optimize_indices() {
    echo "Optimizing indices..."
    curl -X POST "localhost:9200/_forcemerge?max_num_segments=1"
}

case "$1" in
    backup)
        backup_indices
        ;;
    cleanup)
        cleanup_old_indices
        ;;
    optimize)
        optimize_indices
        ;;
    *)
        echo "Usage: $0 {backup|cleanup|optimize}"
        exit 1
        ;;
esac
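
For retention, Elasticsearch 7.x ships index lifecycle management (ILM), which handles deletion cluster-side instead of from cron. A minimal sketch; the policy name logstash-cleanup is arbitrary, and the policy still has to be referenced from the index template via the index.lifecycle.name setting:

# Define an ILM policy that deletes indices 30 days after creation
curl -X PUT "localhost:9200/_ilm/policy/logstash-cleanup" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'

# Then add to the "settings" block of the logstash index template:
#   "index.lifecycle.name": "logstash-cleanup"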

11. Data Flow Diagram

The complete path an event takes through the ELK stack:

flowchart TD
    A[Log sources] --> B[Filebeat/Beats]
    C[Syslog] --> D[Logstash input]
    E[Applications] --> F[Direct send to Logstash]
    
    B --> D
    F --> D
    
    D --> G[Logstash filter<br/>parse/enrich/transform]
    G --> H[Logstash output]
    
    H --> I[Elasticsearch<br/>indexing and storage]
    I --> J[Kibana dashboards]
    I --> K[Kibana Discover]
    I --> L[Kibana visualizations]
    
    J --> M[User queries and analysis]
    K --> M
    L --> M
    
    style A fill:#4CAF50,stroke:#388E3C
    style B fill:#2196F3,stroke:#1976D2
    style C fill:#FF9800,stroke:#F57C00
    style D fill:#9C27B0,stroke:#7B1FA2
    style G fill:#E91E63,stroke:#C2185B
    style H fill:#673AB7,stroke:#512DA8
    style I fill:#3F51B5,stroke:#303F9F
    style J fill:#00BCD4,stroke:#0097A7
    style K fill:#009688,stroke:#00796B
    style L fill:#8BC34A,stroke:#689F38
    style M fill:#FFC107,stroke:#FFA000
    style E fill:#795548,stroke:#5D4037
    style F fill:#607D8B,stroke:#455A64

12. Summary

With the steps above, we have deployed a complete ELK stack. The solution provides:

  1. Centralized log management: logs collected and stored from many sources
  2. Real-time processing: Logstash parses and enriches events as they arrive
  3. Powerful search: fast full-text search backed by Elasticsearch
  4. Visual analysis: rich dashboards and visualizations in Kibana
  5. A scalable architecture: easily grows to handle larger log volumes