Log collection:
| Stage | Work items |
|---|---|
| Log specification | Fixed field definitions; log format (see the pattern sketch below) |
| Log collection | On-disk rules; rolling policy; collection method |
| Log transport | Message queue; consumption model; topic naming convention; retention period |
| Log splitting | Sampling; filtering; custom formats |
| Log search | Index partitioning; shard settings; query optimization; permission settings; retention period |
| Log-flow monitoring | Collection failures; transport failures; search failures; non-conformant logs; monitoring and alerting |
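The "log specification" row implies that every entry begins with a fixed timestamp field. As a hypothetical illustration (the field order and pattern here are assumptions, chosen to agree with the Filebeat multiline.pattern used further below):

```xml
<!-- Hypothetical standard encoder pattern: every entry starts with "yyyy-MM-dd ...",
     which is what the Filebeat multiline.pattern '^[0-9]{4}-[0-9]{2}-[0-9]{2}' keys on. -->
<encoder>
    <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
</encoder>
```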
Log goals:
Logback's filter
This part is strongly tied to the application itself: its job is to filter log events before they are written to disk.
Put differently, once you are connected to the logging platform and find there are events you do not want on disk, you can filter them out with rules in a filter.
If you do not configure one, the default applies and nothing is filtered.
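A minimal sketch, assuming you want to deny DEBUG events for one appender; LevelFilter is one of logback's built-in filters:

```xml
<!-- Sketch: placed inside an <appender>, denies DEBUG events and
     lets everything else pass through unchanged. -->
<filter class="ch.qos.logback.classic.filter.LevelFilter">
    <level>DEBUG</level>
    <onMatch>DENY</onMatch>
    <onMismatch>NEUTRAL</onMismatch>
</filter>
```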
Logback's rollingPolicy
For now we standardize on time-based rolling: most consumers work with logs primarily along the time dimension, and rolling by size is rarely needed.
```xml
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
    <fileNamePattern>${log.home}/archives/your-log-name.%d{yyyy-MM-dd}.log.gz</fileNamePattern>
    <maxHistory>10</maxHistory>
</rollingPolicy>
```
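With %d{yyyy-MM-dd} the rolling period is one day, so maxHistory 10 keeps roughly ten days of gzipped archives and removes older ones at rollover.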
Logback's logger
In current usage the logback configuration is kept as close to the business as possible, so an application typically configures several loggers, each directed at its own appender/log file.
The logging platform collects every spec-compliant log into a unified query index, so set additivity to false on these loggers wherever possible; otherwise events also propagate to ancestor appenders and are collected twice.
For example:
```xml
<logger name="io.netty.util.ResourceLeakDetector" level="INFO" additivity="false">
    <appender-ref ref="NETTY_MEM_LEAK" />
</logger>
```
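For completeness, a sketch of an appender the logger above could reference, wiring in the time-based rolling policy from earlier; the file paths and encoder pattern are assumptions:

```xml
<!-- Sketch: RollingFileAppender combining the rolling policy shown above. -->
<appender name="NETTY_MEM_LEAK" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${log.home}/netty-mem-leak.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <fileNamePattern>${log.home}/archives/netty-mem-leak.%d{yyyy-MM-dd}.log.gz</fileNamePattern>
        <maxHistory>10</maxHistory>
    </rollingPolicy>
    <encoder>
        <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger - %msg%n</pattern>
    </encoder>
</appender>
```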
Filebeat configuration parameters
Version: artifacts.elastic.co/downloads/b…
| Option | Purpose | Platform value |
|---|---|---|
| General: common options (www.elastic.co/guide/en/be…) | | |
| fields | Adds extra key/value information to every shipped event, e.g. "level: debug", for later grouping and statistics. By default the added keys are nested under the event's `fields` key. | |
| filebeat.registry.path | The registry file in which Filebeat records how far it has read into each log file. | /data/docker-persist/${CONTAINER_IP_ADDR}/registry |
| inputs: collection options (www.elastic.co/guide/en/be…) | | |
| paths | The log files to monitor; patterns are currently expanded with Go's glob function, and configured directories are not scanned recursively. | `/data/docker-persist/*/logs/*.log` |
| encoding | Encoding of the monitored files; both plain and utf-8 handle Chinese logs. | -- |
| input_type | Input type of the file: log (default) or stdin. | log |
| exclude_lines | Drops input lines matching any regex in the list. | -- |
| include_lines | Keeps only input lines matching any regex in the list (default: all lines); exclude_lines runs after include_lines. | -- |
| exclude_files | Ignores files matching any regex in the list (by default a harvester is created for every file matched by paths). | `[".gz$", "[0-9]{2}.log$"]` |
| fields_under_root | If true, the added fields become top-level keys of the event rather than children of `fields`; custom fields then override Filebeat's default fields of the same name (see the sketch after this table). | -- |
| ignore_older | Ignores log content modified longer ago than the given span, e.g. 2h (two hours) or 5m (five minutes). | -- |
| close_older | Closes a monitored file's handle if the file has not been updated within this span. Default 1h. | 1h |
| force_close_files | Filebeat normally keeps a file's handle open until close_older elapses, which causes problems if the file is deleted inside that window; with force_close_files set to true, Filebeat closes the handle as soon as it detects the file name has changed. | false |
| scan_frequency | How often Filebeat scans the prospector's directories for updates (e.g. new files). At 0s Filebeat detects changes as fast as it can, at the cost of higher CPU usage. Default 10s. | 10s |
| document_type | Sets the document type field of the Elasticsearch output; can also be used to categorize logs. | |
| harvester_buffer_size | Buffer size, in bytes, each harvester uses while reading a file. | 65536 |
| max_bytes | Each appended line is one log event; max_bytes caps the bytes shipped for a single event, and the excess is discarded. | 104857600 |
| multiline | For logs where one entry spans several lines, e.g. exception stack traces. pattern: regex matched by the line that starts an entry; negate: whether the pattern is negated (default false; with true, lines that do not match the pattern are treated as continuations); match: whether continuation lines are merged after the anchoring line or before it; max_lines: maximum lines merged into one event, including the anchoring line; timeout: after this span the accumulated event is shipped even if no new pattern match has arrived. | multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}' multiline.negate: true multiline.match: after |
| tail_files | If true, Filebeat starts at the end of each file and ships only newly appended lines, one event per line, instead of re-sending the whole file from the start. | true |
| backoff | After a file reaches EOF, how long Filebeat waits before checking it for updates again. Default 1s. | -- |
| max_backoff | The maximum wait between checks after EOF. Default 10s. | -- |
| backoff_factor | How fast the wait grows toward max_backoff (default factor 2); once max_backoff is reached, Filebeat waits max_backoff per check until the file is updated, which resets the wait to backoff. | -- |
| spool_size | Spooler capacity; when the buffered events exceed this threshold, the spooler flushes whether or not the timeout has elapsed. | -- |
| idle_timeout | Spooler timeout; when it elapses, the spooler flushes whether or not the capacity threshold has been reached. | -- |
| config_dir | Pulls in configuration files from other locations (full paths required); only their prospector sections are processed. | -- |
| publish_async | Whether to publish asynchronously (experimental). | -- |
| processors: event pre-processing (www.elastic.co/guide/en/be…) | | |
| drop_fields | Removes fields that need not be shipped. | fields: ["@metadata","agent","ecs","host","input"] |
| output: Kafka output options (www.elastic.co/guide/en/be…) | | |
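A minimal sketch of how fields interacts with fields_under_root (the level key is illustrative):

```yaml
# Sketch: custom fields nested under "fields" vs. promoted to the top level.
filebeat.inputs:
  - type: log
    paths:
      - /persist/logs/*.log
    fields:
      level: debug
    # false (default): events carry {"fields": {"level": "debug"}, ...}
    # true:            events carry {"level": "debug", ...}, overriding any
    #                  Filebeat field with the same name
    fields_under_root: false
```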
Comparison of collection tools
Log platform configuration
```yaml
###################### Filebeat Configuration Example #########################
# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html
# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.
# ============================== Filebeat inputs ===============================
filebeat.inputs:
- type: log
enabled: true
paths:
- /persist/logs/*.log
- /persist/logs/gclogs/*
#include_lines: ['^ERR', '^WARN']
exclude_files: [".gz$","[0-9]{2}.log$"]
close_older: 1h
force_close_files: false
scan_frequency: 10s
harvester_buffer_size: 65536
max_bytes: 104857600
tail_files: true
multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
multiline.negate: true
multiline.match: after
# tags: ["logging_app"]
# - type: filestream
# enabled: false
# paths:
# - /var/log/*.log
#- c:\programdata\elasticsearch\logs\*
#exclude_lines: ['^DBG']
#include_lines: ['^ERR', '^WARN']
#prospector.scanner.exclude_files: ['.gz$']
#fields:
# level: debug
# review: 1
# ============================== Filebeat modules ==============================
filebeat.config.modules:
# Glob pattern for configuration loading
path: ${path.config}/modules.d/*.yml
# Set to true to enable config reloading
reload.enabled: false
# Period on which files under path should be checked for changes
#reload.period: 10s
# ======================= Elasticsearch template setting =======================
setup.template.settings:
index.number_of_shards: 1
#index.codec: best_compression
#_source.enabled: false
# ================================== General ===================================
# Registry data path. If a relative path is used, it is considered relative to the
# data path.
filebeat.registry.path: /data/docker-persist/${CONTAINER_IP_ADDR}/registry
# The permissions mask to apply on registry data, and meta files. The default
# value is 0600. Must be a valid Unix-style file permissions mask expressed in
# octal notation. This option is not supported on Windows.
#filebeat.registry.file_permissions: 0600
# The timeout value that controls when registry entries are written to disk
# (flushed). When an unwritten update exceeds this value, it triggers a write
# to disk. When flush is set to 0s, the registry is written to disk after each
# batch of events has been published successfully. The default value is 0s.
#filebeat.registry.flush: 0s
# Starting with Filebeat 7.0, the registry uses a new directory format to store
# Filebeat state. After you upgrade, Filebeat will automatically migrate a 6.x
# registry file to use the new directory format. If you changed
# filebeat.registry.path while upgrading, set filebeat.registry.migrate_file to
# point to the old registry file.
#filebeat.registry.migrate_file: ${path.data}/registry
# By default Ingest pipelines are not updated if a pipeline with the same ID
# already exists. If this option is enabled Filebeat overwrites pipelines
# everytime a new Elasticsearch connection is established.
#filebeat.overwrite_pipelines: false
# How long filebeat waits on shutdown for the publisher to finish.
# Default is 0, not waiting.
#filebeat.shutdown_timeout: 0
# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:
# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]
# Optional fields that you can specify to add additional information to the
# output.
fields:
environment: ${CONTAINER_ENV}
project: ${CONTAINER_PROJ}
host: ${CONTAINER_IP_ADDR}
topic: logging_${CONTAINER_PROJ}_${CONTAINER_ENV}
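  # NOTE: fields.topic above is what the Kafka output's topic setting
  # ('%{[fields.topic]}') resolves to at publish time.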
# ================================= Dashboards =================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false
# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:
# =================================== Kibana ===================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:
# Kibana Host
# Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify an additional path, the scheme is required: http://localhost:5601/path
# IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
#host: "localhost:5601"
# Kibana Space ID
# ID of the Kibana Space into which the dashboards should be loaded. By default,
# the Default Space will be used.
#space.id:
# =============================== Elastic Cloud ================================
# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).
# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:
# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:
# ================================== Outputs ===================================
# Configure what output to use when sending the data collected by the beat.
# ---------------------------- Kafka Output ----------------------------
output.kafka:
# initial brokers for reading cluster metadata
hosts: ["你的kafka地址"]
version: 1.0.1
# username:
# password:
# message topic selection + partitioning
topic: '%{[fields.topic]}'
# topics:
# - topic: "critical-%{[agent.version]}"
# when.contains:
# message: "CRITICAL"
# - topic: "error-%{[agent.version]}"
# when.contains:
# message: "ERR"
partition.round_robin:
reachable_only: false
required_acks: 1
compression: gzip
max_message_bytes: 1000000
# ---------------------------- Elasticsearch Output ----------------------------
#output.elasticsearch:
# Array of hosts to connect to.
#hosts: ["localhost:9200"]
# Protocol - either `http` (default) or `https`.
#protocol: "https"
# Authentication credentials - either API key or username/password.
#api_key: "id:api_key"
#username: "elastic"
#password: "changeme"
# ------------------------------ Logstash Output -------------------------------
#output.logstash:
# The Logstash hosts
#hosts: ["localhost:5044"]
# Optional SSL. By default is off.
# List of root certificates for HTTPS server verifications
#ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
# Certificate for SSL client authentication
#ssl.certificate: "/etc/pki/client/cert.pem"
# Client Certificate Key
#ssl.key: "/etc/pki/client/cert.key"
# ================================= Processors =================================
processors:
# - timestamp:
# field: "@timestamp"
# layouts:
# - '2006-01-02 15:04:05.999'
# timezone: Asia/Shanghai
- drop_fields:
fields: ["agent","ecs","host","input"]
# - add_host_metadata:
# when.not.contains.tags: forwarded
# - add_cloud_metadata: ~
# - add_docker_metadata: ~
# - add_kubernetes_metadata: ~
# ================================== Logging ===================================
# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug
# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publisher", "service".
#logging.selectors: ["*"]
# ============================= X-Pack Monitoring ==============================
# Filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster. This requires xpack monitoring to be enabled in Elasticsearch. The
# reporting is disabled by default.
# Set to true to enable the monitoring reporter.
#monitoring.enabled: false
# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
#monitoring.cluster_uuid:
# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:
# ============================== Instrumentation ===============================
# Instrumentation support for the filebeat.
#instrumentation:
# Set to true to enable instrumentation of filebeat.
#enabled: false
# Environment in which filebeat is running on (eg: staging, production, etc.)
#environment: ""
# APM Server hosts to report instrumentation results to.
#hosts:
# - http://localhost:8200
# API Key for the APM Server(s).
# If api_key is set then secret_token will be ignored.
#api_key:
# Secret token for the APM Server(s).
#secret_token:
# ================================= Migration ==================================
# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true
```
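With the file in place, a quick sanity check before starting the shipper: `filebeat test config -c filebeat.yml` validates the configuration, and `filebeat test output -c filebeat.yml` verifies connectivity to the configured Kafka brokers.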