2 - Log Platform - Log Collection


Log collection:

What the pipeline has to cover:

- Log conventions: fixed field definitions, log format
- Collection and write-to-disk rules: rolling strategy, collection method
- Log transport: message queue, consumption model, topic conventions, retention time
- Log splitting: sampling, filtering, custom formats
- Log search: index partitioning, shard settings, query optimization, permissions, retention time
- Log pipeline monitoring: collection failures, transport failures, search failures, non-conforming logs, monitoring and alerting

Log goals:

logback filters

This part is tightly coupled to the application itself: its job is to filter log events before they are written to disk.

In other words, once you are onboarded to the log platform and find there are log lines you don't want written to disk, you can drop them with rules in a filter.

logback.qos.ch/manual/filt…

If no filter is configured, the defaults are used.
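As a hedged sketch of such a rule, logback's built-in LevelFilter can drop, say, DEBUG events on a single appender before they reach disk (the appender name and surrounding configuration are illustrative):

```xml
<appender name="APP_LOG" class="ch.qos.logback.core.rolling.RollingFileAppender">
   <filter class="ch.qos.logback.classic.filter.LevelFilter">
      <level>DEBUG</level>
      <onMatch>DENY</onMatch>          <!-- drop DEBUG events -->
      <onMismatch>NEUTRAL</onMismatch> <!-- other levels pass on to any further filters -->
   </filter>
   <!-- encoder / rollingPolicy as usual -->
</appender>
```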

logback rollingPolicy

For now we standardize on time-based rolling: most services consume logs primarily along the time dimension, and rolling by volume is rarely needed.

logback.qos.ch/manual/appe…

<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
   <fileNamePattern>${log.home}/archives/your-log-name.%d{yyyy-MM-dd}.log.gz</fileNamePattern>
   <maxHistory>10</maxHistory>
</rollingPolicy>
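If a service does eventually need the size dimension as well, logback ships a combined policy. A hedged sketch (the size values are illustrative, and `%i` is required in the pattern):

```xml
<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
   <fileNamePattern>${log.home}/archives/your-log-name.%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
   <maxFileSize>500MB</maxFileSize>   <!-- roll within the day once a file reaches this size -->
   <maxHistory>10</maxHistory>
   <totalSizeCap>20GB</totalSizeCap>  <!-- optional overall cap on archived logs -->
</rollingPolicy>
```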

logback loggers

In current practice the logback configuration stays as close as possible to the business itself, so an application usually defines several loggers, each routed to its own appender and log file.

The log platform collects every log that follows the conventions into one unified search, so set additivity to false on these loggers whenever possible, to keep the same event from also reaching the root appenders.

For example:

<logger name="io.netty.util.ResourceLeakDetector" level="INFO" additivity="false">
   <appender-ref ref="NETTY_MEM_LEAK" />
</logger>
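To make the pairing concrete, here is a minimal sketch of the appender such a logger could reference (the file paths, encoder pattern, and the ${log.home} property are illustrative, following the time-based rolling policy above). With additivity="false", these events land only in this file:

```xml
<appender name="NETTY_MEM_LEAK" class="ch.qos.logback.core.rolling.RollingFileAppender">
   <file>${log.home}/netty-mem-leak.log</file>
   <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <fileNamePattern>${log.home}/archives/netty-mem-leak.%d{yyyy-MM-dd}.log.gz</fileNamePattern>
      <maxHistory>10</maxHistory>
   </rollingPolicy>
   <encoder>
      <!-- leading yyyy-MM-dd timestamp matches the multiline pattern used at collection time -->
      <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%thread] %logger{36} - %msg%n</pattern>
   </encoder>
</appender>

<logger name="io.netty.util.ResourceLeakDetector" level="INFO" additivity="false">
   <appender-ref ref="NETTY_MEM_LEAK" />
</logger>
```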

filebeat configuration parameters

Version: artifacts.elastic.co/downloads/b…

General: common settings (www.elastic.co/guide/en/be…)

| Setting | Purpose | Platform value |
| --- | --- | --- |
| fields | Adds extra information to every emitted event, e.g. "level: debug", for later grouping and aggregation. By default the new fields are nested under the event's fields key. | -- |
| filebeat.registry.path | The file in which filebeat records how far it has read each log file. | /data/docker-persist/${CONTAINER_IP_ADDR}/registry |

inputs: log collection settings (www.elastic.co/guide/en/be…)

| Setting | Purpose | Platform value |
| --- | --- | --- |
| paths | The log files to monitor, matched with Go's glob function; configured directories are not scanned recursively. | /data/docker-persist/*/logs/*.log |
| encoding | Encoding of the monitored files; both plain and utf-8 handle Chinese logs. | -- |
| input_type | Input type of the file: log (default) or stdin. | log |
| exclude_lines | Drops input lines matching any regex in the list. | -- |
| include_lines | Keeps only input lines matching any regex in the list (default: all lines); include_lines is applied first, then exclude_lines. | -- |
| exclude_files | Skips files matching any regex in the list (by default a harvester is started for every file matching paths). | [".gz$", "[0-9]{2}.log$"] |
| fields_under_root | If true, the added fields become top-level keys of the event instead of sitting under the fields key; custom fields then override filebeat's default fields of the same name. | -- |
| ignore_older | Makes filebeat ignore log content modified before the given time span, e.g. 2h (two hours) or 5m (five minutes). | -- |
| close_older | Closes the handle of a monitored file that has not been updated within this time span. Default 1h. | 1h |
| force_close_files | Filebeat keeps a file's handle open until close_older is reached, which is a problem if the file is deleted within that window; with force_close_files set to true, filebeat closes the handle as soon as it detects the file name has changed. | false |
| scan_frequency | How often filebeat scans the configured directories for changes (e.g. new files); 0s means as fast as possible, at a higher CPU cost. Default 10s. | 10s |
| document_type | Sets the document type field in Elasticsearch output; can also be used to categorize logs. | -- |
| harvester_buffer_size | Buffer size used by each harvester while monitoring a file. | 65536 |
| max_bytes | Each appended line is one log event; max_bytes caps the bytes shipped per event, and anything beyond it is discarded. | 104857600 |
| multiline | For logs where one entry spans several lines, e.g. error stack traces. pattern: regex matching the first line of an entry; negate: whether the pattern match is negated (with negate: true, lines that do not match are treated as continuations); match: whether continuation lines are merged after or before the matching line; max_lines: maximum number of merged lines, including the matching one; timeout: after this long, the buffered entry is shipped even if no new pattern match (i.e. no new entry) has arrived. | multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'<br>multiline.negate: true<br>multiline.match: after |
| tail_files | If true, filebeat starts reading at the end of each file and ships only newly appended lines, one event per line, instead of re-sending the file from the beginning. | true |
| backoff | After reaching EOF on a file, how long filebeat waits before checking it again for updates. Default 1s. | -- |
| max_backoff | The maximum wait between update checks after EOF. Default 10s. | -- |
| backoff_factor | How fast the wait grows toward max_backoff; the default factor is 2. Once max_backoff is reached, every check waits max_backoff, until the file is updated and the wait resets to backoff. | -- |
| spool_size | Size of the spooler; when the number of buffered events exceeds this threshold, the spooler is flushed regardless of the timeout. | -- |
| idle_timeout | Spooler timeout; when it elapses, the spooler is flushed regardless of the size threshold. | -- |
| config_dir | Pulls in configuration files from other locations (full path required); only the prospector sections are processed. | -- |
| publish_async | Whether to publish asynchronously (experimental). | -- |

processors: pre-processing of shipped data (www.elastic.co/guide/en/be…)

| Setting | Purpose | Platform value |
| --- | --- | --- |
| drop_fields | Removes fields that don't need to be shipped. | fields: ["@metadata","agent","ecs","host","input"] |

output: Kafka output settings (www.elastic.co/guide/en/be…)
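The multiline settings above (pattern '^[0-9]{4}-[0-9]{2}-[0-9]{2}', negate: true, match: after) glue stack-trace lines onto the log line that precedes them. A simplified Python model of that grouping logic (this re-implements the idea for illustration only; it is not filebeat's code):

```python
import re

# Lines that start with a date begin a new event; because negate is true and
# match is "after", lines that do NOT match are appended to the previous event.
PATTERN = re.compile(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}")

def group_multiline(lines):
    events = []
    for line in lines:
        if PATTERN.match(line) or not events:
            events.append(line)            # a new event starts with a timestamp
        else:
            events[-1] += "\n" + line      # continuation line: merge into previous event
    return events

raw = [
    "2024-01-01 12:00:00 ERROR boom",
    "java.lang.NullPointerException",
    "\tat com.example.Foo.bar(Foo.java:42)",
    "2024-01-01 12:00:01 INFO ok",
]
events = group_multiline(raw)
print(len(events))  # 2: the stack trace stays attached to its ERROR line
```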

Comparison of collection agents

The log platform's configuration

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

# ============================== Filebeat inputs ===============================

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /persist/logs/*.log
    - /persist/logs/gclogs/*
  #include_lines: ['^ERR', '^WARN']
  exclude_files: [".gz$","[0-9]{2}.log$"]
  close_older: 1h
  force_close_files: false
  scan_frequency: 10s
  harvester_buffer_size: 65536
  max_bytes: 104857600
  tail_files: true
  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
  multiline.negate: true
  multiline.match: after
  # tags: ["logging_app"]
# - type: filestream
#   enabled: false
#   paths:
#     - /var/log/*.log
    #- c:\programdata\elasticsearch\logs\*
  #exclude_lines: ['^DBG']
  #include_lines: ['^ERR', '^WARN']
  #prospector.scanner.exclude_files: ['.gz$']

  #fields:
  #  level: debug
  #  review: 1

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false


# ================================== General ===================================
# Registry data path. If a relative path is used, it is considered relative to the
# data path.
filebeat.registry.path: /data/docker-persist/${CONTAINER_IP_ADDR}/registry

# The permissions mask to apply on registry data, and meta files. The default
# value is 0600.  Must be a valid Unix-style file permissions mask expressed in
# octal notation.  This option is not supported on Windows.
#filebeat.registry.file_permissions: 0600

# The timeout value that controls when registry entries are written to disk
# (flushed). When an unwritten update exceeds this value, it triggers a write
# to disk. When flush is set to 0s, the registry is written to disk after each
# batch of events has been published successfully. The default value is 0s.
#filebeat.registry.flush: 0s


# Starting with Filebeat 7.0, the registry uses a new directory format to store
# Filebeat state. After you upgrade, Filebeat will automatically migrate a 6.x
# registry file to use the new directory format. If you changed
# filebeat.registry.path while upgrading, set filebeat.registry.migrate_file to
# point to the old registry file.
#filebeat.registry.migrate_file: ${path.data}/registry

# By default Ingest pipelines are not updated if a pipeline with the same ID
# already exists. If this option is enabled Filebeat overwrites pipelines
# every time a new Elasticsearch connection is established.
#filebeat.overwrite_pipelines: false

# How long filebeat waits on shutdown for the publisher to finish.
# Default is 0, not waiting.
#filebeat.shutdown_timeout: 0

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
fields:
  environment: ${CONTAINER_ENV}
  project: ${CONTAINER_PROJ}
  host: ${CONTAINER_IP_ADDR}
  topic: logging_${CONTAINER_PROJ}_${CONTAINER_ENV}

# ================================= Dashboards =================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify an additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

# =============================== Elastic Cloud ================================

# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

# ================================== Outputs ===================================

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Kafka Output ----------------------------
output.kafka:
  # initial brokers for reading cluster metadata
  hosts: ["your-kafka-brokers"]
  version: 1.0.1
  # username: 
  # password: 

  # message topic selection + partitioning
  topic: '%{[fields.topic]}'
  # topics:
  #   - topic: "critical-%{[agent.version]}"
  #     when.contains:
  #       message: "CRITICAL"
  #   - topic: "error-%{[agent.version]}"
  #     when.contains:
  #       message: "ERR"
  partition.round_robin:
    reachable_only: false

  required_acks: 1
  compression: gzip
  max_message_bytes: 1000000

# ---------------------------- Elasticsearch Output ----------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
  #hosts: ["localhost:9200"]

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  #username: "elastic"
  #password: "changeme"

# ------------------------------ Logstash Output -------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

# ================================= Processors =================================
processors:
  # - timestamp: 
  #     field: "@timestamp"
  #     layouts:  
  #       - '2006-01-02 15:04:05.999'
  #     timezone: Asia/Shanghai
  - drop_fields:
      fields: ["agent","ecs","host","input"]
  # - add_host_metadata:
  #     when.not.contains.tags: forwarded
  # - add_cloud_metadata: ~
  # - add_docker_metadata: ~
  # - add_kubernetes_metadata: ~

# ================================== Logging ===================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publisher", "service".
#logging.selectors: ["*"]

# ============================= X-Pack Monitoring ==============================
# Filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false

# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
#monitoring.cluster_uuid:

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:

# ============================== Instrumentation ===============================

# Instrumentation support for the filebeat.
#instrumentation:
    # Set to true to enable instrumentation of filebeat.
    #enabled: false

    # Environment in which filebeat is running on (eg: staging, production, etc.)
    #environment: ""

    # APM Server hosts to report instrumentation results to.
    #hosts:
    #  - http://localhost:8200

    # API Key for the APM Server(s).
    # If api_key is set then secret_token will be ignored.
    #api_key:

    # Secret token for the APM Server(s).
    #secret_token:


# ================================= Migration ==================================

# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true
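For reference, the Kafka output above picks its topic with the Beats format string '%{[fields.topic]}', resolved against the custom fields block. A simplified Python model of that lookup (illustrative only, not filebeat's code; the values assumed for ${CONTAINER_PROJ} and ${CONTAINER_ENV} are hypothetical):

```python
import re

def resolve(fmt: str, event: dict) -> str:
    """Resolve %{[a.b]} references in fmt against a nested event dict."""
    def lookup(match: re.Match) -> str:
        value = event
        for key in match.group(1).split("."):
            value = value[key]  # walk down the nested event structure
        return str(value)
    return re.sub(r"%\{\[([^\]]+)\]\}", lookup, fmt)

# With CONTAINER_PROJ=myproj and CONTAINER_ENV=prod, the fields section
# would produce an event shaped roughly like this:
event = {"fields": {"topic": "logging_myproj_prod"}}
print(resolve("%{[fields.topic]}", event))  # logging_myproj_prod
```

This is why every shipped event must carry a fields.topic value: the Kafka output has no fallback topic configured.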