2 - Log Platform - Log Collection


Log collection:

What the pipeline has to cover:

- Log conventions: fixed field definitions, log format
- Collection and write-to-disk rules: rolling strategy, collection method
- Log transport: message queue, consumption model, topic conventions, retention time
- Log splitting: sampling, filtering, custom formats
- Log search: index partitioning, shard settings, query optimization, permissions, retention time
- Log pipeline monitoring: collection failures, transport failures, search failures, non-conforming logs, monitoring and alerting

Log goals:

logback filters

This part is tightly coupled to the application itself: its job is to filter log events before they are written to disk.

In other words, once you are onboarded to the log platform and find there are log lines you don't want written to disk, you can drop them with rules in a filter.

logback.qos.ch/manual/filt…

If no filter is configured, the defaults are used.
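As a hedged sketch of such a rule, logback's built-in LevelFilter can drop, say, DEBUG events on a single appender before they reach disk (the appender name and surrounding configuration are illustrative):

```xml
<appender name="APP_LOG" class="ch.qos.logback.core.rolling.RollingFileAppender">
   <filter class="ch.qos.logback.classic.filter.LevelFilter">
      <level>DEBUG</level>
      <onMatch>DENY</onMatch>          <!-- drop DEBUG events -->
      <onMismatch>NEUTRAL</onMismatch> <!-- other levels pass on to any further filters -->
   </filter>
   <!-- encoder / rollingPolicy as usual -->
</appender>
```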

logback rollingPolicy

For now we standardize on time-based rolling: most services consume logs primarily along the time dimension, and rolling by volume is rarely needed.

logback.qos.ch/manual/appe…

<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
   <fileNamePattern>${log.home}/archives/your-log-name.%d{yyyy-MM-dd}.log.gz</fileNamePattern>
   <maxHistory>10</maxHistory>
</rollingPolicy>
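If a service does eventually need the size dimension as well, logback ships a combined policy. A hedged sketch (the size values are illustrative, and `%i` is required in the pattern):

```xml
<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
   <fileNamePattern>${log.home}/archives/your-log-name.%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
   <maxFileSize>500MB</maxFileSize>   <!-- roll within the day once a file reaches this size -->
   <maxHistory>10</maxHistory>
   <totalSizeCap>20GB</totalSizeCap>  <!-- optional overall cap on archived logs -->
</rollingPolicy>
```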

logback loggers

In current practice the logback configuration stays as close as possible to the business itself, so an application usually defines several loggers, each routed to its own appender and log file.

The log platform collects every log that follows the conventions into one unified search, so set additivity to false on these loggers whenever possible, to keep the same event from also reaching the root appenders.

For example:

<logger name="io.netty.util.ResourceLeakDetector" level="INFO" additivity="false">
   <appender-ref ref="NETTY_MEM_LEAK" />
</logger>
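To make the pairing concrete, here is a minimal sketch of the appender such a logger could reference (the file paths, encoder pattern, and the ${log.home} property are illustrative, following the time-based rolling policy above). With additivity="false", these events land only in this file:

```xml
<appender name="NETTY_MEM_LEAK" class="ch.qos.logback.core.rolling.RollingFileAppender">
   <file>${log.home}/netty-mem-leak.log</file>
   <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <fileNamePattern>${log.home}/archives/netty-mem-leak.%d{yyyy-MM-dd}.log.gz</fileNamePattern>
      <maxHistory>10</maxHistory>
   </rollingPolicy>
   <encoder>
      <!-- leading yyyy-MM-dd timestamp matches the multiline pattern used at collection time -->
      <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%thread] %logger{36} - %msg%n</pattern>
   </encoder>
</appender>

<logger name="io.netty.util.ResourceLeakDetector" level="INFO" additivity="false">
   <appender-ref ref="NETTY_MEM_LEAK" />
</logger>
```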

filebeat configuration parameters

Version: artifacts.elastic.co/downloads/b…

General: common settings (www.elastic.co/guide/en/be…)

| Setting | Purpose | Platform value |
| --- | --- | --- |
| fields | Adds extra information to every emitted event, e.g. "level: debug", for later grouping and aggregation. By default the new fields are nested under the event's fields key. | -- |
| filebeat.registry.path | The file in which filebeat records how far it has read each log file. | /data/docker-persist/${CONTAINER_IP_ADDR}/registry |

inputs: log collection settings (www.elastic.co/guide/en/be…)

| Setting | Purpose | Platform value |
| --- | --- | --- |
| paths | The log files to monitor, matched with Go's glob function; configured directories are not scanned recursively. | /data/docker-persist/*/logs/*.log |
| encoding | Encoding of the monitored files; both plain and utf-8 handle Chinese logs. | -- |
| input_type | Input type of the file: log (default) or stdin. | log |
| exclude_lines | Drops input lines matching any regex in the list. | -- |
| include_lines | Keeps only input lines matching any regex in the list (default: all lines); include_lines is applied first, then exclude_lines. | -- |
| exclude_files | Skips files matching any regex in the list (by default a harvester is started for every file matching paths). | [".gz$", "[0-9]{2}.log$"] |
| fields_under_root | If true, the added fields become top-level keys of the event instead of sitting under the fields key; custom fields then override filebeat's default fields of the same name. | -- |
| ignore_older | Makes filebeat ignore log content modified before the given time span, e.g. 2h (two hours) or 5m (five minutes). | -- |
| close_older | Closes the handle of a monitored file that has not been updated within this time span. Default 1h. | 1h |
| force_close_files | Filebeat keeps a file's handle open until close_older is reached, which is a problem if the file is deleted within that window; with force_close_files set to true, filebeat closes the handle as soon as it detects the file name has changed. | false |
| scan_frequency | How often filebeat scans the configured directories for changes (e.g. new files); 0s means as fast as possible, at a higher CPU cost. Default 10s. | 10s |
| document_type | Sets the document type field in Elasticsearch output; can also be used to categorize logs. | -- |
| harvester_buffer_size | Buffer size used by each harvester while monitoring a file. | 65536 |
| max_bytes | Each appended line is one log event; max_bytes caps the bytes shipped per event, and anything beyond it is discarded. | 104857600 |
| multiline | For logs where one entry spans several lines, e.g. error stack traces. pattern: regex matching the first line of an entry; negate: whether the pattern match is negated (with negate: true, lines that do not match are treated as continuations); match: whether continuation lines are merged after or before the matching line; max_lines: maximum number of merged lines, including the matching one; timeout: after this long, the buffered entry is shipped even if no new pattern match (i.e. no new entry) has arrived. | multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'<br>multiline.negate: true<br>multiline.match: after |
| tail_files | If true, filebeat starts reading at the end of each file and ships only newly appended lines, one event per line, instead of re-sending the file from the beginning. | true |
| backoff | After reaching EOF on a file, how long filebeat waits before checking it again for updates. Default 1s. | -- |
| max_backoff | The maximum wait between update checks after EOF. Default 10s. | -- |
| backoff_factor | How fast the wait grows toward max_backoff; the default factor is 2. Once max_backoff is reached, every check waits max_backoff, until the file is updated and the wait resets to backoff. | -- |
| spool_size | Size of the spooler; when the number of buffered events exceeds this threshold, the spooler is flushed regardless of the timeout. | -- |
| idle_timeout | Spooler timeout; when it elapses, the spooler is flushed regardless of the size threshold. | -- |
| config_dir | Pulls in configuration files from other locations (full path required); only the prospector sections are processed. | -- |
| publish_async | Whether to publish asynchronously (experimental). | -- |

processors: pre-processing of shipped data (www.elastic.co/guide/en/be…)

| Setting | Purpose | Platform value |
| --- | --- | --- |
| drop_fields | Removes fields that don't need to be shipped. | fields: ["@metadata","agent","ecs","host","input"] |

output: Kafka output settings (www.elastic.co/guide/en/be…)
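The multiline settings above (pattern '^[0-9]{4}-[0-9]{2}-[0-9]{2}', negate: true, match: after) glue stack-trace lines onto the log line that precedes them. A simplified Python model of that grouping logic (this re-implements the idea for illustration only; it is not filebeat's code):

```python
import re

# Lines that start with a date begin a new event; because negate is true and
# match is "after", lines that do NOT match are appended to the previous event.
PATTERN = re.compile(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}")

def group_multiline(lines):
    events = []
    for line in lines:
        if PATTERN.match(line) or not events:
            events.append(line)            # a new event starts with a timestamp
        else:
            events[-1] += "\n" + line      # continuation line: merge into previous event
    return events

raw = [
    "2024-01-01 12:00:00 ERROR boom",
    "java.lang.NullPointerException",
    "\tat com.example.Foo.bar(Foo.java:42)",
    "2024-01-01 12:00:01 INFO ok",
]
events = group_multiline(raw)
print(len(events))  # 2: the stack trace stays attached to its ERROR line
```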

Comparison of collection agents

The log platform's configuration

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

# ============================== Filebeat inputs ===============================

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /persist/logs/*.log
    - /persist/logs/gclogs/*
  #include_lines: ['^ERR', '^WARN']
  exclude_files: [".gz$","[0-9]{2}.log$"]
  close_older: 1h
  force_close_files: false
  scan_frequency: 10s
  harvester_buffer_size: 65536
  max_bytes: 104857600
  tail_files: true
  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
  multiline.negate: true
  multiline.match: after
  # tags: ["logging_app"]
# - type: filestream
#   enabled: false
#   paths:
#     - /var/log/*.log
    #- c:\programdata\elasticsearch\logs\*
  #exclude_lines: ['^DBG']
  #include_lines: ['^ERR', '^WARN']
  #prospector.scanner.exclude_files: ['.gz$']

  #fields:
  #  level: debug
  #  review: 1

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false


# ================================== General ===================================
# Registry data path. If a relative path is used, it is considered relative to the
# data path.
filebeat.registry.path: /data/docker-persist/${CONTAINER_IP_ADDR}/registry

# The permissions mask to apply on registry data, and meta files. The default
# value is 0600.  Must be a valid Unix-style file permissions mask expressed in
# octal notation.  This option is not supported on Windows.
#filebeat.registry.file_permissions: 0600

# The timeout value that controls when registry entries are written to disk
# (flushed). When an unwritten update exceeds this value, it triggers a write
# to disk. When flush is set to 0s, the registry is written to disk after each
# batch of events has been published successfully. The default value is 0s.
#filebeat.registry.flush: 0s


# Starting with Filebeat 7.0, the registry uses a new directory format to store
# Filebeat state. After you upgrade, Filebeat will automatically migrate a 6.x
# registry file to use the new directory format. If you changed
# filebeat.registry.path while upgrading, set filebeat.registry.migrate_file to
# point to the old registry file.
#filebeat.registry.migrate_file: ${path.data}/registry

# By default Ingest pipelines are not updated if a pipeline with the same ID
# already exists. If this option is enabled Filebeat overwrites pipelines
# every time a new Elasticsearch connection is established.
#filebeat.overwrite_pipelines: false

# How long filebeat waits on shutdown for the publisher to finish.
# Default is 0, not waiting.
#filebeat.shutdown_timeout: 0

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
fields:
  environment: ${CONTAINER_ENV}
  project: ${CONTAINER_PROJ}
  host: ${CONTAINER_IP_ADDR}
  topic: logging_${CONTAINER_PROJ}_${CONTAINER_ENV}

# ================================= Dashboards =================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify an additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

# =============================== Elastic Cloud ================================

# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

# ================================== Outputs ===================================

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Kafka Output ----------------------------
output.kafka:
  # initial brokers for reading cluster metadata
  hosts: ["your-kafka-brokers"]
  version: 1.0.1
  # username: 
  # password: 

  # message topic selection + partitioning
  topic: '%{[fields.topic]}'
  # topics:
  #   - topic: "critical-%{[agent.version]}"
  #     when.contains:
  #       message: "CRITICAL"
  #   - topic: "error-%{[agent.version]}"
  #     when.contains:
  #       message: "ERR"
  partition.round_robin:
    reachable_only: false

  required_acks: 1
  compression: gzip
  max_message_bytes: 1000000

# ---------------------------- Elasticsearch Output ----------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
  #hosts: ["localhost:9200"]

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  #username: "elastic"
  #password: "changeme"

# ------------------------------ Logstash Output -------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

# ================================= Processors =================================
processors:
  # - timestamp: 
  #     field: "@timestamp"
  #     layouts:  
  #       - '2006-01-02 15:04:05.999'
  #     timezone: Asia/Shanghai
  - drop_fields:
      fields: ["agent","ecs","host","input"]
  # - add_host_metadata:
  #     when.not.contains.tags: forwarded
  # - add_cloud_metadata: ~
  # - add_docker_metadata: ~
  # - add_kubernetes_metadata: ~

# ================================== Logging ===================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publisher", "service".
#logging.selectors: ["*"]

# ============================= X-Pack Monitoring ==============================
# Filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false

# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
#monitoring.cluster_uuid:

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:

# ============================== Instrumentation ===============================

# Instrumentation support for the filebeat.
#instrumentation:
    # Set to true to enable instrumentation of filebeat.
    #enabled: false

    # Environment in which filebeat is running on (eg: staging, production, etc.)
    #environment: ""

    # APM Server hosts to report instrumentation results to.
    #hosts:
    #  - http://localhost:8200

    # API Key for the APM Server(s).
    # If api_key is set then secret_token will be ignored.
    #api_key:

    # Secret token for the APM Server(s).
    #secret_token:


# ================================= Migration ==================================

# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true
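For reference, the Kafka output above picks its topic with the Beats format string '%{[fields.topic]}', resolved against the custom fields block. A simplified Python model of that lookup (illustrative only, not filebeat's code; the values assumed for ${CONTAINER_PROJ} and ${CONTAINER_ENV} are hypothetical):

```python
import re

def resolve(fmt: str, event: dict) -> str:
    """Resolve %{[a.b]} references in fmt against a nested event dict."""
    def lookup(match: re.Match) -> str:
        value = event
        for key in match.group(1).split("."):
            value = value[key]  # walk down the nested event structure
        return str(value)
    return re.sub(r"%\{\[([^\]]+)\]\}", lookup, fmt)

# With CONTAINER_PROJ=myproj and CONTAINER_ENV=prod, the fields section
# would produce an event shaped roughly like this:
event = {"fields": {"topic": "logging_myproj_prod"}}
print(resolve("%{[fields.topic]}", event))  # logging_myproj_prod
```

This is why every shipped event must carry a fields.topic value: the Kafka output has no fallback topic configured.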