Elasticsearch Search, Part 3: Is Logstash Obsolete?


[TOC]

Filebeat has risen rapidly in recent years, and its light weight has won over many developers. Still, in my view Filebeat is nowhere near able to replace Logstash outright: many of Filebeat's features exist as complements to Logstash, not substitutes for it.

Official docs: Logstash Reference [7.16] | Elastic


I. Keeping Logstash running long-term with supervisord

1. Manage Logstash with supervisord; starting Logstash this way makes it much easier to maintain. The supervisord service itself is controlled with:

  1. Stop
  2. Check status
  3. Start

```shell
service supervisord stop
service supervisord status
service supervisord start
```

2. Edit /etc/supervisord.conf and add a program section per Logstash instance

  1. directory: working directory
  2. user: run the command as this user
  3. command: the command to execute
```ini
[program:info]
environment=LS_HEAP_SIZE=5000m
directory=/home/elastic/logstash-7.16.2
user=elastic
command=/home/elastic/logstash-7.16.2/bin/logstash -f /home/elastic/logstash-7.16.2/config/fullinfo.yml --api.http.port 9600 -w 10 -l /home/elastic/logstash-7.16.2/logs/fullinfo

[program:lexicon]
environment=LS_HEAP_SIZE=5000m
directory=/home/elastic/logstash-7.16.2
user=elastic
command=/home/elastic/logstash-7.16.2/bin/logstash -f /home/elastic/logstash-7.16.2/config/smallinfo.yml --api.http.port 9601 -w 10 -l /home/elastic/logstash-7.16.2/logs/smallinfo
```
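
Since each instance exposes its own monitoring API port (9600 and 9601 above), a quick liveness check once supervisord has started them might look like this (a sketch, assuming you run it on the same host):

```shell
curl -s 'http://localhost:9600/?pretty'   # fullinfo instance
curl -s 'http://localhost:9601/?pretty'   # smallinfo instance
```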

3. Start or stop the managed processes with supervisorctl

  1. Start a program: start
  2. Stop a program: stop
  3. Apply config changes (you changed only one program's config and want just the affected programs reloaded and restarted): update (a typical workflow is sketched after the commands below)
  4. Re-read the config files without applying the changes: reread
  5. Restart supervisord and everything under it: reload
  6. Restart all programs: supervisorctl restart all
```shell
supervisorctl start info
supervisorctl stop info
supervisorctl update
supervisorctl reread
supervisorctl reload
```
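
A typical sequence after editing /etc/supervisord.conf, using the info program defined above:

```shell
# show which program sections were added, changed, or removed (applies nothing)
supervisorctl reread
# apply the changes, restarting only the affected programs
supervisorctl update
# confirm the pipeline came back up
supervisorctl status info
```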

II. A summary of common Logstash problems

1. Naming the group_id when the input consumes from Kafka

Use the machine's IP in the name: from Kafka's consumer-group list you can then immediately tell which machine's Logstash is doing the consuming. If two machines share the same consumer group, join the two IPs with "_" to form the group name, as in the sketch below.
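
A minimal kafka input following that convention might look like this (a sketch; the IPs, topic, and group name are placeholders):

```
input {
  kafka {
    bootstrap_servers => "172.26.17.97:9092"
    # two Logstash hosts, 172.26.17.91 and 172.26.17.92, share one consumer group
    group_id => "logstash_172.26.17.91_172.26.17.92_info"
    topics => ["info.news.increment"]
    codec => "json"
  }
}
```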

2. Modifying data

  1. Remove a field
  2. Drop events that do not satisfy a condition
  3. Print @metadata: stdout { codec => rubydebug { metadata => true } } (see the output sketch after the filter below)
```
filter {
  mutate {
    copy => { "sourceTime" => "indexTag" }    # add an indexTag field, copied from sourceTime
    remove_field => ["sourceName"]            # remove the sourceName field
    rename => { "title" => "dictionaries" }   # rename title to dictionaries
  }
  truncate {                                  # truncate indexTag to 7 bytes (e.g. "2021-11")
    fields => [ "indexTag" ]
    length_bytes => 7
  }
  if [indexTag] <= "2021-11" {                # drop events with indexTag <= "2021-11"
    drop { }
  }
}
```
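
To inspect @metadata while debugging (item 3 above), a stdout output can be enabled temporarily:

```
output {
  stdout { codec => rubydebug { metadata => true } }
}
```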

3. Syncing to two Elasticsearch indices

Add @metadata fields to control which index each event is ultimately written to; fields under @metadata flow through the pipeline but are not indexed into the document itself.

```
input {
  kafka {
    bootstrap_servers => ["172.26.17.97:9092,172.26.17.98:9092,172.26.17.99:9092"]
    client_id => "info"
    group_id => "logstash_ip91_to_es_ip91_info"
    auto_offset_reset => "earliest"
    consumer_threads => 6
    decorate_events => true
    topics => ["info.news.before2022-01-01","info.news.increment"]
    codec => "json"
  }
}
filter {
  # year_tag / month_tag are working copies of sourceTime
  mutate {
    copy => { "sourceTime" => "year_tag" }
  }
  mutate {
    copy => { "sourceTime" => "month_tag" }
  }
  truncate {             # keep yyyy-MM
    fields => [ "month_tag" ]
    length_bytes => 7
  }
  truncate {             # keep yyyy
    fields => [ "year_tag" ]
    length_bytes => 4
  }
  # move the tags under @metadata so they drive routing without being indexed
  mutate {
    add_field => { "[@metadata][fullinfo_tag]" => "fullinfo-%{year_tag}" }
    add_field => { "[@metadata][smallinfo_tag]" => "smallinfo-%{month_tag}" }
    remove_field => ["month_tag"]
    remove_field => ["year_tag"]
  }
}
output {
  # every event goes into the yearly fullinfo index
  elasticsearch {
    hosts => ["http://172.26.17.91:9200"]
    index => "%{[@metadata][fullinfo_tag]}"
    template => "/home/elastic/logstash-7.16.2/config/info-template.json"
    template_name => "info-template"
    template_overwrite => false
    timeout => 300
    user => "elastic"
    password => "a#123456"
    document_id => "%{id}"
  }
  # recent events additionally go into the monthly smallinfo index
  if [@metadata][smallinfo_tag] >= "smallinfo-2021-12" {
    elasticsearch {
      hosts => ["http://172.26.17.91:9200"]
      index => "%{[@metadata][smallinfo_tag]}"
      template => "/home/elastic/logstash-7.16.2/config/info-template.json"
      template_name => "info-template"
      template_overwrite => false
      timeout => 300
      user => "elastic"
      password => "a#123456"
      document_id => "%{id}"
    }
  }
}
```

III. Using pipelines.yml

Running ./bin/logstash with no arguments builds its pipelines from pipelines.yml; you cannot point the -f flag at a pipelines.yml file. This once cost me real debugging time. From the official docs on Multiple Pipelines:

> If you need to run more than one pipeline in the same process, Logstash provides a way to do this through a configuration file called pipelines.yml. This file must be placed in the path.settings folder and follows this structure:

```yaml
- pipeline.id: my-pipeline_1
  path.config: "/etc/path/to/p1.config"
  pipeline.workers: 3
- pipeline.id: my-other-pipeline
  path.config: "/etc/different/path/p2.cfg"
  queue.type: persisted
```

> This file is formatted in YAML and contains a list of dictionaries, where each dictionary describes a pipeline, and each key/value pair specifies a setting for that pipeline. The example shows two different pipelines described by their IDs and configuration paths. For the first pipeline, the value of pipeline.workers is set to 3, while in the other, the persistent queue feature is enabled. The value of a setting that is not explicitly set in the pipelines.yml file will fall back to the default specified in the logstash.yml settings file.
>
> When you start Logstash without arguments, it will read the pipelines.yml file and instantiate all pipelines specified in the file. On the other hand, when you use -e or -f, Logstash ignores the pipelines.yml file and logs a warning about it.

In short, -f only points Logstash at pipeline config files, and in that mode pipelines.yml is ignored entirely, even though pipelines.yml is the mechanism for configuring multiple pipelines in one process. A sketch of what that file could look like for this setup follows.
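
As a sketch, the two supervisord-managed instances from section I could instead run as two pipelines inside one Logstash process (paths are taken from the configs above; whether one process or two suits you better depends on how much isolation you want):

```yaml
- pipeline.id: fullinfo
  path.config: "/home/elastic/logstash-7.16.2/config/fullinfo.yml"
  pipeline.workers: 10
- pipeline.id: smallinfo
  path.config: "/home/elastic/logstash-7.16.2/config/smallinfo.yml"
  pipeline.workers: 10
```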

Also note how ./bin/logstash -f <file,file_dir> resolves its argument:

```
-f, --path.config CONFIG_PATH Load the logstash config from a specific file
                              or directory.  If a directory is given, all
                              files in that directory will be concatenated
                              in lexicographical order and then parsed as a
                              single config file. You can also specify
                              wildcards (globs) and any matched files will
                              be loaded in the order described above.
```
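
In other words, pointing -f at a directory or glob concatenates everything into a single pipeline, so filters and outputs from different files can interfere with one another; reach for pipelines.yml when the files are meant to be independent pipelines. For example (hypothetical paths):

```shell
# all files under conf.d/ are concatenated lexicographically into ONE config
./bin/logstash -f /home/elastic/logstash-7.16.2/config/conf.d/
# a glob behaves the same way
./bin/logstash -f '/home/elastic/logstash-7.16.2/config/*.conf'
```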